Hi Patrick,

Thanks for your help.  I'll dig around a bit more, try the proxy
thing, maybe try the database approach, and see how it goes.  Much
appreciated,

Yoav

On Wed, Oct 1, 2008 at 1:14 PM, Patrick Markiewicz
<[EMAIL PROTECTED]> wrote:
> Hi Yoav,
>        If the content is dynamic, presumably it is stored in a
> database?  I was just thinking that it might be easier to use some
> database utilities to index the information.
>
>        Do you know how to use JMeter to record the requests that a web
> browser makes?  The browser is configured to use JMeter, listening on a
> particular port, as its proxy.  I know that the JMeter cookie manager
> can save the cookies that are gathered as part of the request.
>        I'm pretty sure that nutch can use a proxy.
> http://wiki.apache.org/nutch/SetupProxyForNutch
>
> According to this page here:
> http://jakarta.apache.org/jmeter/usermanual/component_reference.html#HTTP_Cookie_Manager
> you can manually add a cookie that will be used by all threads.  I am
> guessing that if you set up JMeter to act as a proxy, the proxy thread
> would be one of those that carries the cookie.
>
> If the proxy thread cannot have cookies added manually, then this
> strategy wouldn't work.
>
> Patrick
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
> Yoav Shapira
> Sent: Wednesday, October 01, 2008 11:47 AM
> To: [email protected]
> Subject: Re: How do I crawl a site with a cookie for authentication?
>
> Patrick,
> Thank you for the answers.  More below:
>
> 2008/10/1 Patrick Markiewicz <[EMAIL PROTECTED]>:
>> Is it possible for you to retrieve a resource by using the url:
>> http://username:[EMAIL PROTECTED]/path/to/resource.htm
>
> The system does not support HTTP Basic authentication at this time,
> unfortunately.
>
>> I'm not sure what level of authority you have with the intranet site.
>> You could do a similar trick by crawling the local filesystem of that
>> site, and then just having the search page edit
>
> The site is dynamically generated.  There are no meaningful static
> files on the file system.
>
>> If you only have your own account, and can't change any other things,
>> then you might be able to use JMeter to add a cookie and have nutch use
>> JMeter as a proxy.  I have never
>
> This is very intriguing.  How would I get started on this?  I've used
> JMeter in the past for simple test plans, but never as an HTTP proxy.
>
> Yoav
>



-- 
Thanks,

Yoav
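
The cookie-through-a-proxy idea discussed in this thread can be checked by
hand before wiring up Nutch and JMeter.  The following is only a rough
sketch in Python, not anything from Nutch or JMeter themselves: it builds a
Cookie header and routes requests through an HTTP proxy.  The cookie name
JSESSIONID, its value, and the proxy address http://localhost:8080 are all
assumptions made for illustration; substitute whatever the site and the
proxy actually use.

```python
import urllib.request


def build_cookie_header(cookies):
    """Join name/value pairs into a single Cookie request header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def make_opener(cookies, proxy_url=None):
    """Return an opener that sends the cookies on every request.

    If proxy_url is given, plain-HTTP requests are routed through it.
    """
    handlers = []
    if proxy_url:
        handlers.append(urllib.request.ProxyHandler({"http": proxy_url}))
    opener = urllib.request.build_opener(*handlers)
    # Keep the default headers and add the authentication cookie.
    opener.addheaders.append(("Cookie", build_cookie_header(cookies)))
    return opener


if __name__ == "__main__":
    # Hypothetical values: replace with the real session cookie and proxy.
    opener = make_opener({"JSESSIONID": "abc123"}, "http://localhost:8080")
    print(opener.addheaders)
    # opener.open("http://intranet.example/path") would perform the request.
```

If a page fetched this way shows the logged-in content, the same cookie
should be a reasonable candidate to add in the JMeter cookie manager.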
