Hi Patrick,

Thanks for your help. I'll dig around a bit more, try the proxy thing,
maybe try the database approach, and see how it goes.

Much appreciated,
Yoav

On Wed, Oct 1, 2008 at 1:14 PM, Patrick Markiewicz <[EMAIL PROTECTED]> wrote:
> Hi Yoav,
> If the content is dynamic, presumably it is stored in a
> database? I was just thinking that it might be easier to use some
> database utilities to index the information.
>
> Do you know how to use JMeter to record the requests that a web
> browser makes? The browser uses a particular port as a proxy. I know
> that the JMeter cookie manager can save the cookies that are gathered as
> part of the request.
> I'm pretty sure that nutch can use a proxy.
> http://wiki.apache.org/nutch/SetupProxyForNutch
>
> According to this page here:
> http://jakarta.apache.org/jmeter/usermanual/component_reference.html#HTTP_Cookie_Manager
> you can manually add a cookie that will be used by all threads. I am
> guessing that if you set up JMeter to act as a proxy, that this thread
> would be included as one of those that contains the cookie.
>
> If the proxy thread can not have cookies added manually, then this
> strategy wouldn't work.
>
> Patrick
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
> Yoav Shapira
> Sent: Wednesday, October 01, 2008 11:47 AM
> To: [email protected]
> Subject: Re: How do I crawl a site with a cookie for authentication?
>
> Patrick,
> Thank you for the answers. More below:
>
> 2008/10/1 Patrick Markiewicz <[EMAIL PROTECTED]>:
>> Is it possible for you to retrieve a resource by using the url:
>> http://username:[EMAIL PROTECTED]/path/to/resource.htm
>
> The system does not support HTTP Basic authentication at this time,
> unfortunately.
>
>> I'm not sure what level of authority you have with the intranet site.
>> You could do a similar trick by crawling the local filesystem of that
>> site, and then just having the search page edit
>
> The site is dynamically generated. There are no meaningful static
> files on the file system.
>
>> If you only have your own account, and can't change any other things,
>> then you might be able to use JMeter to add a cookie and have nutch use
>> JMeter as a proxy. I have never
>
> This is very intriguing. How would I get started on this? I've used
> JMeter in the past for simple test plans, but never as an HTTP proxy.
>
> Yoav
>
-- Thanks, Yoav
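
Before wiring Nutch to a proxy, it may help to check by hand that the site accepts a manually supplied session cookie on a request routed through a proxy, which is the idea discussed in the thread. The following is a minimal sketch using only the Python standard library. It is not taken from the thread: the cookie name and value, the proxy host and port, and the intranet URL are all hypothetical placeholders to be replaced with real values.

```python
import urllib.request


def build_authenticated_request(url, cookie_name, cookie_value):
    """Create a request that carries a manually supplied session cookie."""
    req = urllib.request.Request(url)
    # The cookie name and value are assumptions; copy the real ones from
    # a browser session that is already logged in to the site.
    req.add_header("Cookie", f"{cookie_name}={cookie_value}")
    return req


def build_proxy_opener(proxy_host, proxy_port):
    """Create an opener that routes HTTP and HTTPS traffic through a proxy."""
    proxy = f"http://{proxy_host}:{proxy_port}"
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


# Hypothetical values for illustration only.
request = build_authenticated_request(
    "http://intranet.example.com/path/to/resource.htm",
    "JSESSIONID",
    "abc123",
)
opener = build_proxy_opener("localhost", 8080)

# Uncomment once a proxy is actually listening on the host and port above:
# with opener.open(request, timeout=10) as response:
#     print(response.status)
```

If the response is the protected page rather than a login form, the cookie is being accepted, and the same cookie and proxy settings are reasonable candidates for the JMeter and Nutch setup described above.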
