Re: crawling sites which require authentication

2006-10-29 Thread Toufeeq Hussain
Hi Tomi, On 10/22/06, Tomi NA <[EMAIL PROTECTED]> wrote: Toufeeq, could you say anything more on the topic of nutch in-built NTLM authentication support? My work has been limited to 0.7.X version of nutch. Below are some of my findings.. The file src/plugin/protocol-httpclient/src/java/org/

Re: crawling sites which require authentication

2006-10-29 Thread Toufeeq Hussain
Oops.. Sorry about the mail below. Did not know reply-to munging was being done. :) -Toufeeq On 10/30/06, Toufeeq Hussain <[EMAIL PROTECTED]> wrote: dude.. You got Nutch working with NTLM ? -Toufeeq On 10/12/06, Guruprasad Iyer <[EMAIL PROTECTED]> wrote: > Hi, > > I need to know how to craw

Re: crawling sites which require authentication

2006-10-29 Thread Toufeeq Hussain
dude.. You got Nutch working with NTLM ? -Toufeeq On 10/12/06, Guruprasad Iyer <[EMAIL PROTECTED]> wrote: Hi, I need to know how to crawl (intranet) sites which require authentication. One suggestion was that I replace protocol-http with protocol-httpclient in the value field of plugin.includ

Re: crawling sites which require authentication

2006-10-23 Thread Tomi NA
2006/10/14, Tomi NA <[EMAIL PROTECTED]>: 2006/10/14, Toufeeq Hussain <[EMAIL PROTECTED]>: > From internal tests with ntlmaps + Nutch the conclusion we came to was > that though it "kinda-works" it puts a huge load on the Nutch server > as ntlmaps is a major memory-hog and the mixture of the two

Re: crawling sites which require authentication

2006-10-14 Thread Jim Wilson
Yeah seriously - if NTLM auth (or HTTP Basic for that matter) is supported natively by Nutch, I'd love to read the documentation on it! -- Jim On 10/14/06, Tomi NA <[EMAIL PROTECTED]> wrote: 2006/10/14, Toufeeq Hussain <[EMAIL PROTECTED]>: > From internal tests with ntlmaps + Nutch the conclu

Re: crawling sites which require authentication

2006-10-14 Thread Tomi NA
2006/10/14, Toufeeq Hussain <[EMAIL PROTECTED]>: From internal tests with ntlmaps + Nutch the conclusion we came to was that though it "kinda-works" it puts a huge load on the Nutch server as ntlmaps is a major memory-hog and the mixture of the two leads to performance issues. For a PoC this wil

Re: crawling sites which require authentication

2006-10-13 Thread Toufeeq Hussain
Hi Tomi, On 10/13/06, Tomi NA <[EMAIL PROTECTED]> wrote: Guruprasad, please use "reply-all" so your messages end up on the list as well. As far as ntlmaps is concerned, you can read about it here http://ntlmaps.sourceforge.net/ or download it here http://sourceforge.net/project/showfiles.php?gro

Re: crawling sites which require authentication

2006-10-13 Thread Tomi NA
2006/10/13, Guruprasad Iyer <[EMAIL PROTECTED]>: Hi Tomi, "using a ntlmaps proxy" How do I get this proxy? "You tell nutch to use the proxy and you provide the proxy with adequate access privileges." How do I do this? Can you elaborate? I am a new Nutch user and am very much in the learning p

Re: crawling sites which require authentication

2006-10-12 Thread Ravi Chintakunta
Switching from protocol-http to protocol-httpclient will help in crawling secured sites (https). If your site supports HTTP Basic authentication, then you can modify the HTTP class in the protocol-httpclient plugin. These are minor changes in the configureClient method: client.getParams().setAu
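
For readers unfamiliar with the HTTP Basic authentication mentioned in the message above: the client sends an Authorization header carrying the Base64 encoding of "username:password". The following is a minimal, standalone Java sketch of that header construction only. It is not the Nutch plugin code (that snippet is truncated in the archive), and the class name BasicAuthHeader and the sample credentials are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Nutch or protocol-httpclient.
class BasicAuthHeader {

    // Builds the value of an HTTP "Authorization" header for Basic auth:
    // the literal "Basic " followed by Base64("username:password").
    static String build(String username, String password) {
        String credentials = username + ":" + password;
        byte[] raw = credentials.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // Sample credentials are placeholders for illustration only.
        System.out.println("Authorization: " + build("user", "pass"));
    }
}
```

Basic authentication only encodes the credentials and does not encrypt them, so it is normally paired with https, which fits the advice above about switching to protocol-httpclient.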

Re: crawling sites which require authentication

2006-10-12 Thread Tomi NA
2006/10/12, Guruprasad Iyer <[EMAIL PROTECTED]>: Hi, I need to know how to crawl (intranet) sites which require authentication. One suggestion was that I replace protocol-http with protocol-httpclient in the value field of plugin.includes tag in the nutch-default.xml file. However, this did not

Re: crawling sites which require authentication

2006-10-12 Thread Jim Wilson
Standard community response: it's not built in, but you could write an extension! (I asked this myself a few months back). -- Jim On 10/12/06, Guruprasad Iyer <[EMAIL PROTECTED]> wrote: Hi, I need to know how to crawl (intranet) sites which require authentication. One suggestion was that I r

crawling sites which require authentication

2006-10-12 Thread Guruprasad Iyer
Hi, I need to know how to crawl (intranet) sites which require authentication. One suggestion was that I replace protocol-http with protocol-httpclient in the value field of plugin.includes tag in the nutch-default.xml file. However, this did not solve the problem. Can you help me out on this? Th