Selva,

I don't believe Nutch has any support for HTTP authentication yet,
nor for cookies, which many authenticated sites require.

If you can find an HTTP proxy that handles authentication without a
browser's intervention, you could try running Nutch's crawler through
it by setting the properties "http.proxy.host" and "http.proxy.port"
in nutch-site.xml.
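
As a rough sketch, the two entries might look something like this. The
host name and port number are placeholders I made up, so substitute
your proxy's real values, and put the properties inside whatever root
element your nutch-site.xml already has:

```xml
<!-- Placeholder values: replace with your authenticating proxy's host and port. -->
<property>
  <name>http.proxy.host</name>
  <value>proxy.example.com</value>
</property>

<property>
  <name>http.proxy.port</name>
  <value>8080</value>
</property>
```

The credentials themselves would live in the proxy's own configuration,
since Nutch only needs to know where to send its requests.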

Failing that, the code you'd need to modify should all be right here:
http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/plugin/protocol-http/src/java/net/nutch/protocol/http/

Hope that helps,
--Matt

On Thu, 25 Nov 2004 11:57:21 +0530, selvakumar
<[EMAIL PROTECTED]> wrote:
> 
> Hi All,
> 
> I am configuring the Nutch search engine. I have successfully set it up
> on Tomcat 5 and crawled some internal and internet sites. Search
> works fine.
> I would now like to crawl sites that require authentication.
> Can you please suggest how to configure the authentication
> information so that Nutch can crawl such sites?
> 
> Thanks and Regards
> Selva
> 
> -------------------------------------------------------
> SF email is sponsored by - The IT Product Guide
> Read honest & candid reviews on hundreds of IT Products from real users.
> Discover which products truly live up to the hype. Start reading now.
> http://productguide.itmanagersjournal.com/
> _______________________________________________
> Nutch-developers mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/nutch-developers
>

