Crawling password-protected sites would require two things:

1. Being able to submit data to the auth page via POST. Some sites
accept the login in the query string, but most don't.
2. Being able to manage the session during the crawl, so that the server
thinks the agent is still logged in as it goes from page to page. I
did this in an intelligent agent I wrote about 6 years ago, but I don't
know enough about the Nutch agent to tell if it is possible.
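
As a rough illustration of the two points above, here is a minimal sketch
in Python (standard library only), not Nutch code. The login URL, the form
field names, and the helper names build_session, build_login_request and
crawl are all hypothetical; a real site may also need hidden form fields or
tokens that this sketch does not handle.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_session():
    """Return an opener that stores cookies and sends them back on later requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, fields):
    """Build a POST request carrying the credentials in the body, not the query string."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(login_url, data=body, method="POST")


def crawl(opener, login_request, page_urls):
    """Log in once, then fetch each page through the same cookie-aware opener."""
    # The server's session cookie (if it sets one) lands in the opener's jar here.
    opener.open(login_request).close()
    pages = {}
    for url in page_urls:
        # The opener re-sends the stored cookies, so the session carries over.
        with opener.open(url) as response:
            pages[url] = response.read()
    return pages


if __name__ == "__main__":
    # Hypothetical field names; no network traffic happens in this demo.
    request = build_login_request(
        "https://intranet.example/login",
        {"username": "crawler", "password": "secret"},
    )
    print(request.get_method(), request.full_url)
```

Calling crawl(opener, request, urls) with the opener from build_session()
would perform the login and then walk the pages with the same cookies. Whether
the Nutch fetcher can be made to keep state like this is the open question.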

-----Original Message-----
From: Mohini Padhye [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 02, 2006 4:26 PM
To: [email protected]
Subject: RE: https plugin for Nutch


Sameer,
Thanks for the reply. I was able to configure and use the protocol-http
plugin for crawling a site that uses the https protocol. Also, has anyone
worked with crawling password-protected sites? My requirement is crawling
an intranet site that uses https and user authentication. I searched
through the forum but couldn't find anybody who has successfully
implemented it. I'm also going through the source files for the
protocol-http plugin to see if any changes can be made there for my
specific requirement.
Thanks,
Mohini


-----Original Message-----
From: Sameer Tamsekar [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, March 01, 2006 10:31 PM
To: [email protected]
Subject: Re: https plugin for Nutch

If you use protocol-httpclient (versus protocol-http) then it should
support https.

I got this reply from one of the mailing list users.

Regards,

Sameer

On 3/2/06, Mohini Padhye <[EMAIL PROTECTED]> wrote:
>
> I am using nutch-0.7.1. I wanted to know if anyone has successfully
> implemented https plugin for nutch.
> If not, can someone provide guidelines about developing it and I can 
> start with the implementation?
> -Mohini
>
>



