You can enable the protocol-httpclient plugin to crawl HTTPS pages. To
do this, override the 'plugin.includes' property in conf/nutch-site.xml.
For the details of 'plugin.includes', search for this property in
conf/nutch-default.xml and read its description. Copy the property into
conf/nutch-site.xml and replace protocol-http with protocol-httpclient
in its value.
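A minimal sketch of what the override in conf/nutch-site.xml might look
like. The plugin list in the value is an assumption modelled on a Nutch
0.9-era default; your release may ship a different list, so copy the
value from your own conf/nutch-default.xml and change only protocol-http
to protocol-httpclient.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides plugin.includes from conf/nutch-default.xml. -->
  <!-- ASSUMPTION: this value is modelled on a Nutch 0.9-era default.
       Copy the value from your own conf/nutch-default.xml and change
       only protocol-http to protocol-httpclient. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
    <description>Regular expression naming plugin directories to include.
    protocol-httpclient replaces protocol-http so that https URLs can be
    fetched.</description>
  </property>
</configuration>
```

Since the value is a regular expression of plugin directory names,
plugins left out of it are not loaded, which is why the whole list is
copied rather than just the protocol entry.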

Regards,
Susam Pal

On Thu, Nov 20, 2008 at 12:23 PM, Vimal Varghese <[EMAIL PROTECTED]> wrote:
>
> How do I crawl HTTPS URLs? I can't find any information about this anywhere.
>
> Vimal Varghese
> Tata Consultancy Services
