You can enable the protocol-httpclient plugin to crawl HTTPS pages. To do this, override the 'plugin.includes' property in conf/nutch-site.xml. For details on 'plugin.includes', search for this property in conf/nutch-default.xml and read its description. Copy the property into conf/nutch-site.xml and replace protocol-http with protocol-httpclient.
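A minimal conf/nutch-site.xml override might look like the sketch below. The plugins listed after protocol-httpclient are only an illustration, assumed to resemble the default in older Nutch releases. Copy the actual value from your own conf/nutch-default.xml and change only protocol-http to protocol-httpclient, so the other plugins you rely on stay enabled.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Illustrative value only: copy the real default from conf/nutch-default.xml
         and replace protocol-http with protocol-httpclient. -->
    <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```

Properties set in nutch-site.xml take precedence over the same properties in nutch-default.xml, which is why the override goes there rather than into the default file.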
Regards,
Susam Pal

On Thu, Nov 20, 2008 at 12:23 PM, Vimal Varghese <[EMAIL PROTECTED]> wrote:
>
> How to crawl https URL's? I am not getting information from anywhere.
>
> Vimal Varghese
