Thanks, Vimal. I switched the protocol plugin to protocol-httpclient, set http.useHttp11 to true, and updated commons-httpclient-3.0.1.jar to commons-httpclient-3.1.jar. It seems fine now.

--- On Wed, 1/21/09, Vimal Varghese <[email protected]> wrote:

> From: Vimal Varghese <[email protected]>
> Subject: Re: AW: fetching https documents
> To: [email protected]
> Cc: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
> Date: Wednesday, January 21, 2009, 10:45 PM
>
> Hi Alex,
>
> If it's not fetching https, you can try adding this https line to your
> crawl-urlfilter.txt file:
>
> # accept hosts in MY.DOMAIN.NAME
> +^http://([a-z0-9]*\.)*(DOMAIN1|DOMAIN2)/
> +^https://([a-z0-9]*\.)*(DOMAIN1|DOMAIN2)/
>
> After adding this line it will fetch all the https URLs.
>
> But I am still getting these exceptions for the https URLs:
>
> javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
>
> org.apache.nutch.protocol.http.api.HttpException:
> java.net.UnknownHostException: secure.americanexpress.com
>
> Vimal Varghese
>
> Koch Martina <[email protected]>
> 21-01-09 04:05 PM
> Please respond to [email protected]
>
> To: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
> Subject: AW: fetching https documents
>
> Hi Alex,
>
> https pages can be fetched with the protocol-httpclient plugin.
>
> Kind regards,
> Martina
>
> -----Original Message-----
> From: Alex Basa [mailto:[email protected]]
> Sent: Wednesday, January 21, 2009 00:41
> To: [email protected]
> Subject: fetching https documents
>
> I searched for patches and couldn't find one. Does anyone know if nutch
> 0.9 supports crawling https websites? If so, can someone point me to the
> patch?
>
> Thanks in advance,
>
> Alex
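
P.S. For anyone finding this thread later, here is a rough sketch of the two settings mentioned at the top of this message, written as overrides in conf/nutch-site.xml. The property names plugin.includes and http.useHttp11 are the ones Nutch reads. The rest of the plugin.includes value is an assumption modelled on a typical 0.9 default, so keep whatever your own file already lists and only swap protocol-http for protocol-httpclient.

```xml
<!-- conf/nutch-site.xml: overrides for the defaults in nutch-default.xml -->
<configuration>

  <!-- Assumption: everything after the first entry mirrors a typical
       Nutch 0.9 default. Only the protocol-httpclient entry matters here;
       leave the other plugins as they are in your own setup. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>

  <!-- Use HTTP/1.1 instead of HTTP/1.0 (honoured by protocol-httpclient). -->
  <property>
    <name>http.useHttp11</name>
    <value>true</value>
  </property>

</configuration>
```

The jar update is a plain file replacement (commons-httpclient-3.0.1.jar swapped for commons-httpclient-3.1.jar) and is not shown here. The https line in crawl-urlfilter.txt quoted above is still needed so the URL filter accepts https URLs in the first place.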
