Thanks Vimal.

I switched the protocol plugin to protocol-httpclient, set http.useHttp11
to true, and upgraded commons-httpclient-3.0.1.jar to
commons-httpclient-3.1.jar. It seems to work fine now.

--- On Wed, 1/21/09, Vimal Varghese <[email protected]> wrote:

> From: Vimal Varghese <[email protected]>
> Subject: Re: AW: fetching https documents
> To: [email protected]
> Cc: "[email protected]" <[email protected]>, 
> "[email protected]" <[email protected]>
> Date: Wednesday, January 21, 2009, 10:45 PM
> Hi Alex,
> 
> If it's not fetching https, you can try adding this https
> line to your crawl-urlfilter.txt file:
> 
> # accept hosts in MY.DOMAIN.NAME
> +^http://([a-z0-9]*\.)*(DOMAIN1|DOMAIN2)/
> +^https://([a-z0-9]*\.)*(DOMAIN1|DOMAIN2)/
> 
> After adding this line, it will also fetch the https URLs from those hosts.
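To check which URLs these patterns let through before re-running a crawl, here is a small standalone Java sketch. It assumes the filter tries the +/- rules in order, lets the first pattern found in the URL decide, and rejects URLs that match no rule. UrlFilterCheck is a hypothetical helper, not part of Nutch, and example.com/example.org stand in for DOMAIN1/DOMAIN2. It shows why the http line alone rejects https URLs: ^http:// cannot match the "s" in https://.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not Nutch's API) approximating a crawl-urlfilter.txt
 * check: rules are tried in order, the first pattern found in the URL
 * decides, and a URL that matches no rule is rejected.
 */
public class UrlFilterCheck {

    private static final class Rule {
        final boolean accept;   // true for "+pattern", false for "-pattern"
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    public UrlFilterCheck(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and "#" comments.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                continue; // this sketch simply ignores lines without a +/- prefix
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** Returns the decision of the first rule whose pattern is found in the URL. */
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false; // no rule matched: reject
    }

    public static void main(String[] args) {
        // example.com / example.org are placeholders for DOMAIN1 / DOMAIN2.
        List<String> httpOnly = List.of(
                "+^http://([a-z0-9]*\\.)*(example\\.com|example\\.org)/");
        List<String> withHttps = List.of(
                "+^http://([a-z0-9]*\\.)*(example\\.com|example\\.org)/",
                "+^https://([a-z0-9]*\\.)*(example\\.com|example\\.org)/");

        String url = "https://secure.example.com/login";
        System.out.println("http rule only:  " + new UrlFilterCheck(httpOnly).accepts(url));
        System.out.println("with https rule: " + new UrlFilterCheck(withHttps).accepts(url));
    }
}
```

Running it prints false for the http-only rule and true once the https line is added.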
> 
> But I am still getting these exceptions for the https URLs:
> 
> javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
>
> org.apache.nutch.protocol.http.api.HttpException: java.net.UnknownHostException: secure.americanexpress.com
> 
> Vimal Varghese
>
> From: Koch Martina <[email protected]>
> Date: 21-01-09 04:05 PM
> Reply-To: [email protected]
> To: "[email protected]" <[email protected]>,
> "[email protected]" <[email protected]>
> Subject: AW: fetching https documents
>
> Hi Alex,
> 
> HTTPS pages can be fetched with the protocol-httpclient plugin.
> 
> Kind regards,
> Martina
> 
> 
> -----Original Message-----
> From: Alex Basa [mailto:[email protected]]
> Sent: Wednesday, January 21, 2009 00:41
> To: [email protected]
> Subject: fetching https documents
> 
> I searched for patches and couldn't find one. Does anyone know if Nutch
> 0.9 supports crawling https websites? If so, can someone point me to the
> patch?
> 
> Thanks in advance,
> 
> Alex
> 