Hello - see inline.
Markus 
 
-----Original message-----
> From: Arora, Madhvi <[email protected]>
> Sent: Friday 5th August 2016 18:03
> To: [email protected]
> Subject: Protocol change to https
> 
> Hi,
> 
> We are using Nutch 1.10 and Solr 5. We have around 10 different web sites 
> that are crawled regularly. We are changing the protocol of a few websites from 
> http to https, so we will have a mixed bag of http and https protocols.
> I checked the Nutch user mailing list archive and gathered that we need to 
> change protocol-http to protocol-httpclient.
> 1: I wanted to find out the best way to handle this

You can still use protocol-http; TLS support was added to it in a recent 
version.

> 2: What are the issues with using protocol-httpclient i.e. there were 
> previous references to issues with use of protocol-httpclient.

It does not allow unencoded URLs, but in recent Nutch versions we improved the 
basic URL normalizer to fix that for you.
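
To illustrate what "unencoded" means here, a minimal Python sketch that 
percent-encodes unsafe characters in the path and query of a URL. This is not 
Nutch's normalizer code; the function name encode_url and the choice of safe 
characters are assumptions made for the example only.

```python
from urllib.parse import urlsplit, urlunsplit, quote


def encode_url(url):
    """Percent-encode unsafe characters in the path and query of url.

    Hypothetical helper for illustration; not part of Nutch.
    Existing %XX escapes are left alone because '%' is listed as safe.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


if __name__ == "__main__":
    # A URL with a raw space in the path is the kind that would be rejected.
    print(encode_url("https://www.example.com/my page.html"))
```

A URL that is already encoded passes through unchanged, so applying the 
function twice should be harmless.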

> 3: Steps that need to be taken to update the SOLR index. I think that I will 
> need to delete the old http urls from solr index, re-crawl and index  the 
> urls that need to be switched to https.

Yes, just delete the old records, then recrawl and reindex everything. Also 
consider upgrading to 1.12.
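
If you would rather remove only the old http records for one site instead of 
wiping the whole index, a delete-by-query is one option. Below is a rough 
Python sketch that builds the JSON body for Solr's update handler. It assumes 
the URL is stored in the id field (check your schema), and the host name, the 
helper names, and the core name mentioned afterwards are placeholders.

```python
import json
import re

# Characters with special meaning in the Solr/Lucene query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_solr_term(text):
    """Backslash-escape query syntax characters in text."""
    return _SPECIAL.sub(r'\\\1', text)


def build_delete_body(host, field="id"):
    """Return a JSON delete-by-query body matching http:// URLs on host.

    Hypothetical helper: 'field' is assumed to hold the page URL.
    """
    prefix = escape_solr_term("http://" + host + "/")
    return json.dumps({"delete": {"query": field + ":" + prefix + "*"}})


if __name__ == "__main__":
    # Placeholder host; substitute one of the sites being switched to https.
    print(build_delete_body("www.example.com"))
```

The resulting body would be POSTed with Content-Type application/json to the 
update handler of your core, for example /solr/yourcore/update?commit=true, 
where yourcore is a placeholder. Run the same query as a normal search first 
to confirm that it matches only the documents you intend to delete.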

> 
> I will be grateful for any guidance or suggestions.
> 
> Thanks,
> Madhvi
> 
> 
