Hello - see inline.
Markus

-----Original message-----
> From: Arora, Madhvi <[email protected]>
> Sent: Friday 5th August 2016 18:03
> To: [email protected]
> Subject: Protocol change to https
>
> Hi,
>
> We are using Nutch 1.10 and Solr 5. We have around 10 different web sites
> that are crawled regularly. We are changing protocol of a few websites from
> http to https. So we will have a mixed bag of http and https protocols.
> I checked in nutch user-mail archive and gathered that we need to change
> protocol-http to protocol-httpclient.
> 1: I wanted to find out the best way to handle this
You can still use protocol-http; TLS support was added to it in a recent version.

> 2: What are the issues with using protocol-httpclient i.e. there were
> previous references to issues with use of protocol-httpclient.

It does not allow unencoded URLs, but in recent Nutch versions we improved the basic normalizer to fix that for you.

> 3: Steps that need to be taken to update the SOLR index. I think that I will
> need to delete the old http urls from solr index, re-crawl and index the
> urls that need to be switched to https.

Yes, just delete, recrawl and reindex everything. And consider upgrading to 1.12.

>
> I will be grateful for any guidance or suggestions.
>
> Thanks,
> Madhvi
>
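
As a rough sketch of the delete step in point 3: Solr's JSON update handler accepts a delete-by-query command, so the old http documents for one site could be removed with something like the following. This assumes the Solr `id` field holds the page URL as an untokenized string, which is the usual Nutch setup but should be checked against your schema. The core name `nutch`, the host `www.example.com` and both helper function names are placeholders, not anything from Nutch itself.

```python
import json
import re
import urllib.request

# Characters that have special meaning in Solr's standard query parser.
_SOLR_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_solr(term):
    """Backslash-escape query-parser special characters in a term."""
    return _SOLR_SPECIAL.sub(r'\\\1', term)


def build_delete_request(update_url, url_prefix, field="id"):
    """Build (but do not send) a delete-by-query request for every
    document whose `field` value starts with `url_prefix`.

    Assumption: `field` is an untokenized string field holding the URL.
    """
    query = "{}:{}*".format(field, escape_solr(url_prefix))
    body = json.dumps({"delete": {"query": query}}).encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder Solr location and site; adjust both before use.
    req = build_delete_request(
        "http://localhost:8983/solr/nutch/update?commit=true",
        "http://www.example.com/",
    )
    print(req.data.decode("utf-8"))  # inspect the query before sending
    # urllib.request.urlopen(req)    # uncomment to actually delete
```

The request is only built and printed, so the query can be checked first. Running the same query as a normal search beforehand shows how many documents it would remove.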

