I am not sure which version it was added in; you'd have to check CHANGES.txt. Upgrading is usually a good idea and very simple.

Markus

-----Original message-----
> From:Arora, Madhvi <[email protected]>
> Sent: Friday 5th August 2016 19:53
> To: [email protected]
> Subject: Re: Protocol change to https
>
> Markus so to crawl https and http urls successfully we just need to switch to
> a newer version of Nutch I.e. Higher than Nutch 1.10?
>
> On 8/5/16, 12:47 PM, "Markus Jelsma" <[email protected]> wrote:
>
> >Hello - see inline.
> >Markus
> >
> >-----Original message-----
> >> From:Arora, Madhvi <[email protected]>
> >> Sent: Friday 5th August 2016 18:03
> >> To: [email protected]
> >> Subject: Protocol change to https
> >>
> >> Hi,
> >>
> >> We are using Nutch 1.10 and Solr 5. We have around 10 different web sites
> >> that are crawled regularly. We are changing protocol of a few websites
> >> from http to https. So we will have a mix bag of http and https protocols.
> >> I checked in nutch user-mail archive and get that we need to change
> >> protocol-http to protocol-httpclient.
> >> 1: I wanted to find out the best way to handle this
> >
> >You can still use protocol-http, in some recent version we added TLS support
> >to it.
> >
> >> 2: What are the issues with using protocol-httpclient i.e. there were
> >> previous references to issues with use of protocol-httpclient.
> >
> >It does not allow unencoded URL's, but in recent Nutch' we improved basic
> >normalizer to fix it for you.
> >
> >> 3: Steps that need to be taken to update the SOLR index. I think that I
> >> will need to delete the old http urls from solr index, re-crawl and index
> >> the urls that need to be switched to https.
> >
> >Yes, just delete and recrawl and reindex everything. And consider upgrading
> >to 1.12.
> >
> >> I will be grateful for any guidance or suggestions.
> >>
> >> Thanks,
> >> Madhvi
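
For the "delete the old http urls" step discussed in the thread, here is a rough sketch of how a Solr delete-by-query request body could be built. It is not taken from the thread. It assumes the indexed documents keep the page URL in a field named `id` (check your own schema), and the host name is a placeholder. The function names are made up for illustration.

```python
import json
import re

# Characters that have special meaning in the Solr/Lucene query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_term(text):
    """Backslash-escape query-syntax characters in a literal term."""
    return _SPECIAL.sub(r'\\\1', text)


def delete_http_payload(hosts, field="id"):
    """Build a JSON delete-by-query body matching http:// URLs on the given hosts.

    `field` is an assumption: it must be the field that stores the URL.
    """
    clauses = [
        # Escaped literal prefix followed by an unescaped trailing wildcard.
        f"{field}:{escape_term('http://' + host + '/')}*"
        for host in hosts
    ]
    return json.dumps({"delete": {"query": " OR ".join(clauses)}})


if __name__ == "__main__":
    # Placeholder host; replace with the sites that moved to https.
    print(delete_http_payload(["www.example.com"]))
```

The resulting body would be POSTed to the core's `/update` handler with `Content-Type: application/json`, followed by a commit. It is safer to run the same query as a normal search first to confirm it matches only the documents you intend to remove, and then recrawl and reindex as suggested above.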

