I am not sure in which version it was added; you would have to check CHANGES.txt,
but upgrading is usually a good idea and very simple.
Markus

 
 
-----Original message-----
> From:Arora, Madhvi <[email protected]>
> Sent: Friday 5th August 2016 19:53
> To: [email protected]
> Subject: Re: Protocol change to https
> 
> Markus, so to crawl https and http URLs successfully we just need to switch to 
> a newer version of Nutch, i.e. higher than Nutch 1.10?
> 
> 
> 
> On 8/5/16, 12:47 PM, "Markus Jelsma" <[email protected]> wrote:
> 
> >Hello - see inline.
> >Markus 
> > 
> >-----Original message-----
> >> From:Arora, Madhvi <[email protected]>
> >> Sent: Friday 5th August 2016 18:03
> >> To: [email protected]
> >> Subject: Protocol change to https
> >> 
> >> Hi,
> >> 
> >> We are using Nutch 1.10 and Solr 5. We have around 10 different web sites 
> >> that are crawled regularly. We are changing the protocol of a few websites 
> >> from http to https, so we will have a mixed bag of http and https protocols.
> >> I checked the Nutch user mailing list archive and gather that we need to 
> >> change protocol-http to protocol-httpclient.
> >> 1: I wanted to find out the best way to handle this.
> >
> >You can still use protocol-http; we added TLS support to it in a recent 
> >version.
> >
> >> 2: What are the issues with using protocol-httpclient? There were 
> >> previous references on the list to problems with it.
> >
> >It does not allow unencoded URLs, but in recent Nutch versions we improved 
> >the basic normalizer to fix that for you.
> >
> >> 3: What steps need to be taken to update the Solr index? I think that I 
> >> will need to delete the old http URLs from the Solr index, then re-crawl 
> >> and index the URLs that need to be switched to https.
> >
> >Yes, just delete, recrawl and reindex everything. And consider upgrading 
> >to 1.12.
> >
> >> 
> >> I will be grateful for any guidance or suggestions.
> >> 
> >> Thanks,
> >> Madhvi
> >> 
> >> 
> 
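
The delete step from point 3 of the thread could be sketched roughly as
follows. This is only an illustration, not something given in the thread: it
assumes Solr's JSON update handler is reachable at the URL shown, that the core
is called collection1, and that the page URL is stored in a field named id.
The helper names build_delete_payload and build_request are made up for this
example. Adjust all of these to your own setup.

```python
import json
import urllib.request


def build_delete_payload(host=None, field="id"):
    """Build a Solr delete-by-query body.

    With no host, every document is matched ("delete everything").
    With a host, only documents whose `field` starts with http://<host>/
    are matched. The field name "id" is an assumption about the schema.
    """
    if host is None:
        query = "*:*"
    else:
        # The prefix query parser avoids escaping ':' and '/' by hand.
        query = "{!prefix f=%s}http://%s/" % (field, host)
    return {"delete": {"query": query}}


def build_request(solr_update_url, payload):
    """Wrap the payload in a POST request for Solr's update handler."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        solr_update_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host and Solr location; replace with real values.
    payload = build_delete_payload("www.example.com")
    request = build_request(
        "http://localhost:8983/solr/collection1/update?commit=true", payload
    )
    print(json.dumps(payload))
    # Uncomment to actually send the delete to Solr:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status)
```

Calling build_delete_payload() without a host yields a match-all delete, which
corresponds to the "delete and recrawl and reindex everything" advice; passing
a host limits the delete to the old http documents of that one site.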
