Hey Eyeris,

Thanks for the response. My issue isn't with the http/https crawling but rather 
the indexing to Solr. My Solr instances use self-signed certificates, and when 
Nutch tries to index what it found, it fails because it doesn't trust the 
certificate that Solr presents. I had the same issue with Solr talking to other 
Solr instances, and the solution was to manually add the cert and point Solr to 
the keystore file. I was hoping to find a similar solution for Nutch, where I 
could add the Solr cert to the Nutch keystore, but:
        1. I don't know if Nutch can do that.
        2. Even if Nutch has this feature, I don't know where the keystore file is.
        3. Your suggestion of using Portecle may be suitable for what I need, 
but I still need to know where Nutch keeps this keystore file AND/OR how to 
tell Nutch to use it.
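For reference, a JVM with no extra settings reads its trusted certificates from a single truststore file. The Java sketch below prints which file that is. It approximates the default JSSE lookup order: first the javax.net.ssl.trustStore system property if it is set, then lib/security/jssecacerts, then lib/security/cacerts under java.home. It assumes that Nutch's Solr indexing relies on the JVM's default SSL settings rather than a Nutch-specific keystore, which this thread does not confirm. The class name TrustStoreLocator is hypothetical.

```java
import java.io.File;

/**
 * Hypothetical helper: reports which truststore file the JVM's default SSL
 * setup would read. Assumption: Nutch's Solr indexing uses these JVM-wide
 * defaults, so this is where a Solr certificate would need to be added.
 */
public class TrustStoreLocator {

    /** Approximates the default JSSE order: explicit property, then jssecacerts, then cacerts. */
    public static String effectiveTrustStore() {
        String explicit = System.getProperty("javax.net.ssl.trustStore");
        if (explicit != null && !explicit.isEmpty()) {
            return explicit;
        }
        // java.home is the JRE root on Java 8 and the JDK root on Java 9+;
        // in both cases the bundled truststores live under lib/security.
        File security = new File(new File(System.getProperty("java.home"), "lib"), "security");
        File jssecacerts = new File(security, "jssecacerts");
        if (jssecacerts.isFile()) {
            return jssecacerts.getPath();
        }
        return new File(security, "cacerts").getPath();
    }

    public static void main(String[] args) {
        System.out.println("JVM truststore: " + effectiveTrustStore());
    }
}
```

Running it with the same java binary Nutch uses shows the file that the Portecle or keytool import would have to target, unless a -Djavax.net.ssl.trustStore flag overrides it.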

I am also willing to use protocol-httpclient, but it still doesn't work for me 
(without extra configuration). I'm fairly new to Nutch, so forgive me if I'm 
missing something obvious.

Thanks

Sid
        

-----Original Message-----
From: Eyeris Rodriguez Rueda [mailto:[email protected]] 
Sent: November-28-17 12:07 PM
To: [email protected]
Subject: Re: [MASSMAIL]Certificates

Hello Sid.
I am using protocol-httpclient because, in my modest opinion, it handles https 
websites better than protocol-http.
Since Java 1.7, using protocol-httpclient with Nutch 1.12, my problems with 
self-signed certificates have gone away.
But if you have problems with websites that have self-signed certificates, you 
may need to insert the certificates into the Java keystore using the Portecle 
tool, which you can download here: https://sourceforge.net/projects/portecle/
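Portecle is a GUI for the same operation the JDK's keytool performs: load a keystore, add a trusted certificate entry under an alias, and save it. The hypothetical CertImporter below is a minimal sketch of that operation using only java.security. It imports a PEM- or DER-encoded X.509 certificate (for example, one exported from the Solr server) into a JKS file, creating the file if it does not exist. The file names, alias and password are placeholders. Whether Nutch then reads this file depends on how its JVM is configured.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;

/** Hypothetical sketch of a Portecle/keytool-style "import trusted certificate". */
public class CertImporter {

    /** Loads an existing JKS file, or starts an empty keystore if the file does not exist yet. */
    public static KeyStore loadOrCreate(Path path, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance("JKS");
        if (Files.exists(path)) {
            try (InputStream in = Files.newInputStream(path)) {
                ks.load(in, password);
            }
        } else {
            ks.load(null, password); // a null stream initializes an empty keystore
        }
        return ks;
    }

    /** Adds a PEM- or DER-encoded X.509 certificate as a trusted entry under the given alias. */
    public static void importCertificate(KeyStore ks, String alias, Path certFile) throws Exception {
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        try (InputStream in = Files.newInputStream(certFile)) {
            Certificate cert = cf.generateCertificate(in);
            ks.setCertificateEntry(alias, cert);
        }
    }

    /** Writes the keystore back to disk, protected by the given password. */
    public static void save(KeyStore ks, Path path, char[] password) throws Exception {
        try (OutputStream out = Files.newOutputStream(path)) {
            ks.store(out, password);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 4) {
            System.err.println("usage: CertImporter <keystore> <password> <alias> <cert-file>");
            System.exit(1);
        }
        Path store = Paths.get(args[0]);
        char[] password = args[1].toCharArray();
        KeyStore ks = loadOrCreate(store, password);
        importCertificate(ks, args[2], Paths.get(args[3]));
        save(ks, store, password);
        System.out.println("Imported " + args[2] + " into " + store);
    }
}
```

Example invocation, with placeholder names: java CertImporter solr-truststore.jks changeit solr-node1 solr-node1.pem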

Best regards.



----- Original Message -----
From: "Sadiki Latty" <[email protected]>
To: [email protected]
Sent: Tuesday, November 28, 2017 11:08:28
Subject: [MASSMAIL]Certificates

Hey all,

I have a question regarding self-signed certs. I will be using Nutch to crawl 
http and https sites, as well as to index into self-signed https Solr servers. 
I managed to add certificates to Solr, which fixed their inter-node 
communication, but I have yet to find where in Nutch I can do a similar 
configuration. I have seen articles saying that the protocol-httpclient plugin 
should be able to do it with some code modifications, but the caveat is that 
httpclient may have underlying bugs, so protocol-http is recommended. These 
articles were also almost 3 years old, so the options may have evolved since. 
Can someone provide some insight into what my next steps should be? 
Essentially, here are my questions:

1. Should I use protocol-http, protocol-httpclient, or something else?

2. Is there somewhere in a config file where I can tell Nutch to use a Java 
keystore file, similar to Solr?
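There are two common ways to point a Java process at a custom truststore. Both are sketched below, assuming that the Solr client Nutch uses picks up standard JSSE settings, which this thread does not confirm.

- The JVM-wide route uses the javax.net.ssl.trustStore and javax.net.ssl.trustStorePassword system properties, normally passed as -D flags on the java command line. If your bin/nutch script forwards a variable such as NUTCH_OPTS to the JVM (check the script for your version), that is one place to add them.
- The scoped route builds an SSLContext from the truststore file directly. This only helps if the client code can be handed that context.

The class TrustStoreContext and its method names are hypothetical.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical sketch of two ways to make a JVM trust a custom truststore. */
public class TrustStoreContext {

    /**
     * JVM-wide option: equivalent to -Djavax.net.ssl.trustStore=... on the command line.
     * Must run before anything initializes the default SSLContext, which caches these values.
     */
    public static void useTrustStoreGlobally(Path trustStore, String password) {
        System.setProperty("javax.net.ssl.trustStore", trustStore.toString());
        System.setProperty("javax.net.ssl.trustStorePassword", password);
    }

    /** Scoped option: an SSLContext that trusts only the certificates in the given file. */
    public static SSLContext fromTrustStore(Path trustStore, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(trustStore)) {
            ks.load(in, password);
        }
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ks);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null); // no client key, default SecureRandom
        return ctx;
    }

    public static void main(String[] args) throws Exception {
        // args: <truststore-file> <password>
        SSLContext ctx = fromTrustStore(Paths.get(args[0]), args[1].toCharArray());
        System.out.println("Built " + ctx.getProtocol() + " context from " + args[0]);
    }
}
```

The system-property route is closest to what Solr's own startup scripts do, because it needs no code changes. The SSLContext route avoids affecting every other HTTPS connection in the same JVM.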

Thanks

Sid

