Hey Eyeris,
Thanks for the response. My issue isn't with the http/https crawling but rather
with the indexing to Solr. My Solr instances use self-signed certificates, and
when Nutch tries to index what it found, it fails because it doesn't trust the
cert that Solr presents. I had the same issue with Solr talking to other Solr
instances, and the solution was to manually add the cert and point Solr to the
keystore file. I was hoping I could find a similar solution for Nutch, where I
could add the Solr cert to the Nutch keystore, but:
1. I don’t know if Nutch can do that.
2. If Nutch has this feature I don't know where the keystore file is.
3. Your suggestion of using Portecle may be suitable for what I need
but I still need to know where Nutch keeps this keystore file AND/OR how to
tell Nutch to use this keystore file.
I am also willing to use protocol-httpclient, but without extra configuration
it still doesn’t work for me. I'm fairly new to Nutch, so forgive me if I'm
missing something obvious.
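For reference, what worked on the Solr side comes down to the standard JVM
trust store properties. Below is a minimal sketch of that idea in Java. The
class name, path and password are placeholders I made up for illustration, and
I have not confirmed that the Nutch indexing job picks these properties up:

```java
// Sketch only: the class name, path and password are hypothetical placeholders.
final class TrustStoreSketch {
    // Standard JSSE property names read by the JVM's default SSLContext.
    static final String TRUST_STORE = "javax.net.ssl.trustStore";
    static final String TRUST_STORE_PASSWORD = "javax.net.ssl.trustStorePassword";

    // Point the current JVM at a trust store that contains the Solr cert.
    // This has to run before the first TLS connection is opened, because the
    // default SSLContext is initialised from these properties only once.
    static void useTrustStore(String path, String password) {
        System.setProperty(TRUST_STORE, path);
        System.setProperty(TRUST_STORE_PASSWORD, password);
    }
}
```

The same two properties can be passed on the java command line as
-Djavax.net.ssl.trustStore=... and -Djavax.net.ssl.trustStorePassword=...,
which is the part I am hoping has an equivalent in Nutch.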
Thanks
Sid
-----Original Message-----
From: Eyeris Rodriguez Rueda [mailto:[email protected]]
Sent: November-28-17 12:07 PM
To: [email protected]
Subject: Re: [MASSMAIL]Certificates
Hello Sid.
I am using protocol-httpclient because, in my humble opinion, it handles https
websites better than protocol-http.
Since moving to Java 1.7, with protocol-httpclient and Nutch 1.12, my problems
with self-signed certificates have gone away.
But if you have problems with websites that use self-signed certificates, you
may need to insert the certificates into the Java keystore using the Portecle
tool, which you can download here: https://sourceforge.net/projects/portecle/
Best regards.
----- Original Message -----
From: "Sadiki Latty" <[email protected]>
To: [email protected]
Sent: Tuesday, November 28, 2017 11:08:28
Subject: [MASSMAIL]Certificates
Hey all,
I have a question regarding self-signed certs. I will be using Nutch to crawl
http and https sites, as well as using it to index to self-signed https Solr
servers. I managed to add certificates to Solr and it fixed their inter-node
communication, but I have yet to find where in Nutch I can do a similar
configuration. I have seen articles saying that the protocol-httpclient plugin
should be able to do it with some code modifications, but the caveat is that
httpclient may have underlying bugs, so protocol-http is recommended. These
articles were also almost 3 years old, so the options may have changed since.
Can someone provide some insight into what my next steps should be?
Essentially, here are my questions:
1. Should I use protocol-http, protocol-httpclient or other?
2. Is there somewhere in a config file that I can tell Nutch to use a
java keystore file similar to Solr?
Thanks
Sid