Hey Roannel,

This is what I needed. I kept trying to Google things related to certificates 
and Nutch, but I guess I should have just searched for Java and certificates 
instead. Works like a charm now.


Thanks

Sid

-----Original Message-----
From: Roannel Fernández Hernández [mailto:[email protected]] 
Sent: November-28-17 3:31 PM
To: [email protected]
Subject: Re: [MASSMAIL]Certificates

Hi Sadiki:

You must add your Solr certificate to cacerts (the default keystore) of your 
Java distribution. Under Linux, you can find where your cacerts file is with:

echo $(readlink -f /usr/bin/java | sed "s:bin/java::")lib/security/cacerts

as described at 
https://stackoverflow.com/questions/11936685/how-to-obtain-the-location-of-cacerts-of-the-default-java-installation
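
For what it's worth, the import step itself could look roughly like the sketch 
below. It is only an illustration, not a tested recipe for your setup: the host 
solr.example.com, the port 8983, the alias "solr" and the file name solr.crt 
are placeholders I made up, and "changeit" is merely the stock cacerts password, 
which may have been changed on your system. The helper function names are 
hypothetical too.

```shell
#!/bin/sh
# Sketch only: solr.example.com:8983, the alias "solr" and solr.crt are
# placeholders, not values taken from this thread.

# Derive the cacerts path from an already-resolved java binary path.
# Mirrors the sed one-liner above, with the pattern anchored at the end.
cacerts_for() {
    printf '%slib/security/cacerts\n' "$(printf '%s' "$1" | sed 's:bin/java$::')"
}

# Save the certificate that a TLS server presents as a PEM file.
# usage: fetch_cert host:port out.crt
fetch_cert() {
    openssl s_client -connect "$1" </dev/null 2>/dev/null \
        | openssl x509 -outform PEM > "$2"
}

# Import a PEM certificate into a keystore as a trusted entry.
# "changeit" is the stock cacerts password (an assumption; adjust if needed).
# usage: import_cert cert.crt keystore alias
import_cert() {
    keytool -importcert -noprompt -trustcacerts \
        -alias "$3" -file "$1" -keystore "$2" -storepass changeit
}

# Example (needs write access to cacerts, typically root):
#   fetch_cert solr.example.com:8983 solr.crt
#   import_cert solr.crt "$(cacerts_for "$(readlink -f /usr/bin/java)")" solr
```

Nutch has to run on that same Java installation for the imported certificate 
to take effect.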

Regards.

----- Original Message -----
> From: "Sadiki Latty" <[email protected]>
> To: [email protected]
> Sent: Tuesday, November 28, 2017 14:03:27
> Subject: RE: [MASSMAIL]Certificates
> 
> Hey Eyeris,
> 
> Thanks for the response. My issue isn't with the http/https crawling 
> but rather with the indexing to Solr. My Solr instances use self-signed 
> certificates, and when Nutch tries to index what it found, it fails because 
> it doesn’t trust the cert that Solr presents. I had the same issue with Solr 
> talking to other Solr instances, and the solution was to manually add 
> the cert and point Solr to the keystore file. I was hoping I could 
> find a similar solution for Nutch where I could add the Solr cert to the 
> Nutch keystore, but:
>       1. I don’t know if Nutch can do that?
>       2. If Nutch has this feature I don't know where the keystore file is.
>       3. Your suggestion of using Portecle may be suitable for what I need
>          but I still need to know where Nutch keeps this keystore file AND/OR
>          how to tell Nutch to use this keystore file.
> 
> I am also willing to use protocol-httpclient, but it still doesn’t work 
> for me without extra configuration. I'm fairly new to Nutch, so forgive me 
> if I'm missing something obvious.
> 
> Thanks
> 
> Sid
>       
> 
> -----Original Message-----
> From: Eyeris Rodriguez Rueda [mailto:[email protected]]
> Sent: November-28-17 12:07 PM
> To: [email protected]
> Subject: Re: [MASSMAIL]Certificates
> 
> Hello Sid.
> I am using protocol-httpclient because, in my modest opinion, it handles 
> https websites better than protocol-http.
> Since Java 1.7, with protocol-httpclient and Nutch 1.12, my problems with 
> self-signed certificates have gone away.
> But if you have problems with websites that use self-signed 
> certificates, you may need to insert the certificates into the Java keystore 
> using the Portecle tool, which you can download here: 
> https://sourceforge.net/projects/portecle/
> 
> Best regards.
> 
> 
> 
> ----- Original Message -----
> From: "Sadiki Latty" <[email protected]>
> To: [email protected]
> Sent: Tuesday, November 28, 2017 11:08:28
> Subject: [MASSMAIL]Certificates
> 
> Hey all,
> 
> I have a question regarding self-signed certs. I will be using Nutch 
> to crawl http and https sites, as well as to index to 
> self-signed https Solr servers. I managed to add certificates to Solr, 
> and that fixed their inter-node communication, but I have yet to find where 
> in Nutch I can do a similar configuration. I have seen articles saying 
> that the protocol-httpclient plugin should be able to do it with some 
> code modifications, but the caveat is that httpclient may have underlying bugs, 
> so protocol-http is recommended.
> These articles were also almost 3 years old, so the options may have changed since.
> Can someone provide some insight into what my next steps should be?
> Essentially, here are my questions:
> Essentially here are my questions:
> 
> 1. Should I use protocol-http, protocol-httpclient or other?
> 
> 2. Is there somewhere in a config file that I can tell Nutch to use a
> Java keystore file, similar to Solr?
> 
> Thanks
> 
> Sid
> 
> **********************
> Text below is autogenerated by my email supplier.
> The @universidad_uci is Fidel: 15 years connected to the future... 
> connected to the Revolution
> 2002-2017
> 