Re: questions about nutch 1.9

Eyeris RodrIguez Rueda Tue, 16 Dec 2014 06:04:35 -0800

All websites in my university (uci.cu) are freely accessible and I don't need a 
proxy to access them; I only need a proxy for websites outside this domain. I 
am testing Nutch 1.9 inside my university. I have used the parsechecker tool, 
and it looks like the problem only happens with HTTPS connections.
Could you give me some advice?


I was looking at the DummyX509TrustManager class of protocol-httpclient, but I 
don't understand much of it. Maybe a block needs to be added so that websites 
are trusted by default.
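
For illustration, the "trust by default" idea can be sketched in plain Java 
roughly as follows. This is only a minimal sketch under stated assumptions, not 
the actual DummyX509TrustManager code from protocol-httpclient: the names 
TrustAllSketch and TrustAllManager are hypothetical, and it uses only standard 
JDK classes. It disables certificate validation completely, so it is insecure 
and probably only acceptable for testing on an internal network.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

/** Hypothetical helper (not part of Nutch): an SSLContext that trusts every certificate. */
public final class TrustAllSketch {

    /** Trust manager that skips all certificate chain validation. INSECURE. */
    static final class TrustAllManager implements X509TrustManager {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: never throws, so any client certificate is accepted.
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: never throws, so the PKIX path check is skipped.
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            // No specific issuers are advertised.
            return new X509Certificate[0];
        }
    }

    /** Builds a TLS context whose only trust manager accepts everything. */
    static SSLContext trustAllContext() throws GeneralSecurityException {
        SSLContext context = SSLContext.getInstance("TLS");
        // null key managers and null SecureRandom select the JDK defaults.
        context.init(null, new TrustManager[] { new TrustAllManager() }, null);
        return context;
    }

    public static void main(String[] args) throws Exception {
        SSLContext context = trustAllContext();
        System.out.println("Created context for protocol: " + context.getProtocol());
    }
}
```

With plain JDK connections, such a context could be installed through 
HttpsURLConnection.setDefaultSSLSocketFactory(context.getSocketFactory()). How 
it would be wired into protocol-httpclient is not shown here and depends on 
that plugin's code. A safer alternative is probably to import the university's 
CA certificate into the JVM truststore with keytool, so that normal validation 
succeeds.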

  

----- Original Message -----
From: "Markus Jelsma" <[email protected]>
To: [email protected]
Sent: Tuesday, December 16, 2014 8:08:42 AM
Subject: RE: questions about nutch 1.9

Hmm - then maybe you can access it through a proxy that doesn't have this 
problem? Then connect Nutch to the proxy.
Markus
 
-----Original message-----
> From:Eyeris RodrIguez Rueda <[email protected]>
> Sent: Tuesday 16th December 2014 14:04
> To: [email protected]
> Subject: Re: questions about nutch 1.9
> 
> Thanks, Markus and Jonathan, for your answers.
> I have tried with protocol-http only, but the problem persists. Maybe the 
> solution is a configuration that trusts websites with certificate problems.
> This is very important for me because I have some websites using HTTPS, and 
> this limits the use of Nutch 1.9 in my university.
> 
> 
> 
> 
> ----- Original Message -----
> From: "Markus Jelsma" <[email protected]>
> To: [email protected]
> Sent: Tuesday, December 16, 2014 7:46:54 AM
> Subject: RE: questions about nutch 1.9
> 
> Hi - can you try the protocol-http plugin instead? It has some support for 
> TLS.
>  
> -----Original message-----
> > From:Eyeris RodrIguez Rueda <[email protected]>
> > Sent: Thursday 11th December 2014 22:18
> > To: [email protected]
> > Subject: Re: questions about nutch 1.9
> > 
> > Any help, please?
> > 
> > 
> > Hello.
> > I want to use Nutch 1.9, but there are some things that I don't understand, 
> > because I was using Nutch 1.5.1 before and some things have changed in 
> > Nutch 1.9.
> > Sorry if these are basic things.
> > Some questions:
> > 
> > 1- How can I run a crawl with the Solr parameter as in Nutch 1.5.1, where 
> > the spider skips the indexing step if I don't set the Solr parameter?
> > 
> > 2- Is it possible to use topN or a similar parameter in Nutch 1.9, or does 
> > every round include all links in the crawldb?
> > 
> > 3- I have activated the httpclient plugin, and when I crawl a website that 
> > uses the HTTPS protocol I get this error in the console output:
> > *********************************
> > fetch of https://dragones.uci.cu/ failed with: 
> > javax.net.ssl.SSLHandshakeException: 
> > sun.security.validator.ValidatorException: PKIX path building failed: 
> > sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
> > valid certification path to requested target
> > 
> > The parsechecker tool throws a similar error.
> > 
> > Any suggestion or advice will be appreciated.
> > 
> > 
> > 
> > ---------------------------------------------------
> > 12th Anniversary of the founding of the Universidad de las Ciencias 
> > Informáticas. 12 years of history alongside Fidel. December 12, 2014.
> > 
> 
> 
> 


