All websites in my university's domain (uci.cu) are freely accessible and I don't need a proxy to reach them; I only need a proxy for websites outside this domain. I am testing Nutch 1.9 inside my university and have used the parsechecker tool, and it looks like the problem only happens with HTTPS connections. Could you give me some advice?
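
To show the kind of behaviour I am asking about, here is a minimal sketch in plain Java (JSSE) of an SSL context that accepts any server certificate. It is not the Nutch code and not a confirmed fix: the class name `TrustAllSketch` and the method `permissiveContext` are made up for illustration, and disabling certificate validation like this is only reasonable for testing against hosts you already trust.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical illustration class; not part of Nutch.
class TrustAllSketch {

    // Trust manager that performs no validation at all.
    // INSECURE: every certificate chain is accepted, so use it for testing only.
    static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: never rejects a client certificate.
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: never rejects a server certificate.
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            // No specific issuers are advertised.
            return new X509Certificate[0];
        }
    };

    // Builds a TLS context whose only trust manager is TRUST_ALL.
    static SSLContext permissiveContext() throws GeneralSecurityException {
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, new TrustManager[] { TRUST_ALL }, new SecureRandom());
        return ctx;
    }

    public static void main(String[] args) throws Exception {
        SSLContext ctx = permissiveContext();
        System.out.println("Created context for protocol: " + ctx.getProtocol());
    }
}
```

I have not wired this into protocol-httpclient; it only illustrates what "trusting websites by default" would mean at the JSSE level.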
I was looking at the DummyX509TrustManager class of protocol-httpclient, but I don't understand it very well. Maybe a block needs to be added so that websites are trusted by default.

----- Original Message -----
From: "Markus Jelsma" <[email protected]>
To: [email protected]
Sent: Tuesday, December 16, 2014 8:08:42 AM
Subject: RE: questions about nutch 1.9

Hmm - then maybe you can access it through a proxy that doesn't deal with this problem? Then connect Nutch to the proxy.

Markus

-----Original message-----
> From:Eyeris RodrIguez Rueda <[email protected]>
> Sent: Tuesday 16th December 2014 14:04
> To: [email protected]
> Subject: Re: questions about nutch 1.9
>
> Thanks Markus and Jonathan for your answers.
> I have tried with protocol-http only, but the problem persists. Maybe the solution
> is a configuration that trusts websites with certificate problems.
> This is very important for me because I have some websites using https, and this
> is a limitation for using nutch 1.9 in my university.
>
> ----- Original Message -----
> From: "Markus Jelsma" <[email protected]>
> To: [email protected]
> Sent: Tuesday, December 16, 2014 7:46:54 AM
> Subject: RE: questions about nutch 1.9
>
> Hi - can you try the protocol-http plugin instead? It has some support for
> TLS.
>
> -----Original message-----
> > From:Eyeris RodrIguez Rueda <[email protected]>
> > Sent: Thursday 11th December 2014 22:18
> > To: [email protected]
> > Subject: Re: questions about nutch 1.9
> >
> > Any help, please?
> >
> >
> > Hello.
> > I want to use nutch 1.9, but there are some things that I don't understand,
> > because I was using nutch 1.5.1 before and some things have changed in nutch
> > 1.9.
> > Sorry if these are basic things.
> > Some questions:
> >
> > 1- How can I run a crawl with the solr parameter as in nutch 1.5.1,
> > where the spider skips this step if I don't set the solr parameter?
> >
> > 2- Is it possible to use topN or a similar parameter in nutch 1.9, or does every
> > round include all links in the crawldb?
> >
> > 3- I have activated the httpclient plugin, and when I crawl a website that uses
> > the https protocol I get this error in the console output:
> > *********************************
> > fetch of https://dragones.uci.cu/ failed with:
> > javax.net.ssl.SSLHandshakeException:
> > sun.security.validator.ValidatorException: PKIX path building failed:
> > sun.security.provider.certpath.SunCertPathBuilderException: unable to find
> > valid certification path to requested target
> >
> > The parsechecker tool throws a similar error.
> >
> > Any suggestion or advice will be appreciated.
> >
> > ---------------------------------------------------
> > 12th anniversary of the founding of the Universidad de las Ciencias
> > Informáticas. 12 years of history alongside Fidel. December 12, 2014.

