Hi there,

Good day!

We would like to crawl web data by running Nutch with the Selenium protocol plugin (protocol-selenium), using the following command:

$ nutch plugin protocol-selenium org.apache.nutch.protocol.selenium.Http https://cwiki.apache.org/confluence/display/NUTCH/NutchTutorial

However, it failed with the following error message:

2021-10-26 19:07:53,961 INFO  selenium.Http - http.proxy.host = xxx.xx.xx.xx
2021-10-26 19:07:53,962 INFO  selenium.Http - http.proxy.port = xxxx
2021-10-26 19:07:53,962 INFO  selenium.Http - http.proxy.exception.list = true
2021-10-26 19:07:53,962 INFO  selenium.Http - http.timeout = 10000
2021-10-26 19:07:53,962 INFO  selenium.Http - http.content.limit = 1048576
2021-10-26 19:07:53,962 INFO  selenium.Http - http.agent = Apache Nutch Test/Nutch-1.18
2021-10-26 19:07:53,962 INFO  selenium.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2021-10-26 19:07:53,962 INFO  selenium.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2021-10-26 19:07:53,962 INFO  selenium.Http - http.enable.cookie.header = true
2021-10-26 19:07:54,114 ERROR selenium.Http - Failed to get protocol output
javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:994)
        at sun.security.ssl.SSL

FYI, we have tried the following, but the issue persists:

1. Set http.tls.certificates.check to false.

2. Imported the website's certificates into our Java truststores.

3. Our Nutch instance is configured to use a proxy.
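
To check whether the handshake failure is specific to Nutch, a minimal standalone Java sketch like the one below could attempt the same TLS handshake through the same proxy. It is our own hypothetical helper, not part of Nutch: the class name HandshakeCheck and the default proxy host/port are placeholders, the HEAD request is an arbitrary choice, and the 10-second timeout simply mirrors http.timeout = 10000 from the log. It uses the JVM's default truststore unless one is passed with -Djavax.net.ssl.trustStore.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

/**
 * Hypothetical standalone check (not part of Nutch): can this JVM complete a
 * TLS handshake with the target site through the configured HTTP proxy?
 */
public class HandshakeCheck {

    /** Builds an HTTP proxy descriptor, playing the role of http.proxy.host / http.proxy.port. */
    static Proxy proxyFor(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    /** Connects through the proxy (forcing the TLS handshake) and returns the HTTP status. */
    static int check(URL url, Proxy proxy, int timeoutMs) throws IOException {
        HttpsURLConnection conn = (HttpsURLConnection) url.openConnection(proxy);
        conn.setConnectTimeout(timeoutMs);
        conn.setReadTimeout(timeoutMs);
        conn.setRequestMethod("HEAD");
        try {
            // Opens the CONNECT tunnel via the proxy and performs the TLS handshake;
            // an SSLHandshakeException here would mirror the Nutch failure.
            conn.connect();
            System.out.println("Negotiated cipher suite: " + conn.getCipherSuite());
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholders: pass the real http.proxy.host and http.proxy.port values as arguments.
        String proxyHost = args.length > 0 ? args[0] : "proxy.example.com";
        int proxyPort = args.length > 1 ? Integer.parseInt(args[1]) : 3128;
        URL target = new URL("https://cwiki.apache.org/confluence/display/NUTCH/NutchTutorial");
        int status = check(target, proxyFor(proxyHost, proxyPort), 10000);
        System.out.println("HTTP status: " + status);
    }
}
```

Running it as `java -Djavax.net.debug=ssl:handshake HandshakeCheck <proxyHost> <proxyPort>` prints the JVM's handshake trace. If this standalone check fails the same way, the problem is likely between the JVM and the proxy rather than in the Selenium plugin itself.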

 

Kindly advise. Thanks in advance!

Best Regards,

Shi Wei