[jira] [Commented] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name
[ https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415918#comment-16415918 ]

Hudson commented on NUTCH-2447:
---

SUCCESS: Integrated in Jenkins build Nutch-trunk #3513 (See [https://builds.apache.org/job/Nutch-trunk/3513/])
NUTCH-2447 Work-around SSLProtocolException: handshake alert: (snagel: [https://github.com/apache/nutch/commit/c9444e0fcc508a21188b1653f108655eab1211a4])
* (edit) src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java

> Work-around SSLProtocolException: handshake alert: unrecognized_name
>
>                 Key: NUTCH-2447
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2447
>             Project: Nutch
>          Issue Type: Bug
>          Components: protocol
>    Affects Versions: 1.13
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Critical
>             Fix For: 1.15
>
>         Attachments: NUTCH-2447.patch, NUTCH-2447.patch
>
> Nutch is unable to crawl some websites, regardless of protocol plugin you are
> using. The work-around you frequently find (-Djsse.enableSNIExtension=false)
> does not work at all, so the internet is clearly lying to us!
> {code}
> 2017-10-23 12:43:52,911 INFO api.HttpRobotRulesParser - Couldn't get robots.txt for https://www.eidsiva.net/: javax.net.ssl.SSLProtocolException: handshake alert: unrecognized_name
> 2017-10-23 12:43:53,011 ERROR http.Http - Failed to get protocol output
> javax.net.ssl.SSLProtocolException: handshake alert: unrecognized_name
>     at sun.security.ssl.ClientHandshaker.handshakeAlert(ClientHandshaker.java:1446)
>     at sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:2016)
>     at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1125)
>     at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
>     at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
>     at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
>     at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:152)
>     at org.apache.nutch.protocol.http.Http.getResponse(Http.java:72)
>     at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:271)
>     at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:327)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
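Editorial sketch: the log above shows the failure surfacing as a javax.net.ssl.SSLProtocolException whose message is "handshake alert: unrecognized_name". A caller that wants to treat this alert specially might recognize it roughly as below. The class and method names are hypothetical and are not part of Nutch, and walking the cause chain is an assumption of this sketch (the stack trace in the issue shows the exception unwrapped).

```java
import java.io.IOException;
import javax.net.ssl.SSLProtocolException;

// Hypothetical helper, not part of Nutch: recognizes the alert quoted in the issue.
final class UnrecognizedNameCheck {

  // Message text copied from the log output in the issue description.
  static final String ALERT = "handshake alert: unrecognized_name";

  // Returns true if t, or any exception in its cause chain, is an
  // SSLProtocolException carrying exactly the alert message above.
  public static boolean isUnrecognizedNameAlert(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof SSLProtocolException && ALERT.equals(c.getMessage())) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    IOException wrapped =
        new IOException("fetch failed", new SSLProtocolException(ALERT));
    System.out.println(isUnrecognizedNameAlert(wrapped));
    System.out.println(isUnrecognizedNameAlert(new IOException("connection reset")));
  }
}
```

Matching on the message text is brittle, since the wording is chosen by the JVM's TLS implementation and may differ between Java versions; the sketch only mirrors the string that appears in this report.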
[jira] [Commented] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name
[ https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415822#comment-16415822 ]

ASF GitHub Bot commented on NUTCH-2447:
---

sebastian-nagel closed pull request #305: NUTCH-2447 Work-around SSLProtocolException: handshake alert: unrecognized_name
URL: https://github.com/apache/nutch/pull/305

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java b/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
index 4e75fe89b..c87c11125 100644
--- a/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
+++ b/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
@@ -124,28 +124,29 @@ public HttpResponse(HttpBase http, URL url, CrawlDatum datum)
       socket.connect(sockAddr, http.getTimeout());
 
       if (scheme == Scheme.HTTPS) {
-        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory
-            .getDefault();
-        SSLSocket sslsocket = (SSLSocket) factory
-            .createSocket(socket, sockHost, sockPort, true);
-        sslsocket.setUseClientMode(true);
-
-        // Get the protocols and ciphers supported by this JVM
-        Set protocols = new HashSet(
-            Arrays.asList(sslsocket.getSupportedProtocols()));
-        Set ciphers = new HashSet(
-            Arrays.asList(sslsocket.getSupportedCipherSuites()));
-
-        // Intersect with preferred protocols and ciphers
-        protocols.retainAll(http.getTlsPreferredProtocols());
-        ciphers.retainAll(http.getTlsPreferredCipherSuites());
-
-        sslsocket.setEnabledProtocols(
-            protocols.toArray(new String[protocols.size()]));
-        sslsocket.setEnabledCipherSuites(
-            ciphers.toArray(new String[ciphers.size()]));
-
-        sslsocket.startHandshake();
+        SSLSocket sslsocket = null;
+
+        try {
+          sslsocket = getSSLSocket(socket, sockHost, sockPort);
+          sslsocket.startHandshake();
+        } catch (IOException e) {
+          Http.LOG.debug("SSL connection to {} failed with: {}", url,
+              e.getMessage());
+          if ("handshake alert: unrecognized_name".equals(e.getMessage())) {
+            try {
+              // Reconnect, see NUTCH-2447
+              socket = new Socket();
+              socket.setSoTimeout(http.getTimeout());
+              socket.connect(sockAddr, http.getTimeout());
+              sslsocket = getSSLSocket(socket, "", sockPort);
+              sslsocket.startHandshake();
+            } catch (IOException ex) {
+              String msg = "SSL reconnect to " + url + " failed with: "
+                  + e.getMessage();
+              throw new HttpException(msg);
+            }
+          }
+        }
         socket = sslsocket;
       }
 
@@ -318,6 +319,31 @@ public Metadata getHeaders() {
    * - */
 
+  private SSLSocket getSSLSocket(Socket socket, String sockHost, int sockPort) throws IOException {
+    SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory
+      .getDefault();
+    SSLSocket sslsocket = (SSLSocket) factory
+      .createSocket(socket, sockHost, sockPort, true);
+    sslsocket.setUseClientMode(true);
+
+    // Get the protocols and ciphers supported by this JVM
+    Set protocols = new HashSet(
+      Arrays.asList(sslsocket.getSupportedProtocols()));
+    Set ciphers = new HashSet(
+      Arrays.asList(sslsocket.getSupportedCipherSuites()));
+
+    // Intersect with preferred protocols and ciphers
+    protocols.retainAll(http.getTlsPreferredProtocols());
+    ciphers.retainAll(http.getTlsPreferredCipherSuites());
+
+    sslsocket.setEnabledProtocols(
+      protocols.toArray(new String[protocols.size()]));
+    sslsocket.setEnabledCipherSuites(
+      ciphers.toArray(new String[ciphers.size()]));
+
+    return sslsocket;
+  }
+
   private void readPlainContent(InputStream in)
       throws HttpException, IOException {

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
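Editorial sketch: the control flow of the patch above (handshake with the real host name, and on "handshake alert: unrecognized_name" reconnect and handshake again with an empty host name, presumably so that no SNI name is sent) can be isolated from the socket code as follows. Everything here is hypothetical and not Nutch code: the Handshaker interface stands in for "open a socket, wrap it via getSSLSocket, call startHandshake", and plain IOException is used where the patch throws HttpException. The sketch makes two deliberate choices that differ from the merged diff and are not claims about Nutch: it rethrows IOExceptions that are not the unrecognized_name alert, and it reports the message of the second (reconnect) exception, whereas the diff builds its message from e.getMessage(), i.e. the first exception.

```java
import java.io.IOException;

// Hypothetical stand-alone sketch of the retry flow; not part of Nutch.
final class SniFallbackSketch {

  // Message text the patch compares against.
  static final String UNRECOGNIZED_NAME = "handshake alert: unrecognized_name";

  // Stand-in for: connect a fresh socket, wrap it in TLS for the given
  // host name, and perform the handshake.
  interface Handshaker<T> {
    T handshake(String sniHost) throws IOException;
  }

  static <T> T connect(Handshaker<T> handshaker, String host) throws IOException {
    try {
      return handshaker.handshake(host);
    } catch (IOException first) {
      if (!UNRECOGNIZED_NAME.equals(first.getMessage())) {
        // Choice of this sketch: unrelated failures are propagated unchanged.
        throw first;
      }
      try {
        // Second attempt with an empty host name, as in the patch.
        return handshaker.handshake("");
      } catch (IOException retry) {
        // Report the reconnect failure itself and keep it as the cause.
        throw new IOException(
            "SSL reconnect failed with: " + retry.getMessage(), retry);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Simulated server that rejects any non-empty host name.
    Handshaker<String> picky = sni -> {
      if (!sni.isEmpty()) {
        throw new IOException(UNRECOGNIZED_NAME);
      }
      return "connected without SNI";
    };
    System.out.println(connect(picky, "www.example.com")); // prints "connected without SNI"
  }
}
```

Keeping the handshake behind a small interface makes the fallback testable without a network connection; in real code the lambda would create a new Socket for each attempt, because the first socket cannot be reused after the failed handshake.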
[jira] [Commented] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name
[ https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415671#comment-16415671 ]

ASF GitHub Bot commented on NUTCH-2447:
---

lewismc commented on issue #305: NUTCH-2447 Work-around SSLProtocolException: handshake alert: unrecognized_name
URL: https://github.com/apache/nutch/pull/305#issuecomment-376538896

+1
[jira] [Commented] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name
[ https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413755#comment-16413755 ]

ASF GitHub Bot commented on NUTCH-2447:
---

sebastian-nagel opened a new pull request #305: NUTCH-2447 Work-around SSLProtocolException: handshake alert: unrecognized_name
URL: https://github.com/apache/nutch/pull/305

- apply Markus' patch
- if reconnect fails throw HTTPException

This solution (work-around) has been tested in production.
[jira] [Commented] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name
[ https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351909#comment-16351909 ]

Sebastian Nagel commented on NUTCH-2447:

Hi [~markus17],

problem and solution confirmed. Two questions:
* both patch files are identical, according to your comment the second/newer one should point to this issue, right?
* why are errors during the reconnect ignored? (try-catch inside the catch block) The connection cannot be used, if the reconnect fails as well?
[jira] [Commented] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name
[ https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215057#comment-16215057 ]

Markus Jelsma commented on NUTCH-2447:
--

As a side note, also pay attention to this incredibly ugly looking fix!

> Work-around SSLProtocolException: handshake alert: unrecognized_name
>
>                 Key: NUTCH-2447
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2447
>             Project: Nutch
>          Issue Type: Bug
>          Components: protocol
>    Affects Versions: 1.13
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Critical
>             Fix For: 1.14
>
>         Attachments: NUTCH-2447.patch

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)