[jira] [Commented] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name

2018-03-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415918#comment-16415918
 ] 

Hudson commented on NUTCH-2447:
---

SUCCESS: Integrated in Jenkins build Nutch-trunk #3513 (See 
[https://builds.apache.org/job/Nutch-trunk/3513/])
NUTCH-2447 Work-around SSLProtocolException: handshake alert: (snagel: 
[https://github.com/apache/nutch/commit/c9444e0fcc508a21188b1653f108655eab1211a4])
* (edit) 
src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java


> Work-around SSLProtocolException: handshake alert: unrecognized_name
> --------------------------------------------------------------------
>
>                 Key: NUTCH-2447
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2447
>             Project: Nutch
>          Issue Type: Bug
>          Components: protocol
>    Affects Versions: 1.13
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Critical
>             Fix For: 1.15
>
>         Attachments: NUTCH-2447.patch, NUTCH-2447.patch
>
>
> Nutch is unable to crawl some websites, regardless of which protocol plugin
> you are using. The frequently suggested work-around
> (-Djsse.enableSNIExtension=false) does not work at all, so the internet is
> clearly lying to us!
> {code}
> 2017-10-23 12:43:52,911 INFO  api.HttpRobotRulesParser - Couldn't get robots.txt for https://www.eidsiva.net/: javax.net.ssl.SSLProtocolException: handshake alert:  unrecognized_name
> 2017-10-23 12:43:53,011 ERROR http.Http - Failed to get protocol output
> javax.net.ssl.SSLProtocolException: handshake alert:  unrecognized_name
>     at sun.security.ssl.ClientHandshaker.handshakeAlert(ClientHandshaker.java:1446)
>     at sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:2016)
>     at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1125)
>     at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
>     at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
>     at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
>     at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:152)
>     at org.apache.nutch.protocol.http.Http.getResponse(Http.java:72)
>     at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:271)
>     at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:327)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name

2018-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415822#comment-16415822
 ] 

ASF GitHub Bot commented on NUTCH-2447:
---

sebastian-nagel closed pull request #305: NUTCH-2447 Work-around 
SSLProtocolException: handshake alert: unrecognized_name
URL: https://github.com/apache/nutch/pull/305
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java b/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
index 4e75fe89b..c87c11125 100644
--- a/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
+++ b/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
@@ -124,28 +124,29 @@ public HttpResponse(HttpBase http, URL url, CrawlDatum datum)
       socket.connect(sockAddr, http.getTimeout());
 
       if (scheme == Scheme.HTTPS) {
-        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory
-            .getDefault();
-        SSLSocket sslsocket = (SSLSocket) factory
-            .createSocket(socket, sockHost, sockPort, true);
-        sslsocket.setUseClientMode(true);
-
-        // Get the protocols and ciphers supported by this JVM
-        Set<String> protocols = new HashSet<String>(
-            Arrays.asList(sslsocket.getSupportedProtocols()));
-        Set<String> ciphers = new HashSet<String>(
-            Arrays.asList(sslsocket.getSupportedCipherSuites()));
-
-        // Intersect with preferred protocols and ciphers
-        protocols.retainAll(http.getTlsPreferredProtocols());
-        ciphers.retainAll(http.getTlsPreferredCipherSuites());
-
-        sslsocket.setEnabledProtocols(
-            protocols.toArray(new String[protocols.size()]));
-        sslsocket.setEnabledCipherSuites(
-            ciphers.toArray(new String[ciphers.size()]));
-
-        sslsocket.startHandshake();
+        SSLSocket sslsocket = null;
+
+        try {
+          sslsocket = getSSLSocket(socket, sockHost, sockPort);
+          sslsocket.startHandshake();
+        } catch (IOException e) {
+          Http.LOG.debug("SSL connection to {} failed with: {}", url,
+              e.getMessage());
+          if ("handshake alert:  unrecognized_name".equals(e.getMessage())) {
+            try {
+              // Reconnect, see NUTCH-2447
+              socket = new Socket();
+              socket.setSoTimeout(http.getTimeout());
+              socket.connect(sockAddr, http.getTimeout());
+              sslsocket = getSSLSocket(socket, "", sockPort);
+              sslsocket.startHandshake();
+            } catch (IOException ex) {
+              String msg = "SSL reconnect to " + url + " failed with: "
+                  + e.getMessage();
+              throw new HttpException(msg);
+            }
+          }
+        }
         socket = sslsocket;
       }
 
@@ -318,6 +319,31 @@ public Metadata getHeaders() {
    * -
    */
 
+  private SSLSocket getSSLSocket(Socket socket, String sockHost, int sockPort) throws IOException {
+    SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory
+        .getDefault();
+    SSLSocket sslsocket = (SSLSocket) factory
+        .createSocket(socket, sockHost, sockPort, true);
+    sslsocket.setUseClientMode(true);
+
+    // Get the protocols and ciphers supported by this JVM
+    Set<String> protocols = new HashSet<String>(
+        Arrays.asList(sslsocket.getSupportedProtocols()));
+    Set<String> ciphers = new HashSet<String>(
+        Arrays.asList(sslsocket.getSupportedCipherSuites()));
+
+    // Intersect with preferred protocols and ciphers
+    protocols.retainAll(http.getTlsPreferredProtocols());
+    ciphers.retainAll(http.getTlsPreferredCipherSuites());
+
+    sslsocket.setEnabledProtocols(
+        protocols.toArray(new String[protocols.size()]));
+    sslsocket.setEnabledCipherSuites(
+        ciphers.toArray(new String[ciphers.size()]));
+
+    return sslsocket;
+  }
+
   private void readPlainContent(InputStream in)
       throws HttpException, IOException {
 
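The reconnect in the diff above suppresses the server name by passing an empty host string to createSocket. As a rough illustration only, and not what the merged patch does, the same intent can probably be expressed on Java 8 and later through SSLParameters. The class SniParameters and its two methods are hypothetical names invented for this sketch; only the javax.net.ssl calls are real API.

```java
import java.util.Collections;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;
import javax.net.ssl.SSLParameters;

// Hypothetical helper, not part of Nutch or of the merged patch.
class SniParameters {

  /** Sets an empty SNI list so that no server name is announced. */
  static SSLParameters withoutSni(SSLParameters params) {
    params.setServerNames(Collections.<SNIServerName>emptyList());
    return params;
  }

  /** Sets exactly one host name to be announced via SNI. */
  static SSLParameters withSni(SSLParameters params, String host) {
    params.setServerNames(
        Collections.<SNIServerName>singletonList(new SNIHostName(host)));
    return params;
  }
}
```

Assuming an SSLSocket named sslsocket as in the diff, the parameters would be applied before the handshake with sslsocket.setSSLParameters(SniParameters.withoutSni(sslsocket.getSSLParameters())). Whether this behaves identically to the empty-host approach on every JVM has not been verified here.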


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name

2018-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415671#comment-16415671
 ] 

ASF GitHub Bot commented on NUTCH-2447:
---

lewismc commented on issue #305: NUTCH-2447 Work-around SSLProtocolException: 
handshake alert: unrecognized_name
URL: https://github.com/apache/nutch/pull/305#issuecomment-376538896
 
 
   +1




[jira] [Commented] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413755#comment-16413755
 ] 

ASF GitHub Bot commented on NUTCH-2447:
---

sebastian-nagel opened a new pull request #305: NUTCH-2447 Work-around 
SSLProtocolException: handshake alert: unrecognized_name
URL: https://github.com/apache/nutch/pull/305
 
 
   - apply Markus' patch
   - if the reconnect fails, throw an HttpException
   
   This solution (work-around) has been tested in production.




[jira] [Commented] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name

2018-02-04 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351909#comment-16351909
 ] 

Sebastian Nagel commented on NUTCH-2447:


Hi [~markus17], problem and solution confirmed. Two questions:
 * Both patch files are identical. According to your comment, the second (newer) 
one should point to this issue, right?
 * Why are errors during the reconnect ignored (the try-catch inside the catch 
block)? If the reconnect fails as well, the connection cannot be used.
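
To make the second question concrete, here is a minimal sketch of the control flow it asks for: retry once without SNI only for the unrecognized_name alert, rethrow any unrelated failure, and report the second exception if the reconnect fails too. SniFallback, TlsConnector and connectWithFallback are hypothetical names for illustration and do not exist in Nutch; the connector stands in for "open a socket and run the TLS handshake".

```java
import java.io.IOException;

// Hypothetical sketch, not the code attached to this issue.
class SniFallback {

  /** Opens a TLS connection announcing sniHost; "" means no server name. */
  interface TlsConnector<T> {
    T connect(String sniHost) throws IOException;
  }

  // Message as logged by the JVM in this issue (note the two spaces).
  static final String UNRECOGNIZED_NAME = "handshake alert:  unrecognized_name";

  static <T> T connectWithFallback(TlsConnector<T> connector, String host)
      throws IOException {
    try {
      return connector.connect(host);
    } catch (IOException e) {
      if (!UNRECOGNIZED_NAME.equals(e.getMessage())) {
        throw e; // unrelated failure: propagate instead of swallowing it
      }
      try {
        return connector.connect(""); // reconnect without a server name
      } catch (IOException ex) {
        // Report the failure of the reconnect itself and keep it as the cause.
        throw new IOException(
            "SSL reconnect to " + host + " failed with: " + ex.getMessage(), ex);
      }
    }
  }
}
```

Matching on the exact exception message is brittle, since the wording may differ between JVM versions; the sketch keeps it only because the log output in this issue shows that string.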



[jira] [Commented] (NUTCH-2447) Work-around SSLProtocolException: handshake alert: unrecognized_name

2017-10-23 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215057#comment-16215057
 ] 

Markus Jelsma commented on NUTCH-2447:
--

As a side note, also pay attention to this incredibly ugly-looking fix!

> Work-around SSLProtocolException: handshake alert: unrecognized_name
> --------------------------------------------------------------------
>
>                 Key: NUTCH-2447
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2447
>             Project: Nutch
>          Issue Type: Bug
>          Components: protocol
>    Affects Versions: 1.13
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Critical
>             Fix For: 1.14
>
>         Attachments: NUTCH-2447.patch
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)