I may have found the cause. The problem is in the integration. I am trying
to integrate Nutch 2.2.1 with HtmlUnit 2.12 because I am working on crawling
videos and podcasts, so I need the rendered source of every page I crawl.
And this is the painful part: it fails, I think, because of a library
conflict between httpclient-4.2.5.jar and htmlunit-2.12.jar:
Caused by: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.http.conn.ssl.SSLSocketFactory.createDefaultSSLContext()
    at com.gargoylesoftware.htmlunit.HtmlUnitSSLSocketFactory.createSSLContext(HtmlUnitSSLSocketFactory.java:119)
    at com.gargoylesoftware.htmlunit.HtmlUnitSSLSocketFactory.<init>(HtmlUnitSSLSocketFactory.java:102)
    at com.gargoylesoftware.htmlunit.HtmlUnitSSLSocketFactory.buildSSLSocketFactory(HtmlUnitSSLSocketFactory.java:77)
    at com.gargoylesoftware.htmlunit.HttpWebConnection.configureHttpsScheme(HttpWebConnection.java:608)
    at com.gargoylesoftware.htmlunit.HttpWebConnection.createHttpClient(HttpWebConnection.java:555)
    at com.gargoylesoftware.htmlunit.HttpWebConnection.getHttpClient(HttpWebConnection.java:518)
    at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:155)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1486)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1403)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:305)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:374)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:359)
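
To check whether this really is a jar conflict, a small probe like the one below might help. It is only a sketch: the class name ClasspathProbe and its two helpers are made up for illustration and are not part of Nutch or HtmlUnit. It prints which jar SSLSocketFactory is loaded from and whether that copy declares the no-argument createDefaultSSLContext() method that the trace says is missing. Nutch plugins use their own class loaders, so the result is only meaningful when the probe runs with the same classpath as the failing code.

```java
import java.security.CodeSource;

public class ClasspathProbe {

    /** Returns the location a class is loaded from, or a short note if unknown. */
    static String locationOf(String className) {
        try {
            // Load without initializing, so static initializers are not run.
            Class<?> cls = Class.forName(className, false,
                    ClasspathProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself usually have no code source.
            return src == null ? "(no code source)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    /** True if the class declares a no-argument method with this name (any visibility). */
    static boolean hasDeclaredMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className, false,
                    ClasspathProbe.class.getClassLoader());
            cls.getDeclaredMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class and method names taken from the stack trace above.
        String cls = "org.apache.http.conn.ssl.SSLSocketFactory";
        String method = "createDefaultSSLContext";
        System.out.println(cls + " loaded from: " + locationOf(cls));
        System.out.println("declares " + method + "(): " + hasDeclaredMethod(cls, method));
    }
}
```

If the printed location is httpclient-4.2.5.jar and the method is reported as missing, that would support the conflict theory; the usual next step would be to make sure only the httpclient version that htmlunit-2.12 was built against ends up on that classpath.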
--
View this message in context:
http://lucene.472066.n3.nabble.com/New-script-bin-crawl-skipping-urls-different-batch-id-XXXXXXXX-YYYYYYYYY-tp4075441p4075805.html
Sent from the Nutch - User mailing list archive at Nabble.com.