Hi, I am setting up a new crawler with Nutch 1.15 and am having problems only with sites hosted on WordPress.com.
I can crawl other HTTPS sites with no problems, and WordPress sites on other hosts can be crawled too, but I think there is a problem with the SSL certs at WordPress.com.

I get this error:

    FetcherThread 43 fetch of https://whatdavidread.ca/ failed with: org.apache.commons.httpclient.NoHttpResponseException: The server whatdavidread.ca failed to respond
    FetcherThread 43 has no more work available

There seem to be two layers of SSL certs. First there is a Let's Encrypt cert covering many domains, including the one above and the tls.auttomatic.com domain. Then, underlying the Let's Encrypt cert, there is a *.wordpress.com cert from Comodo:

    Certificate chain
     0 s:/OU=Domain Control Validated/OU=EssentialSSL Wildcard/CN=*.wordpress.com
       i:/C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=COMODO RSA Domain Validation Secure Server CA

I have tried disabling SNI with

    NUTCH_OPTS=($NUTCH_OPTS -Dhadoop.log.dir="$NUTCH_LOG_DIR" -Djsse.enableSNIExtension=false)

and no joy.

My nutch-site.xml:

    <property>
      <name>plugin.includes</name>
      <value>protocol-http|protocol-httpclient|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)|urlfilter-domainblacklist</value>
      <description> </description>
    </property>

Thanks for the consideration.

--
Nicholas Roberts
www.niccolox.org
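
For anyone trying to reproduce this, a minimal sketch for checking which protocol plugins the plugin.includes value above actually enables. It assumes Nutch treats the value as a regular expression that must match a whole plugin ID; the candidate ID list and the function names are hypothetical and only there for illustration.

```python
import re

# Value copied from the nutch-site.xml in this post.
PLUGIN_INCLUDES = (
    "protocol-http|protocol-httpclient|urlfilter-(regex|validator)|"
    "parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|"
    "urlnormalizer-(pass|regex|basic)|urlfilter-domainblacklist"
)

# Hypothetical list of plugin IDs to test against the pattern.
CANDIDATE_PLUGINS = [
    "protocol-http",
    "protocol-httpclient",
    "protocol-file",
    "protocol-ftp",
    "parse-html",
    "indexer-solr",
]


def enabled_plugins(includes, candidates):
    """Return the candidate IDs that the includes regex matches in full.

    Assumption: a plugin is enabled only if the whole ID matches the pattern.
    """
    pattern = re.compile(includes)
    return [plugin for plugin in candidates if pattern.fullmatch(plugin)]


def enabled_protocol_plugins(includes, candidates):
    """Narrow the enabled plugins down to the protocol-* ones."""
    return [p for p in enabled_plugins(includes, candidates)
            if p.startswith("protocol-")]


if __name__ == "__main__":
    for plugin in enabled_protocol_plugins(PLUGIN_INCLUDES, CANDIDATE_PLUGINS):
        print(plugin)
```

With the value above, both protocol-http and protocol-httpclient come out as enabled, so it may be worth confirming which of the two is actually handling the https fetch that fails.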