Hi there, I am testing Nutch against a blog: https://datafireball.com/
I added the link to seed.txt and left regex-urlfilter as it was. I replaced protocol-http with protocol-httpclient, thinking that would make Nutch capable of fetching https links. However, the fetch failed with the following error after I ran the crawl command:

$ bin/crawl urls/ crawldir 3
fetcher.maxNum.threads can't be < than 50 : using 50 instead
robots.txt whitelist not configured.
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
fetch of https://datafireball.com/ failed with: org.apache.commons.httpclient.NoHttpResponseException: The server datafireball.com failed to respond
Thread FetcherThread has no more work available
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=0
-activeThreads=0

I am fairly sure the blog itself was up and responding normally, but I could not find much help online. Can anyone give me some guidance? Below is the nutch-site.xml that I was using.

Best regards,
Bin

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>http.agent.name</name>
    <value>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36</value>
  </property>
  <property>
    <name>db.ignore.internal.links</name>
    <value>false</value>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
  </property>
  <property>
    <name>fetcher.server.delay</name>
    <value>0</value>
  </property>
  <property>
    <name>http.redirect.max</name>
    <value>5</value>
  </property>
  <property>
    <name>db.max.anchor.length</name>
    <value>1000</value>
  </property>
</configuration>
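In case it helps anyone reproduce this, here is a rough sketch of how the configuration can be sanity-checked outside Nutch, to confirm which protocol plugin the plugin.includes value actually enables. It assumes Nutch treats plugin.includes as a regular expression that must match a whole plugin id; the function names and the list of candidate ids are made up for illustration and are not part of Nutch.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the nutch-site.xml from this post (only the relevant property).
SAMPLE = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>"""

# Hypothetical candidate list: protocol plugin ids to test against the pattern.
CANDIDATE_PROTOCOL_PLUGINS = (
    "protocol-http",
    "protocol-httpclient",
    "protocol-file",
    "protocol-ftp",
)


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the config text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def enabled_protocol_plugins(plugin_includes):
    """Return the candidate ids that the plugin.includes pattern fully matches.

    Assumption: the pattern has to match the entire plugin id, so
    'protocol-httpclient' in the pattern does not also enable 'protocol-http'.
    """
    pattern = re.compile(plugin_includes)
    return [pid for pid in CANDIDATE_PROTOCOL_PLUGINS if pattern.fullmatch(pid)]


if __name__ == "__main__":
    props = read_properties(SAMPLE)
    print(enabled_protocol_plugins(props["plugin.includes"]))
```

This only confirms what the configuration asks for (here, that protocol-httpclient is the sole protocol plugin enabled); it says nothing about why the server did not respond.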

