Hi guys,
Yesterday I tried to crawl a Chinese website using seed links like this
one:
http://www.ccgp.gov.cn/cggg/dfbx/gkzb/default_4.shtml
but the crawl failed with the following error:
fetching http://www.ccgp.gov.cn/cggg/dfbx/gkzb/default_4.shtml (queue crawl delay=5000ms)
fetch of http://www.ccgp.gov.cn/cggg/dfbx/gkzb/default_4.shtml failed with: java.io.IOException: unzipBestEffort returned null
At first I used nutch-1.5.1 to crawl the website and hit the above problem.
I then switched to nutch-1.7 and tried again, but it failed again.
Now I have no idea how to handle this problem!
I would really appreciate any feedback!
-Yan Wang