Hi all, I have run into a problem with the crawler in Nutch 0.7 and 0.7.1. Although I have added a number of root URLs to a plain text file, *urls*, the crawler does not fetch any of them. However, when I fall back to Nutch 0.6, everything works fine. Has anyone else run into this problem? I am currently running Nutch 0.7.1 with JDK 1.5 Update 6 on Ubuntu 5.10, and I see the same problem on my Apple Mac as well. Below is the crawler log, which shows that the crawler returns 0 entries. Thanks in advance.
051227 212142 parsing file:/opt/nutch-0.7.1/conf/nutch-default.xml
051227 212143 parsing file:/opt/nutch-0.7.1/conf/crawl-tool.xml
051227 212143 parsing file:/opt/nutch-0.7.1/conf/nutch-site.xml
051227 212143 No FS indicated, using default:local
051227 212143 crawl started in: crawl.test
051227 212143 rootUrlFile = urls
051227 212143 threads = 10
051227 212143 depth = 3
... ... ..
051227 212143 *Added 0 pages*
051227 212143 FetchListTool started
051227 212144 *Overall processing: Sorted 0 entries in 0.0 seconds.*
051227 212144 Overall processing: Sorted NaN entries/second
051227 212144 FetchListTool completed
051227 212144 logging at INFO
051227 212145 Updating /opt/nutch-0.7.1/crawl.test/db
051227 212145 Updating for /opt/nutch-0.7.1/crawl.test/segments/20051227212143
051227 212145 Finishing update
051227 212145 Update finished
051227 212145 FetchListTool started
*051227 212145 Overall processing: Sorted 0 entries in 0.0 seconds.*
051227 212145 Overall processing: Sorted NaN entries/second
051227 212145 FetchListTool completed
051227 212145 logging at INFO
051227 212146 Updating /opt/nutch-0.7.1/crawl.test/db
051227 212146 Updating for /opt/nutch-0.7.1/crawl.test/segments/20051227212145
051227 212146 Finishing update
051227 212146 Update finished
051227 212146 FetchListTool started
051227 212146 Overall processing: Sorted 0 entries in 0.0 seconds.
051227 212146 Overall processing: Sorted NaN entries/second
051227 212146 FetchListTool completed
051227 212146 logging at INFO
051227 212147 Updating /opt/nutch-0.7.1/crawl.test/db
051227 212147 Updating for /opt/nutch-0.7.1/crawl.test/segments/20051227212146
051227 212147 Finishing update
051227 212147 Update finished
051227 212147 Updating /opt/nutch-0.7.1/crawl.test/segments from /opt/nutch-0.7.1/crawl.test/db
051227 212147 reading /opt/nutch-0.7.1/crawl.test/segments/20051227212143
051227 212148 reading /opt/nutch-0.7.1/crawl.test/segments/20051227212145
051227 212148 reading /opt/nutch-0.7.1/crawl.test/segments/20051227212146
051227 212148 Sorting pages by url...
051227 212148 Getting updated scores and anchors from db...
051227 212148 Sorting updates by segment...
051227 212148 Updating segments...
051227 212148 Done updating /opt/nutch-0.7.1/crawl.test/segments from /opt/nutch-0.7.1/crawl.test/db
051227 212148 indexing segment: /opt/nutch-0.7.1/crawl.test/segments/20051227212143
051227 212148 * Opening segment 20051227212143
051227 212148 * Indexing segment 20051227212143
051227 212148 * Optimizing index...
051227 212148 * Moving index to NFS if needed...
051227 212148 DONE indexing segment 20051227212143: total 0 records in 0.026s (NaN rec/s).
051227 212148 done indexing
051227 212148 indexing segment: /opt/nutch-0.7.1/crawl.test/segments/20051227212145
051227 212148 * Opening segment 20051227212145
051227 212148 * Indexing segment 20051227212145
051227 212148 * Optimizing index...
051227 212148 * Moving index to NFS if needed...
051227 212148 DONE indexing segment 20051227212145: total 0 records in 0.075s (NaN rec/s).
051227 212148 done indexing
051227 212148 indexing segment: /opt/nutch-0.7.1/crawl.test/segments/20051227212146
051227 212148 * Opening segment 20051227212146
051227 212148 * Indexing segment 20051227212146
051227 212148 * Optimizing index...
051227 212148 * Moving index to NFS if needed...
*051227 212148 DONE indexing segment 20051227212146: total 0 records in 0.011s (NaN rec/s).*
051227 212148 done indexing
051227 212148 Reading url hashes...
051227 212148 Sorting url hashes...
051227 212148 Deleting url duplicates...
051227 212148 Deleted 0 url duplicates.
051227 212148 Reading content hashes...
051227 212148 Sorting content hashes...
051227 212148 Deleting content duplicates...
051227 212148 Deleted 0 content duplicates.
051227 212148 Duplicate deletion complete locally. Now returning to NFS...
051227 212148 DeleteDuplicates complete
051227 212148 Merging segment indexes...
051227 212148 crawl finished: crawl.test

Regards,
Chih-How Bong
