When I ran "bin/nutch generate db segments -topN 50000", I got this
error message:
060118 191222 Processing segments/20060118191140/fetchlist.unsorted: Sorted 5779.678649867067 entries/second
060118 191222 Overall processing: Sorted 50000 entries in 8.651 seconds.
060118 191222 Overall processing: Sorted 1.7302E-4 entries/second
Exception in thread "main" java.io.IOException: File already exists:db/webdb/linksByMD5/data
at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:135)
at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:102)
at org.apache.nutch.fs.FileUtil.copyContents(FileUtil.java:57)
at org.apache.nutch.fs.FileUtil.copyContents(FileUtil.java:78)
at org.apache.nutch.fs.FileUtil.copyContents(FileUtil.java:78)
at org.apache.nutch.fs.LocalFileSystem.rename(LocalFileSystem.java:149)
at org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1676)
at org.apache.nutch.tools.FetchListTool.emitFetchList(FetchListTool.java:499)
at org.apache.nutch.tools.FetchListTool.emitFetchList(FetchListTool.java:319)
at org.apache.nutch.tools.FetchListTool.main(FetchListTool.java:593)
The only explanation I can think of is that db.default.fetch.interval has
expired, so sites will be re-fetched. Is this anything to worry about?
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general