org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates
--------------------------------------------------------------------------
Key: NUTCH-148
URL: http://issues.apache.org/jira/browse/NUTCH-148
Project: Nutch
Type: Bug
Components: indexer
Versions: 0.8-dev
Environment: Windows XP Home
Reporter: raghavendra prabhu
I get the following error while running org.apache.nutch.tools.CrawlTool.
The error actually occurs in the deleteduplicates step:
051223 001121 Reading url hashes...
051223 001121 Sorting url hashes...
051223 001121 Deleting url duplicates...
051223 001121 Error moving bad file G:\apache-tomcat-5.5.12\webapps\crux\WEB-INF\classes\ddup-workingdir\ddup-20051223001121: java.io.IOException: CreateProcess: df -k G:\apache-tomcat-5.5.12\webapps\crux\WEB-INF\classes\ddup-workingdir\ddup-20051223001121 error=2
The error is thrown in NFSDataInputStream.java.
The exception is:
org.apache.nutch.fs.ChecksumException: Checksum error: G:\apache-tomcat-5.5.12\webapps\crux\WEB-INF\classes\ddup-workingdir\ddup-20051223001121 at 0
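A possible reading of the "CreateProcess: df -k ... error=2" line, which the report itself does not confirm: df is a Unix utility that a stock Windows XP installation does not ship, and Windows error code 2 means "file not found", so the attempt to launch df may be failing before the bad file can be moved. The sketch below only illustrates that failure mode. The class name DfProbe and the method canLaunch are hypothetical and are not part of Nutch.

```java
import java.io.IOException;

public class DfProbe {

    /**
     * Hypothetical helper (not Nutch code): returns true if the operating
     * system could start the given command, false if the launch itself failed.
     */
    static boolean canLaunch(String... command) {
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .start();
            // We only care whether the process could be created at all.
            p.destroy();
            return true;
        } catch (IOException e) {
            // When the executable cannot be found, start() throws an
            // IOException; on Windows its message mentions CreateProcess
            // and error=2, as in the log above.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: "df" is on the PATH on most Unix-like systems and
        // absent on a default Windows installation.
        System.out.println("df launchable: " + canLaunch("df", "-k", "."));
    }
}
```

Running this on the affected machine would show whether df can be started at all. If it cannot, installing a df port on the PATH or avoiding the external call would be the directions to investigate.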