Are you running on Windows 2000, Windows XP, or Windows Server?
Do you have a virus scanner running? Do you have any firewall
software enabled? Is anything blocking ports?

Are you using NDFS or the local file system?

Is the disk formatted as NTFS or FAT32?

How large is the dataset you are working with? Have
you tried splitting the work into several smaller jobs
instead of one large job?
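
While you check those, here is a minimal sketch (not a fix) for testing
whether a leftover webdb.new directory exists and whether Windows lets you
delete it. The directory name comes from your stack trace; the class and
method names (StaleWebDbCheck, checkLeftover, deleteRecursively) and the
default "data/db" path are made up for illustration and are not part of
Nutch. It really does delete webdb.new, so run it only when no Nutch
process is active.

```java
import java.io.File;

class StaleWebDbCheck {
    // Deletes a file or directory tree; returns false as soon as
    // something cannot be removed (for example, a file held open
    // by another process).
    static boolean deleteRecursively(File f) {
        File[] children = f.listFiles(); // null when f is not a directory
        if (children != null) {
            for (File child : children) {
                if (!deleteRecursively(child)) {
                    return false;
                }
            }
        }
        return f.delete();
    }

    // Looks for <dbDir>/webdb.new and tries to remove it.
    // Returns "absent", "removed", or "locked".
    static String checkLeftover(File dbDir) {
        File leftover = new File(dbDir, "webdb.new");
        if (!leftover.exists()) {
            return "absent";
        }
        return deleteRecursively(leftover) ? "removed" : "locked";
    }

    public static void main(String[] args) {
        // "data/db" is only a placeholder default; pass your own db directory.
        File dbDir = new File(args.length > 0 ? args[0] : "data/db");
        System.out.println(dbDir + ": webdb.new is " + checkLeftover(dbDir));
    }
}
```

If it reports "locked" right after a failed run, something still holds a
file under webdb.new open, which would fit the virus scanner and file
system questions above.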

--- Nguyen Ngoc Giang <[EMAIL PROTECTED]> wrote:

>   Hi all,
> 
>   I'd like to bring back this topic, which has been ignored
> several times on the Nutch mailing list as well as in JIRA (
> http://issues.apache.org/jira/browse/NUTCH-94,
> http://issues.apache.org/jira/browse/NUTCH-96,
> http://issues.apache.org/jira/browse/NUTCH-117 ).
> Here is my error stack:
> 
> 060104 110314 Finishing update
> 060104 110314 Processing pagesByURL: Sorted 11 instructions in 0.016seconds.
> 060104 110314 Processing pagesByURL: Sorted 687.5 instructions/second
> java.io.IOException: already exists: C:\tomcat\webapps\ROOT\data\db\webdb.new\pagesByURL
>         at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:86)
>         at org.apache.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:549)
>         at org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1544)
>         at org.apache.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:375)
> 
> 
>   This error happens not only at update time but
> also at fetchlist time. The weird thing is that it
> happens nondeterministically. I debugged around it, and
> the problem seems to be that some CloseProcessors
> didn't terminate correctly, leaving webdb.new
> undeletable. I then tried reducing to only 1 thread
> with a lightweight load (as suggested in the JIRA
> discussion), but that didn't help. When I stepped
> through the run in the IDE's debugger, however, there
> was no problem.
> 
>   Can anyone help me to figure out this issue?
> Thanks very much.
> 
>   Regards,
>   Giang
> 



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
