On Mon, 2005-10-03 at 15:57 -0700, Doug Cutting wrote:
> Try the following on your system:
> 
> bin/nutch org.apache.nutch.io.TestSequenceFile -fast -count 20000000 -megabytes 100 foo
> 
> Tell me how it behaves during the sort phase.

We changed interpreters on some machines to see if that would help.

Sun 1.5.0:
        3MB to 5MB/sec steady writes. Takes 9.5 minutes with a 100MB
        file and 8.5 minutes with a 500MB file.

The IBM 1.4.2 interpreter I was using at the start hits these errors
reproducibly, although I cannot find them in the logs. Perhaps Nutch
catches them and retries?

051004 105012 running merge pass=1
Exception in thread "main" java.io.IOException: Cannot seek after EOF
        at org.apache.nutch.ndfs.NDFSClient$NDFSInputStream.seek(NDFSClient.java:445)
        at org.apache.nutch.fs.NFSDataInputStream$PositionCache.seek(NFSDataInputStream.java:43)
        at org.apache.nutch.fs.NFSDataInputStream$Buffer.seek(NFSDataInputStream.java:68)
        at org.apache.nutch.fs.NFSDataInputStream.seek(NFSDataInputStream.java:95)
        at org.apache.nutch.io.SequenceFile$Sorter$MergePass.run(SequenceFile.java:720)
        at org.apache.nutch.io.SequenceFile$Sorter.mergePass(SequenceFile.java:664)
        at org.apache.nutch.io.SequenceFile$Sorter.sort(SequenceFile.java:490)
        at org.apache.nutch.io.TestSequenceFile.sortTest(TestSequenceFile.java:112)
        at org.apache.nutch.io.TestSequenceFile.main(TestSequenceFile.java:272)


-- 
Rod Taylor <[EMAIL PROTECTED]>
