On Mon, 2005-10-03 at 15:57 -0700, Doug Cutting wrote:
> Try the following on your system:
>
> bin/nutch org.apache.nutch.io.TestSequenceFile -fast -count 20000000 -megabytes 100 foo
>
> Tell me how it behaves during the sort phase.
We changed interpreters on some machines to see if that would help.
Sun 1.5.0:
3MB to 5MB/sec steady writes. Takes 9.5 minutes with a 100MB
file and 8.5 minutes with a 500MB file.
The IBM 1.4.2 interpreter I was using initially hits the following
error repeatably, although I cannot find it in the logs. Perhaps
Nutch catches it and retries?
051004 105012 running merge pass=1
Exception in thread "main" java.io.IOException: Cannot seek after EOF
    at org.apache.nutch.ndfs.NDFSClient$NDFSInputStream.seek(NDFSClient.java:445)
    at org.apache.nutch.fs.NFSDataInputStream$PositionCache.seek(NFSDataInputStream.java:43)
    at org.apache.nutch.fs.NFSDataInputStream$Buffer.seek(NFSDataInputStream.java:68)
    at org.apache.nutch.fs.NFSDataInputStream.seek(NFSDataInputStream.java:95)
    at org.apache.nutch.io.SequenceFile$Sorter$MergePass.run(SequenceFile.java:720)
    at org.apache.nutch.io.SequenceFile$Sorter.mergePass(SequenceFile.java:664)
    at org.apache.nutch.io.SequenceFile$Sorter.sort(SequenceFile.java:490)
    at org.apache.nutch.io.TestSequenceFile.sortTest(TestSequenceFile.java:112)
    at org.apache.nutch.io.TestSequenceFile.main(TestSequenceFile.java:272)
--
Rod Taylor <[EMAIL PROTECTED]>