On 8/10/2010 12:55 PM, webdev1977 wrote:
Wow.. this is very frustrating! I just downloaded and configured the 1.2
tagged version from SVN and I STILL cannot complete a file system crawl
using the nutch crawl command.
Has anyone been able to complete a crawl with the nutch crawl command
using the file: protocol? I am crawling a very, very large shared drive
(300,000+ files).
I have very little memory to work with, about 2 GB total. I am running this
as a prototype on my Win XP box.
Any ideas, based on the stack trace, what might be causing this?
-------- hadoop.log snippet --------------------------------------------------
2010-08-10 13:16:03,438 WARN mapred.LocalJobRunner - job_local_0025
java.lang.OutOfMemoryError
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(Unknown Source)
    at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.read(RawLocalFileSystem.java:83)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:136)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at java.io.DataInputStream.read(Unknown Source)
    at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:149)
    at org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:101)
    at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:328)
    at org.apache.hadoop.mapred.IFile$Reader.rejigData(IFile.java:358)
    at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:342)
    at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:404)
    at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
    at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:330)
    at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
    at org.apache.hadoop.mapred.Task$ValuesIterator.readNextKey(Task.java:973)
    at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:932)
    at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:241)
    at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:237)
    at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:42)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2010-08-10 13:16:03,672 INFO mapred.JobClient - Job complete: job_local_0025
2010-08-10 13:16:03,672 INFO mapred.JobClient - Counters: 17
2010-08-10 13:16:03,672 INFO mapred.JobClient -   ParserStatus
2010-08-10 13:16:03,672 INFO mapred.JobClient -     failed=59
2010-08-10 13:16:03,672 INFO mapred.JobClient -     success=905
2010-08-10 13:16:03,672 INFO mapred.JobClient -   FileSystemCounters
2010-08-10 13:16:03,672 INFO mapred.JobClient -     FILE_BYTES_READ=19515258622
2010-08-10 13:16:03,672 INFO mapred.JobClient -     FILE_BYTES_WRITTEN=25431386296
2010-08-10 13:16:03,672 INFO mapred.JobClient -   FetcherStatus
2010-08-10 13:16:03,672 INFO mapred.JobClient -     exception=34
2010-08-10 13:16:03,672 INFO mapred.JobClient -     success=964
2010-08-10 13:16:03,672 INFO mapred.JobClient -   Map-Reduce Framework
2010-08-10 13:16:03,672 INFO mapred.JobClient -     Reduce input groups=260
2010-08-10 13:16:03,672 INFO mapred.JobClient -     Combine output records=0
2010-08-10 13:16:03,672 INFO mapred.JobClient -     Map input records=1000
2010-08-10 13:16:03,672 INFO mapred.JobClient -     Reduce shuffle bytes=0
2010-08-10 13:16:03,672 INFO mapred.JobClient -     Reduce output records=741
2010-08-10 13:16:03,672 INFO mapred.JobClient -     Spilled Records=5856
2010-08-10 13:16:03,672 INFO mapred.JobClient -     Map output bytes=309514931
2010-08-10 13:16:03,672 INFO mapred.JobClient -     Map input bytes=145708
2010-08-10 13:16:03,672 INFO mapred.JobClient -     Combine input records=0
2010-08-10 13:16:03,672 INFO mapred.JobClient -     Map output records=2928
2010-08-10 13:16:03,672 INFO mapred.JobClient -     Reduce input records=742
You ran out of memory; give the Java process more heap space. How much is it
getting now? Try giving it as much more as you can spare.
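For what it's worth, here is a minimal sketch of the kind of change I mean,
assuming the stock bin/nutch launcher script and a local (LocalJobRunner) run;
the heap size and crawl arguments below are placeholders, not your exact
command:

    # bin/nutch honors NUTCH_HEAPSIZE (in MB) and turns it into -Xmx for the JVM.
    # With only 2 GB of physical RAM, roughly 1200-1500 MB is likely the practical
    # ceiling before the OS starts swapping.
    export NUTCH_HEAPSIZE=1500

    # Then rerun the crawl as before (example arguments only):
    bin/nutch crawl urls -dir crawl -depth 3 -topN 50000

Note that in local mode all map and reduce tasks run inside that one JVM, so
raising mapred.child.java.opts in the Hadoop config won't help you here; that
property only matters once you move to a real (pseudo-)distributed cluster.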