Thanks for all your help. I applied that patch and also added the property
that Brad described.  I am no longer getting an out-of-memory error, but the
crawl now fails with:

Reading content of SMB directory: 19A475BB-A31E-473A-BD05-62FA081F20F7/
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1104)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:133)

Any way to mitigate this?  I get through about 900 or so documents before
this occurs.  On Linux, the free command shows that I have only 7 MB of free
memory!  I have set the Java heap size to 512 MB and it gets a bit further,
but it still dies at some point during the fetch process.  Any other places
where I can restrict memory?  I only have 1 GB to work with on this box.
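
For reference, here is a rough sketch of the kind of overrides in
conf/nutch-site.xml that I understand can keep the fetcher's memory use down.
The property names (fetcher.threads.fetch and http.content.limit) are the
standard Nutch ones, but the values below are only guesses for a 1 GB
machine, not tested recommendations:

  <!-- conf/nutch-site.xml: sketch of memory-limiting overrides -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>2</value>
    <description>Fewer fetcher threads, so fewer documents are in
    flight (and held in memory) at the same time.</description>
  </property>
  <property>
    <name>http.content.limit</name>
    <value>65536</value>
    <description>Truncate fetched content at 64 KB so a single large
    file cannot exhaust the heap.</description>
  </property>

As for the heap itself, my understanding is that the stock bin/nutch script
reads NUTCH_HEAPSIZE from the environment to build the -Xmx setting, which
is one way to arrive at the 512 MB figure above, though I may have that
detail wrong.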
