Thanks for the reply. I'm using 1.6 on CentOS 6.3 with Oracle Java 6 and using all of the built-in Hadoop capability. I haven't learned how to run it on my 'real' Hadoop cluster yet...

Invocation:

bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 5 -topN 5000

Hadoop trace:

2013-03-09 23:07:07,662 WARN mapred.LocalJobRunner - job_local_0016
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.addThread(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
        at java.util.concurrent.AbstractExecutorService.submit(Unknown Source)
        at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:159)
        at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:93)
        at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:97)
        at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

I was running it in a small VM with 2 GB of memory. After I posted, I ran the crawler again with 6 GB of memory. I'll try what you suggested and bypass the inject.

Thanks,

-Kris

On Sat, Mar 9, 2013 at 11:36 PM, kiran chitturi <[email protected]> wrote:

> Hi Kris,
>
> Which version are you using ?
>
> At which step did the exception happen ? Is it after fetch stage or parse
> stage?
>
> Are you using the crawl script(./bin/crawl) or crawl command (./bin/nutch
> crawl) to do the crawl ?
>
> You can use the crawl script located at (./bin/crawl) by removing the
> inject step since you would not need injecting the seeds again.
>
> Please let us know if you have any more questions.
>
> On Sat, Mar 9, 2013 at 11:22 PM, Kristopher Kane <[email protected]> wrote:
>
> > I had a long running session going and would like to try and pick up where
> > it left off if possible. In the terminal, Nutch was at a parsing stage
> > then hit OOM. Is there anyway to start that near where it left off?
> >
> > -Kris
>
> --
> Kiran Chitturi

