Thanks for the reply.  I'm using Nutch 1.6 on CentOS 6.3 with Oracle Java 6
and all of the built-in Hadoop capability.  I haven't learned how to run it
on my 'real' Hadoop cluster yet...

Invocation:  bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 5 -topN 5000

Hadoop trace:

2013-03-09 23:07:07,662 WARN  mapred.LocalJobRunner - job_local_0016
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.addThread(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
        at java.util.concurrent.AbstractExecutorService.submit(Unknown Source)
        at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:159)
        at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:93)
        at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:97)
        at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

I was running it in a small VM with 2 GB of memory. After I posted, I ran
the crawler again with 6 GB of memory.

I'll try what you suggested and bypass the inject.
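
For reference, a rough sketch of what picking up at the parse step might
look like using the individual Nutch 1.x commands instead of the all-in-one
crawl command. The crawl directory name, the choice of the newest segment,
and the DRY_RUN switch are assumptions for illustration, not something taken
from this thread; check each command's usage output against your install
before running it for real.

```shell
#!/bin/sh
# Sketch only: resume an interrupted Nutch 1.x crawl at the parse step.
# CRAWL_DIR and SEGMENT are assumptions; adjust them to your own layout.
CRAWL_DIR="${CRAWL_DIR:-crawl}"
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/}"
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Segment directories are named by timestamp, so the last one in sort
# order is the most recent.
latest_segment() {
    ls -d "$CRAWL_DIR"/segments/* 2>/dev/null | sort | tail -n 1
}

# Parse the segment, update the crawldb, rebuild the linkdb, index to Solr.
# Skips inject, generate and fetch, which already ran for this segment.
resume_from_parse() {
    segment="${SEGMENT:-$(latest_segment)}"
    if [ -z "$segment" ]; then
        echo "no segment found under $CRAWL_DIR/segments" >&2
        return 1
    fi
    run bin/nutch parse "$segment" &&
    run bin/nutch updatedb "$CRAWL_DIR/crawldb" "$segment" &&
    run bin/nutch invertlinks "$CRAWL_DIR/linkdb" "$segment" &&
    run bin/nutch solrindex "$SOLR_URL" "$CRAWL_DIR/crawldb" \
        -linkdb "$CRAWL_DIR/linkdb" "$segment"
}
```

Calling resume_from_parse with the defaults only prints the four commands;
setting DRY_RUN=0 would execute them. If Nutch reports the segment as
already parsed, leftover output from the failed run may need to be cleaned
up first.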

Thanks,

-Kris



On Sat, Mar 9, 2013 at 11:36 PM, kiran chitturi
<[email protected]>wrote:

> Hi Kris,
>
> Which version are you using?
>
> At which step did the exception happen? Was it after the fetch stage or
> the parse stage?
>
> Are you using the crawl script (./bin/crawl) or the crawl command
> (./bin/nutch crawl) to do the crawl?
>
> You can use the crawl script located at ./bin/crawl and remove the
> inject step, since you would not need to inject the seeds again.
>
> Please let us know if you have any more questions.
>
>
>
>
> On Sat, Mar 9, 2013 at 11:22 PM, Kristopher Kane <[email protected]
> >wrote:
>
> > I had a long-running session going and would like to try to pick up
> > where it left off if possible.  In the terminal, Nutch was at a parsing
> > stage, then hit OOM.  Is there any way to start near where it left off?
> >
> > -Kris
> >
>
>
>
> --
> Kiran Chitturi
>
