Great! Just thought I would point it out in case you missed it :)
On Sun, Mar 10, 2013 at 11:30 PM, Kristopher Kane <[email protected]> wrote:

> Right, I'm running the script this time around, based on your first reply.
>
> -Kris
>
>
> On Sun, Mar 10, 2013 at 11:13 PM, kiran chitturi <[email protected]> wrote:
>
> > Hi Kris,
> >
> > It has been discussed several times in this thread that the crawl command should be deprecated and that the crawl script in the bin directory (./bin/crawl) should be used instead. [0]
> >
> > Unlike the crawl command, the crawl script runs the crawl step by step, so using the crawl script is recommended.
> >
> > [0] - https://issues.apache.org/jira/browse/NUTCH-1087
> >
> >
> > On Sun, Mar 10, 2013 at 10:24 PM, Kristopher Kane <[email protected]> wrote:
> >
> > > Thanks for the reply. I'm using 1.6 on CentOS 6.3 with Oracle Java 6 and using all of the built-in Hadoop capability. I haven't learned how to run it on my 'real' Hadoop cluster yet...
> > >
> > > Invocation: bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 5 -topN 5000
> > >
> > > Hadoop trace:
> > >
> > > 2013-03-09 23:07:07,662 WARN mapred.LocalJobRunner - job_local_0016
> > > java.lang.OutOfMemoryError: unable to create new native thread
> > >     at java.lang.Thread.start0(Native Method)
> > >     at java.lang.Thread.start(Unknown Source)
> > >     at java.util.concurrent.ThreadPoolExecutor.addThread(Unknown Source)
> > >     at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(Unknown Source)
> > >     at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
> > >     at java.util.concurrent.AbstractExecutorService.submit(Unknown Source)
> > >     at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:159)
> > >     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:93)
> > >     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:97)
> > >     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
> > >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> > >     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
> > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> > >
> > > I was running it in a small VM with 2 GB of memory. After I posted, I ran the crawler again with 6 GB of memory.
> > >
> > > I'll try what you suggested and bypass the inject.
> > >
> > > Thanks,
> > >
> > > -Kris
> > >
> > >
> > > On Sat, Mar 9, 2013 at 11:36 PM, kiran chitturi <[email protected]> wrote:
> > >
> > > > Hi Kris,
> > > >
> > > > Which version are you using?
> > > >
> > > > At which step did the exception happen? Was it after the fetch stage or the parse stage?
> > > >
> > > > Are you using the crawl script (./bin/crawl) or the crawl command (./bin/nutch crawl) to do the crawl?
> > > >
> > > > You can use the crawl script located at ./bin/crawl with the inject step removed, since you would not need to inject the seeds again.
> > > >
> > > > Please let us know if you have any more questions.
> > > >
> > > >
> > > > On Sat, Mar 9, 2013 at 11:22 PM, Kristopher Kane <[email protected]> wrote:
> > > >
> > > > > I had a long-running session going and would like to pick up where it left off, if possible. In the terminal, Nutch was at a parsing stage when it hit an OOM. Is there any way to restart near where it left off?
> > > > >
> > > > > -Kris
> > > >
> > > > --
> > > > Kiran Chitturi
> >
> > --
> > Kiran Chitturi

--
Kiran Chitturi
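For anyone landing here with the same question, below is a minimal sketch of what "picking up near where it left off" might look like when the OOM hits during parsing: re-run the parse step on the newest segment, then continue the round without injecting again. It assumes the individual Nutch 1.x commands (`parse`, `updatedb`, `invertlinks`, `solrindex`) and the `crawldb`/`linkdb`/`segments` layout that bin/crawl uses. The function names `resume_from_parse`, `latest_segment` and `run`, and the `NUTCH`/`DRY_RUN` variables, are hypothetical, made up for illustration. By default it only prints the commands; check each command's usage output for your version before running them for real.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: resume an interrupted Nutch 1.x crawl round at the
# parse step, reusing the existing crawldb (no inject).
# Assumed layout (as used by bin/crawl): <crawl_dir>/crawldb,
# <crawl_dir>/linkdb and <crawl_dir>/segments/<timestamp>.

NUTCH="${NUTCH:-bin/nutch}"   # path to the Nutch launcher (assumption)
DRY_RUN="${DRY_RUN:-1}"       # 1 = only print commands, 0 = execute them

# Print a command, and execute it unless in dry-run mode.
run() {
  echo "$*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@" || { echo "failed: $*" >&2; return 1; }
  fi
}

# Segment names are timestamps (yyyyMMddHHmmss), so a lexical sort
# puts the most recent segment last. Prints nothing if none exist.
latest_segment() {
  ls -d "$1"/segments/* 2>/dev/null | sort | tail -n 1
}

# Usage: resume_from_parse <crawl_dir> <solr_url>
resume_from_parse() {
  local crawl_dir="$1" solr_url="$2" segment
  segment="$(latest_segment "$crawl_dir")"
  if [ -z "$segment" ]; then
    echo "no segment found under $crawl_dir/segments" >&2
    return 1
  fi
  run "$NUTCH" parse "$segment" || return 1
  run "$NUTCH" updatedb "$crawl_dir/crawldb" "$segment" || return 1
  run "$NUTCH" invertlinks "$crawl_dir/linkdb" "$segment" || return 1
  # -linkdb form as used by the 1.6-era bin/crawl script (verify for your version)
  run "$NUTCH" solrindex "$solr_url" "$crawl_dir/crawldb" -linkdb "$crawl_dir/linkdb" "$segment"
}
```

For example, `resume_from_parse crawl http://localhost:8983/solr/` prints the four commands for the newest segment under `crawl/segments`. Once they look right, `DRY_RUN=0 resume_from_parse crawl http://localhost:8983/solr/` executes them. Later rounds can then go through bin/crawl with the inject step removed, as Kiran suggested.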

