Sorry, I am looking to crawl 400k documents with this crawl. I said 400 in my last message.
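
Since the error quoted below is about creating a native thread, one thing I could check is how many threads the JVM has alive when it fails. Here is a minimal sketch, assuming the check runs inside the same JVM as the crawl (local mode). The class name ThreadCountProbe and the demo threads are hypothetical and not part of Nutch; only the standard java.lang.management API is used.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper (not part of Nutch): reports JVM thread usage.
public class ThreadCountProbe {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** Number of live threads (daemon and non-daemon) in this JVM. */
    static int liveThreads() {
        return THREADS.getThreadCount();
    }

    /** One-line summary: live threads, peak live threads, total threads ever started. */
    static String summary() {
        return "live=" + THREADS.getThreadCount()
             + " peak=" + THREADS.getPeakThreadCount()
             + " started=" + THREADS.getTotalStartedThreadCount();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("before: " + summary());

        // Demo only: start three idle threads so the counters visibly change.
        CountDownLatch started = new CountDownLatch(3);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 3; i++) {
            Thread t = new Thread(() -> {
                started.countDown();
                try {
                    release.await(); // stay alive until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "probe-demo-" + i);
            t.setDaemon(true);
            t.start();
        }

        started.await(); // all three demo threads are running now
        System.out.println("during: " + summary());
        release.countDown(); // let the demo threads finish
    }
}
```

If the live count keeps climbing during the parse step, that would point to threads piling up rather than to heap size, but I have not confirmed that this is what happens here.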

On Sat, Mar 2, 2013 at 2:12 PM, kiran chitturi <[email protected]> wrote:

> Hi!
>
> I am running Nutch 1.6 on a 4 GB Mac OS desktop with Core i5 2.8GHz.
>
> Last night i started a crawl on local mode for 5 seeds with the config
> given below. If the crawl goes well, it should fetch a total of 400
> documents. The crawling is done on a single host that we own.
>
> Config
> ---------------------
>
> fetcher.threads.per.queue - 2
> fetcher.server.delay - 1
> fetcher.throughput.threshold.pages - -1
>
> crawl script settings
> ----------------------------
> timeLimitFetch- 30
> numThreads - 5
> topN - 10000
> mapred.child.java.opts=-Xmx1000m
>
>
> I have noticed today that the crawl has stopped due to an error and i have
> found the below error in logs.
>
> 2013-03-01 21:45:03,767 INFO parse.ParseSegment - Parsed (0ms):
>> http://scholar.lib.vt.edu/ejournals/JARS/v33n3/v33n3-letcher.htm
>> 2013-03-01 21:45:03,790 WARN mapred.LocalJobRunner - job_local_0001
>> java.lang.OutOfMemoryError: unable to create new native thread
>> at java.lang.Thread.start0(Native Method)
>> at java.lang.Thread.start(Thread.java:658)
>> at java.util.concurrent.ThreadPoolExecutor.addThread(ThreadPoolExecutor.java:681)
>> at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(ThreadPoolExecutor.java:727)
>> at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:655)
>> at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
>> at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:159)
>> at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:93)
>> at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:97)
>> at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> (END)
>
>
>
> Did anyone run in to the same issue ? I am not sure why the new native
> thread is not being created. The link here says [0] that it might due to
> the limitation of number of processes in my OS. Will increase them solve
> the issue ?
>
>
> [0] - http://ww2.cs.fsu.edu/~czhang/errors.html
>
> Thanks!
>
> --
> Kiran Chitturi
>

--
Kiran Chitturi

