Thanks Sebastian for the suggestions. I got around this by using a lower value for topN (2000) instead of 10000. I decided to use a lower topN with more rounds.
On Sun, Mar 3, 2013 at 3:41 PM, Sebastian Nagel <[email protected]> wrote:

> Hi Kiran,
>
> there are many possible reasons for the problem. Besides the limits on the
> number of processes, check the stack size in the Java VM and the system
> (see java -Xss and ulimit -s).
>
> I think in local mode there should be only one mapper and consequently only
> one thread spent for parsing. So the number of processes/threads is hardly
> the problem, provided that you don't run any other number-crunching tasks
> in parallel on your desktop.
>
> Luckily, you should be able to retry via "bin/nutch parse ..."
> Then trace the system and the Java process to catch the reason.
>
> Sebastian
>
> On 03/02/2013 08:13 PM, kiran chitturi wrote:
> > Sorry, i am looking to crawl 400k documents with the crawl. I said 400 in
> > my last message.
> >
> > On Sat, Mar 2, 2013 at 2:12 PM, kiran chitturi <[email protected]> wrote:
> >
> >> Hi!
> >>
> >> I am running Nutch 1.6 on a 4 GB Mac OS desktop with a Core i5 2.8GHz.
> >>
> >> Last night i started a crawl in local mode for 5 seeds with the config
> >> given below. If the crawl goes well, it should fetch a total of 400
> >> documents. The crawling is done on a single host that we own.
> >>
> >> Config
> >> ---------------------
> >> fetcher.threads.per.queue - 2
> >> fetcher.server.delay - 1
> >> fetcher.throughput.threshold.pages - -1
> >>
> >> crawl script settings
> >> ----------------------------
> >> timeLimitFetch - 30
> >> numThreads - 5
> >> topN - 10000
> >> mapred.child.java.opts=-Xmx1000m
> >>
> >> I have noticed today that the crawl stopped due to an error, and i have
> >> found the error below in the logs.
> >>
> >> 2013-03-01 21:45:03,767 INFO parse.ParseSegment - Parsed (0ms):
> >>> http://scholar.lib.vt.edu/ejournals/JARS/v33n3/v33n3-letcher.htm
> >>> 2013-03-01 21:45:03,790 WARN mapred.LocalJobRunner - job_local_0001
> >>> java.lang.OutOfMemoryError: unable to create new native thread
> >>>         at java.lang.Thread.start0(Native Method)
> >>>         at java.lang.Thread.start(Thread.java:658)
> >>>         at java.util.concurrent.ThreadPoolExecutor.addThread(ThreadPoolExecutor.java:681)
> >>>         at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(ThreadPoolExecutor.java:727)
> >>>         at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:655)
> >>>         at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
> >>>         at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:159)
> >>>         at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:93)
> >>>         at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:97)
> >>>         at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
> >>>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
> >>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> >>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> >>
> >> Did anyone run into the same issue? I am not sure why the new native
> >> thread is not being created. The link here [0] says that it might be due
> >> to the limit on the number of processes in my OS. Will increasing it
> >> solve the issue?
> >>
> >> [0] - http://ww2.cs.fsu.edu/~czhang/errors.html
> >>
> >> Thanks!
> >>
> >> --
> >> Kiran Chitturi

--
Kiran Chitturi
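For anyone hitting the same "unable to create new native thread" error: the limits Sebastian mentioned can be checked before retrying the parse. A minimal sketch, assuming a Unix-like shell (exact flag behavior differs slightly between Linux and Mac OS):

```shell
# Max processes per user -- on Linux, each Java thread counts against this.
ulimit -u

# Stack size per thread (in KB). Each native thread reserves this much,
# so a smaller stack allows more threads in the same address space.
ulimit -s

# The JVM's default per-thread stack size; it can be lowered with -Xss,
# e.g. "java -Xss256k ..." to allow more native threads.
command -v java >/dev/null \
  && java -XX:+PrintFlagsFinal -version 2>/dev/null | grep -i threadstacksize \
  || true
```

Raising `ulimit -u` (or lowering `-Xss` via mapred.child.java.opts) are the usual remedies; lowering topN, as done above, simply keeps each parse job small enough to stay under the existing limits.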

