How does the whole multiple-segments approach work?

And the only stack trace I get is the OOM exception.  I haven't found
anything else indicating what's using up all of the memory.

If I use a shell script to execute the Nutch commands instead of a Java
program, I don't get the OOM exception.  And they're both just infinite
loops that call the various Nutch steps in order.
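
For context, a minimal sketch of the kind of loop both versions implement
(assuming Nutch 1.x command names and a crawl/ directory whose crawldb was
already created with 'bin/nutch inject'; paths and -topN are placeholders):

  #!/bin/bash
  # Sketch only: endless generate -> fetch -> parse -> updatedb -> invertlinks
  # cycle, i.e. the shell equivalent of a continuously looping Crawler.
  while true; do
    bin/nutch generate crawl/crawldb crawl/segments -topN 1000
    SEGMENT=$(ls -d crawl/segments/* | tail -1)   # newest (timestamped) segment
    bin/nutch fetch "$SEGMENT"
    bin/nutch parse "$SEGMENT"
    bin/nutch updatedb crawl/crawldb "$SEGMENT"
    bin/nutch invertlinks crawl/linkdb -dir crawl/segments
  done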

On Mon, Dec 19, 2011 at 10:08 AM, Markus Jelsma
<markus.jel...@openindex.io> wrote:

>
>
> On Monday 19 December 2011 15:57:02 Bai Shen wrote:
> > AFAIK, mapred.map.child.java.opts is not set, but I'll double check.
> >
> > When you say threads, you're referring to fetcher threads, correct?  I'm
> > using the default ten threads.  And the JVM reuse is set to -1, so it
> > shouldn't be reusing them.
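
For what it's worth, a quick way to see what the node-level config actually
contains (the /etc/hadoop/conf path is an assumption based on a CDH-style
layout; the job.xml linked from the JobTracker web UI shows what a given job
really ran with). Note that for mapred.job.reuse.jvm.num.tasks a value of 1
disables reuse, while -1 lets a JVM be reused for an unlimited number of tasks:

  # Grep the relevant heap and JVM-reuse properties out of mapred-site.xml.
  grep -E -A1 'mapred\.(map\.|reduce\.)?child\.java\.opts|mapred\.job\.reuse\.jvm\.num\.tasks' \
      /etc/hadoop/conf/mapred-site.xml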
>
> That sounds fine.
>
> > The problem only occurs after several hours of
> > crawling.
>
> Ah, you might want to debug all of your Hadoop options now. It may fail
> during processing of your mapper output. This is very tedious to debug, but
> you must follow the stack trace when it happens again. Most likely it's just
> a Hadoop issue.
>
> Also, try to fetch fewer URLs per segment, but generate more segments.
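
In practice that means keeping -topN small on each generate so every
fetch/parse round works on a smaller segment, for example (the values are
only illustrative, and -maxNumSegments is an option I believe recent 1.x
Generators support, so treat that line as an assumption):

  # Smaller segments, more rounds: each fetch/parse pass holds less in memory.
  bin/nutch generate crawl/crawldb crawl/segments -topN 5000
  # Some 1.x versions can also split one generate run into several segments:
  # bin/nutch generate crawl/crawldb crawl/segments -topN 50000 -maxNumSegments 10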
>
> >
> > On Fri, Dec 16, 2011 at 3:13 PM, Markus Jelsma
> > <markus.jel...@openindex.io> wrote:
> > > Are you running with too many threads perhaps? That takes up additional
> > > RAM. Also, you really must verify that that is the heap space that is
> > > actually allocated. We usually use mapred.map.child.java.opts to set the
> > > heap space for mappers specifically; the plain child opts is, in our
> > > case, used by the datanodes and jobtrackers.
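
A hedged illustration of that split, assuming the job is launched through
Hadoop's ToolRunner so the generic -D options are picked up (the -Xmx values
and the segment variable are placeholders only):

  # Map- and reduce-task heap set explicitly; the plain mapred.child.java.opts
  # then only matters for whatever still falls back to it.
  bin/nutch fetch -D mapred.map.child.java.opts=-Xmx1024m \
                  -D mapred.reduce.child.java.opts=-Xmx1024m \
                  "$SEGMENT"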
> > >
> > > > The fetcher reduce jobs are what failed.  Two completed, but the third
> > > > died.  It tried to run on all three data nodes with the same results.
> > > >
> > > > MapReduce Child Java Maximum Heap Size is set to 1073741824
> > > >
> > > > The description states the following.
> > > >
> > > > The maximum heap size, in bytes, of the Java child process. This number
> > > > will be formatted and concatenated with the 'base' setting for
> > > > 'mapred_child_java_opts' to pass to Hadoop. Can be made final (see
> > > > below) to prevent clients from overriding it. Will be part of
> > > > generated client configuration.
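
For the record, 1073741824 bytes is exactly 1 GiB (1024^3), so the setting
above amounts to a 1 GB child heap:

  echo $((1024 * 1024 * 1024))   # prints 1073741824, i.e. 1 GiB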
> > > >
> > > > I thought there was another heap setting, but I'm not sure where to
> > > > find it in Cloudera.
> > > >
> > > > On Fri, Dec 16, 2011 at 11:38 AM, Markus Jelsma
> > > > <markus.jel...@openindex.io> wrote:
> > > > > What jobs exit with OOM? What is your heap size for the mapper and
> > > > > reducer?
> > > > >
> > > > > On Friday 16 December 2011 17:13:45 Bai Shen wrote:
> > > > > > I've tried running Nutch in local, pseudo, and fully distributed
> > > > > > mode, and I keep getting OutOfMemoryErrors.  I'm running Nutch
> > > > > > using a slightly modified version of the Crawler code that's
> > > > > > included.  Basically, I've modified it to continuously crawl
> > > > > > instead of stopping after a set number of cycles.
> > > > > >
> > > > > > I have Hadoop set not to reuse JVMs, so I'm not sure what the leak
> > > > > > is.  Any suggestions on what to look at?
> > > > > >
> > > > > > Thanks.
> > > > >
> > > > > --
> > > > > Markus Jelsma - CTO - Openindex
>
> --
> Markus Jelsma - CTO - Openindex
>
