Hi,

yes, of course, the comment just one line above even encourages you to do so:

# note that some of the options listed here could be set in the
# corresponding hadoop site xml param file

For most use cases this value is ok. You may need more Java heap memory only if
you're using a parsing fetcher with many threads. Note that this setting only
applies to (pseudo-)distributed mode (running on Hadoop). In local mode you can
set the Java heap size via the environment variable NUTCH_HEAPSIZE.
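
A minimal sketch for local mode, assuming NUTCH_HEAPSIZE is read as a number of
megabytes (as far as I know that is how bin/nutch in 1.15 treats it). The value
4000 and the seed and crawl directories in the comment are placeholders, not
recommendations:

```shell
# Local mode only: request a larger JVM heap for the Nutch tools.
# 4000 is an arbitrary example value, assumed to be interpreted as MB.
export NUTCH_HEAPSIZE=4000

# The crawl script would then be started from the same shell, e.g.
# (hypothetical seed and crawl directories):
#   bin/crawl -s urls/ crawl/ 2
echo "NUTCH_HEAPSIZE=${NUTCH_HEAPSIZE}"
```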


> What will be the impact?

That depends mostly on your Hadoop cluster setup. As far as I know, the
properties mapreduce.map.java.opts and mapreduce.reduce.java.opts override
mapred.child.java.opts on Hadoop 2.x, so on a recent, properly configured
Hadoop cluster there is usually no impact.
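
To illustrate the two Hadoop 2.x properties, here is a rough sketch that builds
them as -D options in the same style the crawl script uses for
mapred.child.java.opts. The variable names and heap sizes are made up for the
example, and whether such options take effect depends on how your cluster is
configured:

```shell
# Hypothetical heap sizes in MB, for illustration only.
MAP_HEAP_MB=2000
REDUCE_HEAP_MB=3000

# Separate heap settings for map and reduce tasks (Hadoop 2.x property names).
hadoop_heap_opts="-D mapreduce.map.java.opts=-Xmx${MAP_HEAP_MB}m -D mapreduce.reduce.java.opts=-Xmx${REDUCE_HEAP_MB}m"

echo "$hadoop_heap_opts"
```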

There is also an open Jira issue to make the heap memory configurable in
distributed mode, see
https://issues.apache.org/jira/browse/NUTCH-2501


Best,
Sebastian

On 12/7/18 3:08 PM, hany.n...@hsbc.com wrote:
> Hello,
> 
> While checking the Nutch (1.15) crawl bash file, I noticed at line 211 that
> 1000MB is statically set for java -> mapred.child.java.opts=-Xmx1000m
> 
> Any idea why?, Can I change it?, What will be the impact?
> Kind regards,
> Hany Shehata