Hi,

> The 1000MB is static value, will the crawl bash script respect NUTCH_HEAPSIZE?

Yes, in local mode it will respect the value of the environment variable
NUTCH_HEAPSIZE. More precisely, the script $NUTCH_HOME/bin/nutch, which is
called by bin/crawl, respects it.
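
For reference, bin/nutch picks the maximum heap size roughly like this
(a simplified sketch of the script's logic, not a verbatim excerpt):

  # default heap; NUTCH_HEAPSIZE (a number of megabytes) overrides it
  JAVA_HEAP_MAX=-Xmx1000m
  if [ "$NUTCH_HEAPSIZE" != "" ]; then
    JAVA_HEAP_MAX="-Xmx""$NUTCH_HEAPSIZE""m"
  fi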

> How can I set NUTCH_HEAPSIZE?

It's an environment variable. How to set it might depend on the shell you're 
using.
E.g., for the bash shell:
  % export NUTCH_HEAPSIZE=2048
  % bin/crawl ...
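
The value is taken as megabytes (the script appends "m"), so the example
above gives the JVM 2 GB of heap. In bash you can also set it for a
single invocation only:

  % NUTCH_HEAPSIZE=2048 bin/crawl ...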

Best,
Sebastian


On 12/7/18 4:05 PM, hany.n...@hsbc.com wrote:
> Thank you Sebastian.
> 
> I am using standalone Nutch with the crawl command; I didn't install a 
> separate Hadoop cluster.
> 
> The 1000MB is static value, will the crawl bash script respect NUTCH_HEAPSIZE?
> How can I set NUTCH_HEAPSIZE?
> 
> Kind regards, 
> Hany Shehata
> Enterprise Engineer
> Green Six Sigma Certified
> Solutions Architect, Marketing and Communications IT 
> Corporate Functions | HSBC Operations, Services and Technology (HOST)
> ul. Kapelanka 42A, 30-347 Kraków, Poland
> __________________________________________________________________ 
> 
> Tie line: 7148 7689 4698 
> External: +48 123 42 0698 
> Mobile: +48 723 680 278 
> E-mail: hany.n...@hsbc.com 
> __________________________________________________________________ 
> Protect our environment - please only print this if you have to!
> 
> -----Original Message-----
> From: Sebastian Nagel [mailto:wastl.na...@googlemail.com.INVALID] 
> Sent: 07 December 2018 15:44
> To: user@nutch.apache.org
> Subject: Re: mapred.child.java.opts
> 
> Hi,
> 
> yes, of course; the comment just one line above even encourages you to do so:
> 
>   # note that some of the options listed here could be set in the
>   # corresponding hadoop site xml param file
> 
> For most use cases this value is ok. Only if you're using a parsing fetcher 
> with many threads might you need more Java heap memory. Note that this 
> setting only applies to (pseudo-)distributed mode (running on Hadoop). In 
> local mode you can set the Java heap size via the environment variable 
> NUTCH_HEAPSIZE.
> 
> 
>> What will be the impact?
> 
> That depends mostly on your Hadoop cluster setup. Afaik, the properties 
> mapreduce.map.java.opts and mapreduce.reduce.java.opts override 
> mapred.child.java.opts on Hadoop 2.x, so on a recently configured Hadoop 
> cluster there is usually zero impact.
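> 
> For example, on a Hadoop 2.x cluster the per-task heap could be raised in 
> mapred-site.xml (a minimal sketch; the -Xmx2048m values are placeholders 
> to adjust to your cluster):
> 
>   <property>
>     <name>mapreduce.map.java.opts</name>
>     <value>-Xmx2048m</value>
>   </property>
>   <property>
>     <name>mapreduce.reduce.java.opts</name>
>     <value>-Xmx2048m</value>
>   </property>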
> 
> There is also a Jira issue open to make the heap memory configurable in 
> distributed mode, see
> https://issues.apache.org/jira/browse/NUTCH-2501
> 
> 
> Best,
> Sebastian
> 
> On 12/7/18 3:08 PM, hany.n...@hsbc.com wrote:
>> Hello,
>>
>> While checking the Nutch (1.15) crawl bash script, I noticed at line 211 
>> that 1000 MB is statically set for Java -> 
>> mapred.child.java.opts=-Xmx1000m
>>
>> Any idea why? Can I change it? What will be the impact?
>>
>> Kind regards,
>> Hany Shehata
>> Enterprise Engineer
>> Green Six Sigma Certified
>> Solutions Architect, Marketing and Communications IT Corporate 
>> Functions | HSBC Operations, Services and Technology (HOST) ul. 
>> Kapelanka 42A, 30-347 Kraków, Poland 
>> __________________________________________________________________
>>
>> Tie line: 7148 7689 4698
>> External: +48 123 42 0698
>> Mobile: +48 723 680 278
>> E-mail: hany.n...@hsbc.com<mailto:hany.n...@hsbc.com>
>> __________________________________________________________________
>> Protect our environment - please only print this if you have to!
>>
>>
