Hi,

> The 1000MB is a static value, will the crawl bash script respect NUTCH_HEAPSIZE?

Yes, in local mode it will respect the value of the environment variable
NUTCH_HEAPSIZE. More precisely, the script $NUTCH_HOME/bin/nutch called by
bin/crawl will respect it.

> How can I set NUTCH_HEAPSIZE?

It's an environment variable. How to set it depends on the shell you're using.
E.g., for the bash shell:

  % export NUTCH_HEAPSIZE=2048
  % bin/crawl ...

Best,
Sebastian

On 12/7/18 4:05 PM, hany.n...@hsbc.com wrote:
> Thank you Sebastian.
>
> I am using standalone Nutch with the crawl command; I didn't install a
> separate Hadoop cluster.
>
> The 1000MB is a static value, will the crawl bash script respect NUTCH_HEAPSIZE?
> How can I set NUTCH_HEAPSIZE?
>
> Kind regards,
> Hany Shehata
> Enterprise Engineer
> Green Six Sigma Certified
> Solutions Architect, Marketing and Communications IT
> Corporate Functions | HSBC Operations, Services and Technology (HOST)
> ul. Kapelanka 42A, 30-347 Kraków, Poland
> __________________________________________________________________
>
> Tie line: 7148 7689 4698
> External: +48 123 42 0698
> Mobile: +48 723 680 278
> E-mail: hany.n...@hsbc.com
> __________________________________________________________________
> Protect our environment - please only print this if you have to!
>
> -----Original Message-----
> From: Sebastian Nagel [mailto:wastl.na...@googlemail.com.INVALID]
> Sent: 07 December 2018 15:44
> To: user@nutch.apache.org
> Subject: Re: mapred.child.java.opts
>
> Hi,
>
> yes, of course, the comment just one line above even encourages you to do so:
>
>   # note that some of the options listed here could be set in the
>   # corresponding hadoop site xml param file
>
> For most use cases this value is OK. Only if you're using a parsing fetcher
> with many threads you may need more Java heap memory. Note that this setting
> only applies to (pseudo-)distributed mode (running on Hadoop). In local
> mode you can set the Java heap size via the environment variable
> NUTCH_HEAPSIZE.
>
>> What will be the impact?
>
> That depends mostly on your Hadoop cluster setup. AFAIK, the properties
> mapreduce.map.java.opts resp. mapreduce.reduce.java.opts override
> mapred.child.java.opts on Hadoop 2.x, so on a recently configured Hadoop
> cluster there is usually zero impact.
>
> There is also an open Jira issue to make the heap memory configurable in
> distributed mode, see
> https://issues.apache.org/jira/browse/NUTCH-2501
>
> Best,
> Sebastian
>
> On 12/7/18 3:08 PM, hany.n...@hsbc.com wrote:
>> Hello,
>>
>> While checking the Nutch (1.15) crawl bash file, I noticed at line 211
>> that 1000MB is statically set for Java:
>>
>> mapred.child.java.opts=-Xmx1000m
>>
>> Any idea why? Can I change it? What will be the impact?
>>
>> Kind regards,
>> Hany Shehata
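
[Editor's sketch] In (pseudo-)distributed mode, until NUTCH-2501 is resolved,
the Hadoop 2.x per-task heap properties mentioned above can be supplied per
invocation through Hadoop's generic -D option. This is a hedged sketch, not a
documented Nutch recipe: it assumes the command is launched through Hadoop's
ToolRunner (which parses -D), the 2048m values are illustrative, and <segment>
is a placeholder for a real segment path:

```shell
# Override the per-task heap for one job via generic Hadoop options
# (assumes ToolRunner-based command; values are illustrative).
bin/nutch fetch -D mapreduce.map.java.opts=-Xmx2048m \
                -D mapreduce.reduce.java.opts=-Xmx2048m \
                <segment>
```

Setting the same properties in mapred-site.xml makes the change cluster-wide
instead of per job.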
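
[Editor's sketch] For local mode, the mechanism Sebastian describes can be
illustrated with a few lines of bash. The variable name NUTCH_HEAPSIZE and the
1000m default come from the thread; the exact logic inside $NUTCH_HOME/bin/nutch
is a simplified assumption modeled on typical Hadoop-style launcher scripts:

```shell
# Export the desired heap size in megabytes before invoking bin/crawl.
export NUTCH_HEAPSIZE=2048

# bin/nutch then builds the JVM heap flag roughly like this (simplified sketch):
JAVA_HEAP_MAX="-Xmx1000m"            # default when the variable is unset
if [ -n "$NUTCH_HEAPSIZE" ]; then
  JAVA_HEAP_MAX="-Xmx${NUTCH_HEAPSIZE}m"
fi
echo "$JAVA_HEAP_MAX"                # the flag passed to the java command

# With the variable exported, run the crawl in local mode as usual:
# bin/crawl ...
```

Because the variable is read per invocation, no edit to the crawl script is
needed; unset it to fall back to the default.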