Hi Tom,

Good description. I searched Jira for "HADOOP_CLIENT_OPTS", and it appears there are at least two bugs open on this issue (although in the later context of 2.0.2 and 3.0): HADOOP-9211 <https://issues.apache.org/jira/browse/HADOOP-9211> and HADOOP-9351 <https://issues.apache.org/jira/browse/HADOOP-9351>. I encourage you to follow and/or contribute to those JIRAs if you are interested in improving this usability issue.

Regarding whether you are looking at Apache artifacts or something specific to a distro: I'm going to get a little pedantic here, sorry; there's no other way to explain it, and the differences actually matter from a legal standpoint. As members of this community, we wear multiple "hats". I'm a committer and PMC member for the Apache Hadoop project, and wearing that "hat" I was also the release manager for Hadoop 1.0.2. I think you found those deb packages in the Apache artifact repositories. If so, they were compiled by me, as Release Manager for that release of Hadoop 1 -- but not by Hortonworks, even though I am also a Hortonworks employee and Hortonworks supports my work on behalf of the community.

Hortonworks makes releases of HDP, its supported product, which includes or is "powered by" Apache Hadoop and related projects. Other companies also publish distributions powered by Hadoop, but those distros are available from the respective companies' web sites. Anything you download from Apache is provided by members of the Apache Hadoop community on a non-commercial basis. All of our companies are proud to support this work; it is part of the open-source "virtuous circle" between the community and the companies, the technology and the commerce.

Hope that helps. Feel free to contact me off-list if you want to discuss more.

Regards,
--Matt

On Tue, Mar 12, 2013 at 12:50 PM, Tom Brown <[email protected]> wrote:

> I am using Hadoop 1.0.2 (the stock .deb, compiled by HortonWorks AFAIK).
>
> I noticed that my task tracker processes have multiple "-Xmx" configs
> attached, and that the later ones (128m) were overriding the ones I
> had intended to be used (500m).
>
> After digging through the various scripts, I found that the problem is
> happening because "hadoop-env.sh" is getting invoked multiple times.
> The deb file created a link from "/etc/profile.d/" to hadoop-env.sh,
> so this file is run whenever I log in. The "hadoop" script also
> invokes hadoop-env.sh (via "hadoop-config.sh"). The following sequence
> is causing the problem:
>
> 1. The first time hadoop-env.sh is invoked (when the user logs in),
> HADOOP_CLIENT_OPTS is set to "-Xmx128m ...".
>
> 2. The second time hadoop-env.sh is invoked (when a Hadoop process is
> started), HADOOP_OPTS is set to "... $HADOOP_CLIENT_OPTS" (thereby
> including the memory setting for all Hadoop processes in general).
>
> 3. Also during the second execution, HADOOP_CLIENT_OPTS is recursively
> set to "-Xmx128m $HADOOP_CLIENT_OPTS" (so it now contains "-Xmx128m
> -Xmx128m").
>
> 4. When the actual Hadoop process is started, it always includes both
> JAVA_HEAP_SIZE and HADOOP_OPTS (in that order), but since HADOOP_OPTS
> also has a memory setting and is later in the command line, it takes
> precedence.
>
> I couldn't find any bug that matched this, so I thought I'd reach out
> to the community: Is this a known bug? Do the scripts and the deb file
> belong to Hadoop in general, or is this the responsibility of a
> specific distribution?
>
> Thanks in advance!
>
> --Tom
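For anyone following along, the double-sourcing effect Tom describes can be reproduced in isolation. This is only a minimal sketch: the demo file name is made up, and the two assignments merely mimic what the real hadoop-env.sh / hadoop-config.sh pair are described as doing above, they are not the actual script contents.

```shell
# Hypothetical stand-in for hadoop-env.sh: each time it is sourced,
# it prepends -Xmx128m to HADOOP_CLIENT_OPTS again, and folds the
# result into HADOOP_OPTS.
cat > /tmp/hadoop-env-demo.sh <<'EOF'
export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS"
export HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
EOF

. /tmp/hadoop-env-demo.sh   # first time: sourced at login via /etc/profile.d
. /tmp/hadoop-env-demo.sh   # second time: sourced again when hadoop starts

echo "HADOOP_CLIENT_OPTS=$HADOOP_CLIENT_OPTS"
echo "HADOOP_OPTS=$HADOOP_OPTS"
# HADOOP_CLIENT_OPTS ends up with -Xmx128m twice, and HADOOP_OPTS picks
# up multiple copies as well. Because the JVM honors the last -Xmx on
# the command line, a -Xmx128m appended after the intended -Xmx500m wins.
```

A workaround along these lines is to make the assignment idempotent, e.g. guard it with `case "$HADOOP_CLIENT_OPTS" in *-Xmx128m*) ;; *) ... ;; esac` so sourcing the file twice cannot stack the flag.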
