I am using Hadoop 1.0.2 (the stock .deb, compiled by HortonWorks AFAIK). I noticed that my task tracker processes have multiple "-Xmx" configs attached, and that the later ones (128m) were overriding the ones I had intended to be used (500m).
After digging through the various scripts, I found that the problem is happening because "hadoop-env.sh" is getting invoked multiple times. The deb file created a link from "/etc/profile.d/" to hadoop-env.sh, so this file is run whenever I log in. The "hadoop" script also invokes hadoop-env.sh (via "hadoop-config.sh").

The following sequence is causing the problem:

1. The first time hadoop-env.sh is invoked (when the user logs in), HADOOP_CLIENT_OPTS is set to "-Xmx128m ...".
2. The second time hadoop-env.sh is invoked (when a Hadoop process is started), HADOOP_OPTS is set to "... $HADOOP_CLIENT_OPTS" (thereby including the memory setting for all Hadoop processes in general).
3. Also during the second execution, HADOOP_CLIENT_OPTS is recursively set to "-Xmx128m $HADOOP_CLIENT_OPTS" (so it now contains "-Xmx128m -Xmx128m").
4. When the actual hadoop process is started, it always includes both JAVA_HEAP_SIZE and HADOOP_OPTS (in that order), but since HADOOP_OPTS also has a memory setting and is later in the command line, it takes precedence.

I couldn't find any bug that matched this, so I thought I'd reach out to the community: Is this a known bug? Do the scripts and deb file belong to Hadoop in general, or is this the responsibility of a specific distribution?

Thanks in advance!

--Tom
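
P.S. To make the sequence above concrete, here is a minimal sketch that reproduces the accumulation in a plain shell. It is not the actual hadoop-env.sh: the function simulate_hadoop_env is a hypothetical stand-in, the option "-Dexample.placeholder=1" is made up to represent the settings I elided as "...", and the exact form of the two assignments is my assumption based on the behavior I observed.

```shell
#!/bin/sh
# Hypothetical stand-in for the relevant lines of hadoop-env.sh.
# Assumption: the real file sets both variables roughly like this;
# "-Dexample.placeholder=1" represents the other options elided as "...".
simulate_hadoop_env() {
    HADOOP_OPTS="-Dexample.placeholder=1 $HADOOP_CLIENT_OPTS"
    HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS"
    export HADOOP_OPTS HADOOP_CLIENT_OPTS
}

# Start from a clean environment so the result is reproducible.
unset HADOOP_OPTS HADOOP_CLIENT_OPTS

simulate_hadoop_env   # 1st run: login, via the link in /etc/profile.d/
simulate_hadoop_env   # 2nd run: hadoop -> hadoop-config.sh -> hadoop-env.sh

# The heap size I intended, placed before HADOOP_OPTS as in step 4.
JAVA_HEAP_SIZE="-Xmx500m"
CMD="java $JAVA_HEAP_SIZE $HADOOP_OPTS"

echo "HADOOP_CLIENT_OPTS=$HADOOP_CLIENT_OPTS"
echo "$CMD"
```

After the second run, HADOOP_CLIENT_OPTS contains "-Xmx128m" twice, and the assembled command line has "-Xmx500m" followed later by "-Xmx128m", which matches what I see on my task tracker processes.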
