Here's what I'm trying to say: in most other projects, such as Hadoop, Pig, Sqoop, and Flume, the PROJECT_OPTS variable is used to specify "additional JVM arguments" rather than application arguments. Mahout has followed the same convention, so MAHOUT_OPTS was never intended as a way to pass application options or configuration to the runtime. Its purpose is to control heap space, system properties, and similar JVM settings.
The change you're proposing moves it AFTER the class invocation, which would break existing setups that rely on its current, correct use. Instead, you could introduce a new environment variable, MAHOUT_APP_OPTS, which goes after the class name and can accept all of those generic -D configuration parameters.

On Sun, Sep 1, 2013 at 4:06 AM, Mario Rodriguez wrote:
> What I'm passing in MAHOUT_OPTS are parameters of the same nature as those
> being set in bin/mahout:
>
> MAHOUT_OPTS="$MAHOUT_OPTS -Dhadoop.log.dir=$MAHOUT_LOG_DIR"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dhadoop.log.file=$MAHOUT_LOGFILE"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.min.split.size=512MB"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.map.child.java.opts=-Xmx4096m"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.reduce.child.java.opts=-Xmx4096m"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.output.compress=true"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.compress.map.output=true"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.map.tasks=1"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.reduce.tasks=1"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dio.sort.factor=30"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dio.sort.mb=1024"
> MAHOUT_OPTS="$MAHOUT_OPTS -Dio.file.buffer.size=32786"
>
> I have a beefy dev box, so I can afford to tune those values.
>
> In the current exec call, those parameters are not picked up by the tasks
> launched by org.apache.mahout.driver.MahoutDriver.
>
> I can look at this in more detail when I'm back in the office on Monday and
> submit a JIRA ticket and patch (depending on how involved the right fix
> turns out to be).
>
> Cheers,
>
> Mario
>
>>
>> On Sat, Aug 31, 2013 at 2:34 PM, Harsh J wrote:
>>
>>> I don't quite know what it's used for, but that order change can be
>>> considered incompatible, mainly because in its current form it applies
>>> directly (and doubles up) to the JVM that launches Mahout, whereas the
>>> changed form turns it into application-only arguments.
>>>
>>> On Sun, Sep 1, 2013 at 1:05 AM, Gokhan Capan wrote:
>>> > Hi Mario,
>>> >
>>> > Could you create a JIRA ticket for that, and submit your diff as a
>>> > patch if possible?
>>> > http://issues.apache.org/jira/browse/MAHOUT
>>> >
>>> > Best,
>>> > Gokhan
>>> >
>>> > On Sat, Aug 31, 2013 at 8:56 PM, Mario Rodriguez wrote:
>>> >
>>> >> Hi everyone,
>>> >>
>>> >> It seems MAHOUT_OPTS is not getting picked up when running Mahout
>>> >> locally (MAHOUT_LOCAL=true). This can be fixed by switching the order
>>> >> in which MAHOUT_OPTS is passed in bin/mahout from:
>>> >>
>>> >> exec "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" $CLASS "$@"
>>> >>
>>> >> to:
>>> >>
>>> >> exec "$JAVA" $JAVA_HEAP_MAX -classpath "$CLASSPATH" $CLASS "$@" $MAHOUT_OPTS
>>> >>
>>> >> I can't guarantee it won't break some other way of running it; it does
>>> >> not look like it will, but I have not tested it.
>>> >>
>>> >> Cheers,
>>> >>
>>> >> Mario
>>>
>>> --
>>> Harsh J
>>

--
Harsh J
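A minimal sketch of how the exec line in bin/mahout could keep the two kinds of options apart. It assumes MAHOUT_APP_OPTS (the variable suggested in this thread; it does not exist in bin/mahout) is appended after "$@", the same position Mario's change gave MAHOUT_OPTS, while MAHOUT_OPTS keeps its JVM-argument slot before -classpath. The build_cmd function, the heap size, and the jar path are illustrative placeholders. The function prints the argument vector instead of calling exec, so the ordering can be inspected. Whether MahoutDriver and the individual jobs actually honor -D options in that trailing position is not checked here.

```bash
#!/usr/bin/env bash
# Sketch only: MAHOUT_APP_OPTS is a hypothetical variable proposed on the list,
# and the values below are placeholders, not the real bin/mahout settings.

JAVA=java                                  # placeholder for the resolved java binary
JAVA_HEAP_MAX=-Xmx3g                       # placeholder heap setting
CLASSPATH=/path/to/mahout-job.jar          # placeholder classpath
CLASS=org.apache.mahout.driver.MahoutDriver

# Print the would-be java command, one argument per line, instead of exec'ing it.
build_cmd() {
  # MAHOUT_OPTS: JVM arguments (heap, system properties), before -classpath.
  # MAHOUT_APP_OPTS: application arguments (e.g. -Dmapred.* settings),
  # appended after the user's own arguments in "$@".
  # Both are deliberately unquoted so they split into separate arguments,
  # the same way bin/mahout expands $MAHOUT_OPTS today.
  printf '%s\n' "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS \
    -classpath "$CLASSPATH" "$CLASS" "$@" $MAHOUT_APP_OPTS
}

# Example: one JVM option plus two job settings for a seqdirectory run.
MAHOUT_OPTS="-Dhadoop.log.dir=/tmp/mahout-logs" \
MAHOUT_APP_OPTS="-Dmapred.map.tasks=1 -Dmapred.reduce.tasks=1" \
  build_cmd seqdirectory
# Prints, one per line: java, -Xmx3g, -Dhadoop.log.dir=/tmp/mahout-logs,
# -classpath, /path/to/mahout-job.jar, org.apache.mahout.driver.MahoutDriver,
# seqdirectory, -Dmapred.map.tasks=1, -Dmapred.reduce.tasks=1
```

Because MAHOUT_OPTS stays in front of -classpath, anyone relying on it for heap or system-property settings would see no change; only settings placed in the new variable would reach the application as arguments.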
