[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-26 Thread Drew Farris (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838868#action_12838868
 ] 

Drew Farris commented on MAHOUT-301:


bq. Can you upload the patch for the Maven configs? Maybe as a separate issue, 
marked for 0.3. 

See: MAHOUT-311

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.3
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301-drew.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>
> Snippet from javadoc gives the idea:
> {code}
> /**
>  * General-purpose driver class for Mahout programs.  Utilizes org.apache.hadoop.util.ProgramDriver to run
>  * main methods of other classes, but first loads up default properties from a properties file.
>  *
>  * Usage: run on Hadoop like so:
>  *
>  * $HADOOP_HOME/bin/hadoop jar path/to/job org.apache.mahout.driver.MahoutDriver [classes.props file] shortJobName \
>  *   [default.props file for this class] [over-ride options, all specified in long form: --input, --jarFile, etc]
>  *
>  * TODO: set the Main-Class to just be MahoutDriver, so that this option isn't needed?
>  *
>  * (note: using the current shell script, this could be modified to be just
>  * $MAHOUT_HOME/bin/mahout [classes.props file] shortJobName [default.props file] [over-ride options]
>  * )
>  *
>  * Works like this: by default, the file "core/src/main/resources/driver.classes.props" is loaded, which
>  * defines a mapping between short names like "VectorDumper" and fully qualified class names.  This file may
>  * instead be overridden on the command line by having the first argument be some string of the form *classes.props.
>  *
>  * The next argument to the Driver is supposed to be the short name of the class to be run (as defined in the
>  * driver.classes.props file).  After this, if the next argument ends in ".props" / ".properties", it is taken to
>  * be the file to use as the default properties file for this execution, and key-value pairs are built up from that:
>  * if the file contains
>  *
>  * input=/path/to/my/input
>  * output=/path/to/my/output
>  *
>  * Then the class which will be run will have its main called with
>  *
>  *   main(new String[] { "--input", "/path/to/my/input", "--output", "/path/to/my/output" });
>  *
>  * After all the "default" properties are loaded from the file, any further command-line arguments are taken in,
>  * and over-ride the defaults.
>  */
> {code}
> Could be cleaned up, as it's kinda ugly with the whole "file named in 
> .props", but gives the idea.  Really helps cut down on repetitive long 
> command lines, and lets defaults be put in props files instead of locked 
> into the code.
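
The default-then-override behaviour described in the javadoc can be sketched roughly as follows. This is only an illustration, not the code from the attached patches: the class name DefaultArgsSketch and the method mergeArgs are hypothetical, and it assumes every override is given in long form ("--key value" or a bare "--flag"), as the javadoc requires.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not the MahoutDriver from the patch. */
public class DefaultArgsSketch {

  /**
   * Builds the argument array for the target main(): defaults from the
   * properties file come first, then command-line "--key value" pairs,
   * which replace any default with the same key.
   */
  public static String[] mergeArgs(Properties defaults, String[] overrides) {
    Map<String, String> merged = new LinkedHashMap<>();
    // Properties is hash-based, so sort the keys to get a stable order.
    for (String key : new TreeSet<>(defaults.stringPropertyNames())) {
      merged.put("--" + key, defaults.getProperty(key));
    }
    for (int i = 0; i < overrides.length; i++) {
      String option = overrides[i];
      // An option has a value if the next token is not itself an option.
      boolean hasValue = i + 1 < overrides.length && !overrides[i + 1].startsWith("--");
      merged.put(option, hasValue ? overrides[++i] : null);
    }
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, String> e : merged.entrySet()) {
      out.add(e.getKey());
      if (e.getValue() != null) {
        out.add(e.getValue());
      }
    }
    return out.toArray(new String[0]);
  }

  public static void main(String[] args) throws IOException {
    Properties defaults = new Properties();
    // Same two entries as the example props file in the javadoc.
    defaults.load(new StringReader("input=/path/to/my/input\noutput=/path/to/my/output\n"));
    String[] merged = mergeArgs(defaults, new String[] {"--output", "/tmp/other"});
    // prints: --input /path/to/my/input --output /tmp/other
    System.out.println(String.join(" ", merged));
  }
}
```

Keeping the merged options in a map keyed by option name is what makes "command line wins" fall out naturally: a later put for the same key replaces the default value.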

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-26 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838834#action_12838834
 ] 

Grant Ingersoll commented on MAHOUT-301:


Just capturing something longer term here; no need to block anything.  One 
thing I'd love to have is some basic "experiment management" capability.  I 
can imagine that in this mode, things like input parameters are all written 
into files and organized along with the output, so that it is easy to keep 
track of all the different ways things get run over time.  It seems like this 
script, with its default property files, could be part of that solution.

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.3
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301-drew.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-25 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838725#action_12838725
 ] 

Jake Mannix commented on MAHOUT-301:


Drew, do you have a patch with your latest changes?  If I can try them out 
too, to verify that they work on more than one system, I think we can commit 
this.

{quote}
Should I commit those, open another issue or should I re-post as a part of this 
patch?
{quote}

I'd say that should be a separate issue; it should be small enough to mark 
for 0.3 and commit separately.

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.3
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301-drew.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-25 Thread Robin Anil (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838700#action_12838700
 ] 

Robin Anil commented on MAHOUT-301:
---

+1 for committing this. 

Can you upload the patch for the Maven configs? Maybe as a separate issue, 
marked for 0.3. 

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.3
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301-drew.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-25 Thread Drew Farris (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838694#action_12838694
 ] 

Drew Farris commented on MAHOUT-301:


Had a chance to take this out for a spin tonight. It is working very well. I 
ran some k-means using the script, starting with the 20newsgroups collection 
as text files, both locally and on a cluster. I think it is good to go; can we 
commit? I'd be happy to handle it if we have sufficient consensus.

There are a couple of modifications I've made to the Maven assemblies to 
include all of this in the binary and source releases properly (adding the 
conf directory, setting the executable bit on the mahout script, etc.). While 
I was at it, I cleaned up the bin assembly process so that the releases should 
build faster too. Should I commit those, open another issue or should I 
re-post as a part of this patch?


> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.3
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301-drew.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-24 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838159#action_12838159
 ] 

Jake Mannix commented on MAHOUT-301:


Ok, new patch, with the modification that indeed you have the ability to just 
run "$MAHOUT_HOME/bin/mahout  [args]" and it still works.  And if 
.props exists on the classpath, it'll get used for defaults. w00t, 
as the kids say.

I've added the conf directory to the patch (you'd not kept it in your patch, 
Drew), and there are a bunch of empty files in there, except that some of them 
have commented-out properties in the right format:

cleaneigen.props :

{code}
#ci|corpusInput =
#ei|eigenInput =
#o|output =
{code}

This is to help users see what they can store in here, and in what format.
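
As a rough illustration of how such a file might be consumed, the sketch below loads an uncommented entry in the same layout and turns its key into a long option. The interpretation of the key as "shortName|longName" is an assumption drawn only from the sample above, and the class PropsKeySketch and method longOption are hypothetical names, not part of the patch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper for illustration; the key layout is an assumption. */
public class PropsKeySketch {

  /**
   * Returns the long option for a key such as "o|output".
   * Assumes the part after '|' is the long name; a key without '|'
   * is used as the long name directly.
   */
  public static String longOption(String key) {
    int bar = key.indexOf('|');
    String longName = bar < 0 ? key : key.substring(bar + 1);
    return "--" + longName.trim();
  }

  public static void main(String[] args) throws IOException {
    Properties props = new Properties();
    // The first line stays commented out, as in the shipped template;
    // the second shows what a user would write to set a default.
    props.load(new StringReader("#ci|corpusInput =\no|output = /path/to/my/output\n"));
    for (String key : props.stringPropertyNames()) {
      // prints: --output /path/to/my/output
      System.out.println(longOption(key) + " " + props.getProperty(key));
    }
  }
}
```

Because java.util.Properties treats lines starting with '#' as comments, the template entries are inert until a user removes the '#' and fills in a value.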

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301-drew.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-24 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837917#action_12837917
 ] 

Jake Mannix commented on MAHOUT-301:


Awesome Drew, I'll check it out.

{quote}
One potential TODO from this would be to potentially launch arbitrary classes 
if no matching program name is specified, but I need to dig into ProgramDriver 
to understand how it works before I can contribute something like that.
{quote}

Yeah, I was thinking about that over breakfast - an easy hack to do this is: 
while the driver.classes.props file is being read, keep track of whether 
you've found an exact match on args[0], and once all of driver.classes.props 
has been read and you haven't found a match, just do a Class.forName(args[0]) 
and add it to the ProgramDriver with its full name as the "shortName", and the 
rest of the program will work (and would even still work with default 
properties files!  If you put com.mycompany.MyClass.props in 
$MAHOUT_CONF_DIR, it'll read that for defaults).
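
A minimal sketch of that fallback, leaving out the ProgramDriver registration so that it runs without Hadoop on the classpath. The class ClassLookupSketch and method resolve are hypothetical, the short name is assumed to be the key in the mapping (the thread does not say which side is the key), and JDK classes stand in for Mahout ones.

```java
import java.util.Properties;

/** Hypothetical helper for illustration; not the code from the patch. */
public class ClassLookupSketch {

  /**
   * Resolves the first command-line argument to a class: a short-name
   * alias wins if one exists, otherwise the argument itself is treated
   * as a fully qualified class name.
   */
  public static Class<?> resolve(Properties shortNames, String name) {
    String className = shortNames.getProperty(name);
    if (className == null) {
      className = name; // no alias found: fall back to Class.forName on the raw argument
    }
    try {
      return Class.forName(className);
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException("Unknown program or class: " + name, e);
    }
  }

  public static void main(String[] args) {
    Properties shortNames = new Properties();
    // Stand-in alias; a real mapping file would list Mahout classes.
    shortNames.setProperty("StringList", "java.util.ArrayList");
    System.out.println(resolve(shortNames, "StringList").getName());        // prints: java.util.ArrayList
    System.out.println(resolve(shortNames, "java.util.HashMap").getName()); // prints: java.util.HashMap
  }
}
```

Since the unaliased class keeps its full name as its "shortName", a defaults file named after the full class name would be looked up in the same way as one named after an alias.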

I'll see if I can add that to your patch later today.  I think if that's 
working, we should be looking good to commit and see who else wants to play 
with it and test it out.

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301-drew.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-24 Thread Drew Farris (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837763#action_12837763
 ] 

Drew Farris commented on MAHOUT-301:


This sounds great. I will take it for a spin when I am in front of a computer. 
My take is that the old if/else branches in the script are now redundant. As 
long as one can use MahoutDriver to run both classes that have been aliased to 
short names and classes specified using the full name, I say let's get rid of 
them.



> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-23 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837616#action_12837616
 ] 

Jake Mannix commented on MAHOUT-301:


Our comments crossed in the ether! :)

{quote}
Any thoughts on whether it makes sense to attempt to work the latter form into 
the mahout script? It won't pull the necessary config files for MahoutDriver in 
from a path outside of the job file unless HADOOP_CLASSPATH is set to include 
those directories, but I haven't had a chance to verify that.
{quote}

You're right - I did indeed set my HADOOP_CLASSPATH to include 
$MAHOUT_CONF_DIR, which allowed this to work, otherwise it would not.  This 
should be done by the script.

Ideally, yes.  It's ugly, but if $MAHOUT_HOME/bin/mahout just sets 
$HADOOP_CLASSPATH to include $MAHOUT_CONF_DIR (or $MAHOUT_HOME/conf if that 
variable is not set), and then just executes $HADOOP_HOME/bin/hadoop jar ..., 
then it should work.

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-23 Thread Drew Farris (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837607#action_12837607
 ] 

Drew Farris commented on MAHOUT-301:


It doesn't appear that the following command works as intended:

{code}
./bin/mahout org.apache.hadoop.util.RunJar 
/path/to/mahout-examples-0.3-SNAPSHOT.job org.apache.mahout.driver.MahoutDriver 
TestClassifier
{code}

The following seems to be the appropriate way to achieve what we're trying to 
do here: 

{code}
hadoop jar examples/target/mahout-examples-0.3-SNAPSHOT.job 
org.apache.mahout.driver.MahoutDriver TestClassifier
{code}

Any thoughts on whether it makes sense to attempt to work the latter form into 
the mahout script? I believe it won't pull in the necessary config files for 
MahoutDriver from a path outside of the job file unless HADOOP_CLASSPATH is set 
to include those directories, but I haven't had a chance to verify that.

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-23 Thread Drew Farris (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837477#action_12837477
 ] 

Drew Farris commented on MAHOUT-301:


bq. Cool, so why not just check whether $HADOOP_CONF_DIR is set: if it is, 
do "runjob" as described; if it's not, do "run" to run locally.

Yes, OK, that should work: I believe you can use RunJar to launch anything, 
even if it isn't a MapReduce job. There is no need for classpath setup in this 
case either; all you need to do is point to the examples job. We might be able 
to take advantage of this elsewhere.
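
A rough sketch of the check being discussed, assuming a POSIX shell. The 
function name mahout_launch_mode is hypothetical; "runjob" and "run" are the 
two modes named in this thread.

```shell
# Hypothetical helper: choose how bin/mahout should launch MahoutDriver.
# Prints "runjob" (submit through hadoop) when HADOOP_CONF_DIR is set and
# non-empty, otherwise "run" (plain local JVM).
mahout_launch_mode() {
  if [ -n "${HADOOP_CONF_DIR:-}" ]; then
    echo runjob
  else
    echo run
  fi
}
```

The script could then branch on the result, e.g. 
case "$(mahout_launch_mode)" in runjob) ... ;; run) ... ;; esac.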

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-23 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837472#action_12837472
 ] 

Jake Mannix commented on MAHOUT-301:


{quote}
Ahh, I see where you're coming from, so without core, you're suggesting that 
mahout pick up the jar files in the target directories if they exist? I think 
it is fine to modify the non-core classpath to include these, they won't be 
present in the release build anyway.
{quote}

Cool, yeah, that makes sense.
{quote}
Are any of the default properties files used beyond the MahoutDriver, which 
executes locally and sets up the job? Do these files need to be distributed to 
the rest of the cluster? As noted above, I think the proper way to run 
MahoutDriver in the context of a distributed job is to do something like:
{code}
./bin/mahout org.apache.hadoop.util.RunJar 
/path/to/mahout-examples-0.3-SNAPSHOT.job org.apache.mahout.driver.MahoutDriver 
TestClassifier
{code}
I suspect we could easily modify the mahout script and shorten this to:
{code}
./bin/mahout runjob TestClassifier
{code}
{quote}

Cool, so why not just check whether $HADOOP_CONF_DIR is set: if it is, do 
"runjob" as described; if it's not, do "run" to run locally.

{quote}
FWIW, 
[GenericOptionsParser|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html]
 provides a way to do this with -files, -libjars and -archives.
{quote}

Now of course, I guess I don't really need the files to get onto the job's 
classpath *on the cluster*: they just need to be on the classpath of the 
locally running JVM which is invoking MahoutDriver.main().  So I was doing more 
work than was necessary.  This is easy to do: just add MAHOUT_CONF_DIR to the 
classpath and we're good to go.



> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-23 Thread Drew Farris (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837448#action_12837448
 ] 

Drew Farris commented on MAHOUT-301:


{quote}
Hmm... ok. I'm a little reticent about running -core when testing, because I'm 
not really testing what the release run will be like - I like the idea of 
having a single set of dependencies (jars, not classes directories) which are 
used locally, and the .job when hitting a remote hadoop cluster. Maybe I'm just 
not familiar with the -core option and its use.
{quote}

Ahh, I see where you're coming from. So without -core, you're suggesting that 
mahout pick up the jar files in the target directories if they exist? I think 
it is fine to modify the non-core classpath to include these; they won't be 
present in the release build anyway.
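
Something along these lines might do it. This is only a sketch: the helper 
name append_existing_jars is invented here, and the target-directory glob in 
the usage comment is an assumption about the build layout, not something taken 
from the patch.

```shell
# Hypothetical helper: append every existing file among the given paths to a
# colon-separated classpath, skipping glob patterns that matched nothing.
append_existing_jars() {
  cp="$1"; shift
  for jar in "$@"; do
    # An unmatched glob is passed through literally, so test for existence.
    if [ -f "$jar" ]; then
      cp="${cp:+$cp:}$jar"
    fi
  done
  printf '%s\n' "$cp"
}

# Example use inside the script (the glob is an assumed layout):
#   CLASSPATH=$(append_existing_jars "$CLASSPATH" "$MAHOUT_HOME"/*/target/mahout-*.jar)
```

Because missing files are skipped, the same line is harmless in a binary 
release where the target directories are absent.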

{quote}
The last step, as you've noted, is because I'm not sure that the script 
actually properly lets HADOOP_CONF_DIR properly get passed through the mahout 
shell script to actually running on the hadoop cluster, but maybe that's just a 
config issue in my case? Also means that in fact the default properties idea 
still doesn't work on hadoop, unless the default properties files are pushed to 
the classpath.
{quote}

Are any of the default properties files used beyond the MahoutDriver, which 
executes locally and sets up the job? Do these files need to be distributed to 
the rest of the cluster? As noted above, I think the proper way to run 
MahoutDriver in the context of a distributed job is to do something like:

{code}
./bin/mahout org.apache.hadoop.util.RunJar 
/path/to/mahout-examples-0.3-SNAPSHOT.job org.apache.mahout.driver.MahoutDriver 
TestClassifier
{code}

I suspect we could easily modify the mahout script and shorten this to:

{code}
./bin/mahout runjob TestClassifier
{code}

I can look at this a little closer tonight, so if you have an updated patch for 
me to work on/test in a few hours, definitely post it. I'd be happy to make any 
changes you're interested in.

{quote}
What is the right way to run a job with some additional (runtime) files added 
to the job's classpath? Is there some cmdline arg to "hadoop" that I'm forgetting?
{quote}

FWIW, 
[GenericOptionsParser|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html]
 provides a way to do this with -files, -libjars and -archives.


> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>

[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-23 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837440#action_12837440
 ] 

Jake Mannix commented on MAHOUT-301:


{quote}
Jake, the basic idea is that you would always use -core when executing from 
within a build, but you would not use core when executing in the context of a 
binary release.
{quote}

Hmm... ok.  I'm a little reticent about running -core when testing, because I'm 
not really testing what the release run will be like - I like the idea of 
having a single set of dependencies (jars, not classes directories) which are 
used locally, and the .job when hitting a remote hadoop cluster.  Maybe I'm 
just not familiar with the -core option and its use.  

So far, I've always run by the process of 

 * make code/config changes
 * run mvn clean install (sometimes with -DskipTests if I'm doing rapid 
iterations)
 * run "mahout  args" OR
 * hadoop jar examples/target/mahout-examples-{version}.job  args

The last step, as you've noted, is because I'm not sure that the mahout shell 
script properly passes HADOOP_CONF_DIR through to the job running on the hadoop 
cluster, but maybe that's just a config issue in my case?  It also means that 
the default properties idea still doesn't work on hadoop unless the default 
properties files are pushed to the classpath.

Maybe a kludgey way to do it would be for the script to grab the properties 
files from the MAHOUT_CONF_DIR, unzip the release job jar, push them into it, 
and re-jar it back up and then give it to hadoop, and now those files will be 
available on the classpath of the running job on the remote cluster? 

What is the right way to run a job with some additional (runtime) files added 
to the job's classpath?  Is there some cmdline arg to "hadoop" that I'm forgetting?

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-23 Thread Drew Farris (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837434#action_12837434
 ] 

Drew Farris commented on MAHOUT-301:


Jake, the basic idea is that you would always use -core when executing from 
within a build, but you would not use -core when executing in the context of a 
binary release.

The binary release, built using mvn -Prelease, lands in 
target/mahout-0.3-SNAPSHOT.tar.gz. Untar that and try running bin/mahout from 
the directory that's created; that should work fine without -core.

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-23 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837428#action_12837428
 ] 

Jake Mannix commented on MAHOUT-301:


{quote}
Something else I noticed is that the 'mahout' script doesn't add the classes in 
$MAHOUT_HOME/lib/*.jar to the classpath. This breaks the binary release in 
that it can't run anything, e.g:
{quote}

{quote}
Also wondering what the purpose of adding the job jars to the classpath is? 
(removed in patch)
{quote}

When I run locally now, not using -core, I get this failure:
{code}
/bin/mahout vectordump -s wiki-sparse-vectors-out/vectors/part-0
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/mahout/utils/vectors/VectorDumper
{code}

This appears to be because your patch sets CLASSPATH to add things like 
$MAHOUT_HOME/mahout-*.jar, which don't exist after I've done "mvn install".  
Is there another maven target I need to use to generate the release jars in 
$MAHOUT_HOME?

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-23 Thread Drew Farris (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837376#action_12837376
 ] 

Drew Farris commented on MAHOUT-301:


{quote}
This wasn't a problem with my patch, right?  That was an issue of the mahout 
script in trunk itself?
{quote}

Yes, it was a problem with the script in trunk. I believe this was due to the 
fact that the job files were on the classpath instead of all of the dependency 
jars. Adding the job files to the classpath does not add the dependency jars 
they contain to the classpath as well. So no, you didn't add this, but it 
should be fixed (and is in the patch).

{quote}
What is the -core option for?  I've never used it, how does it work?
{quote}

When you're running bin/mahout in the context of a build, the -core option is 
used to tell it to use the build classpath instead of the classpath used for a 
binary release. This just follows the pattern established (by Doug?) in the 
hadoop and nutch launch scripts.

{quote}
Also added a help message for the 'run' argument.
{quote}

near line 72 in bin/mahout:
(this is different from the --help question I had)

{code}
  echo "  seq2sparse            generate sparse vectors from a sequence file"
  echo "  vectordump            dump vectors from a sequence file"
  echo "  run                   run mahout tasks using the MahoutDriver, see: http://cwiki.apache.org/MAHOUT/mahoutdriver.html"
{code}

{quote}
So you already added the ability to load via classpath, right? If we merge that 
way of thinking with what I'm currently working on (having a configurable 
"MAHOUT_CONF_DIR" which is used for all these props files), we could just have 
the mahout shell script just add MAHOUT_CONF_DIR to the classpath (the way you 
already have it adding the hardwired core/src/main/resources directory) and 
then it would work that way.
{quote}

Yep, that should do it, as long as MAHOUT_CONF_DIR appears before 
src/main/resources, we should be good to go. It should be added outside of the 
section of the script that determines if -core has been specified on the 
command-line.
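
The ordering requirement can be illustrated in isolation. The sketch below is not taken from the patch: it assumes two hypothetical temporary directories standing in for MAHOUT_CONF_DIR and src/main/resources, each holding its own driver.classes.props, and uses a plain java.net.URLClassLoader to show that a resource lookup returns the copy from whichever directory is listed first. The class name ClasspathOrderSketch and the "source" key are made up for the illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical illustration, not part of the MAHOUT-301 patch.
public class ClasspathOrderSketch {

  /** Reads key from the named resource, searching the first directory before the second. */
  public static String lookup(Path first, Path second, String resource, String key) {
    try {
      // Directory URLs must end in '/', which Path.toUri() provides for existing directories.
      URL[] searchPath = {first.toUri().toURL(), second.toUri().toURL()};
      try (URLClassLoader loader = new URLClassLoader(searchPath, null);
           InputStream in = loader.getResourceAsStream(resource)) {
        Properties props = new Properties();
        if (in != null) {
          props.load(in);
        }
        return props.getProperty(key);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Sets up two stand-in directories and reports which copy of the file is found. */
  public static String demo() {
    try {
      Path confDir = Files.createTempDirectory("mahout-conf"); // stands in for MAHOUT_CONF_DIR
      Path resources = Files.createTempDirectory("resources"); // stands in for src/main/resources
      Files.write(confDir.resolve("driver.classes.props"),
          "source = conf-dir\n".getBytes(StandardCharsets.ISO_8859_1));
      Files.write(resources.resolve("driver.classes.props"),
          "source = bundled\n".getBytes(StandardCharsets.ISO_8859_1));
      return lookup(confDir, resources, "driver.classes.props", "source");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

With the conf directory listed first, the lookup should return its copy of the file, which is the behaviour the shell script would rely on when it puts MAHOUT_CONF_DIR ahead of src/main/resources.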



> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>

[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-23 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837351#action_12837351
 ] 

Jake Mannix commented on MAHOUT-301:


Ok, Drew, got your patch in diff mode against mine finally.  

So you already added the ability to load via classpath, right?  If we merge 
that way of thinking with what I'm currently working on (having a configurable 
"MAHOUT_CONF_DIR" which is used for all these props files), we could just have 
the mahout shell script just add MAHOUT_CONF_DIR to the classpath (the way you 
already have it adding the hardwired core/src/main/resources directory) and 
then it would work that way.

New patch merging yours with mine forthcoming.

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-23 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837345#action_12837345
 ] 

Jake Mannix commented on MAHOUT-301:


Hey Drew, thanks for looking at this.  Problems you saw are probably what are 
known as "bugs". :)
{quote}
Did some testing, here's a patch to clean some of these things up + a couple 
questions:
Could we load the default driver.classes.props from the classpath? If it was 
loaded that way the default would work regardless of where the mahout script is 
run from (it currently only works if ./bin/mahout is run, not ./mahout for 
example) and regardless of whether we're running from a binary release or the 
dev environment. (included in patch)
{quote}

YES!  We should indeed load from classpath.  My most recent version of this 
patch (which isn't posted, because it conflicts with yours, I'm trying to 
resolve that now) changes it so that you just supply a single directory in 
which driver.classes.props and the shortNames.props files are located.
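
As a rough sketch of what loading from the classpath could look like (this is an illustration, not the code in either patch): the helper below asks the context class loader for a named resource and reads it into a java.util.Properties, returning an empty set when the resource is absent. The class and method names are hypothetical; only the file name driver.classes.props comes from the discussion.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration, not part of the MAHOUT-301 patch.
public class ClasspathPropsSketch {

  /** Loads a properties resource from the classpath; empty if it is not found. */
  public static Properties loadFromClasspath(String resourceName) {
    Properties props = new Properties();
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    if (loader == null) {
      loader = ClasspathPropsSketch.class.getClassLoader();
    }
    // getResourceAsStream returns null when nothing on the classpath matches.
    try (InputStream in = loader.getResourceAsStream(resourceName)) {
      if (in != null) {
        props.load(in);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  public static void main(String[] args) {
    Properties classes = loadFromClasspath("driver.classes.props");
    System.out.println(classes.size() + " short names loaded");
  }
}
```

Because the lookup goes through the class loader rather than a relative file path, the result no longer depends on the directory the script is started from.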

{quote}
Something else I noticed is that the 'mahout' script doesn't add the classes in 
$MAHOUT_HOME/lib/*.jar to the classpath. This breaks the binary release in 
that it can't run anything, e.g.:

./mahout vectordump
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/commons/cli2/OptionException
Caused by: java.lang.ClassNotFoundException: 
org.apache.commons.cli2.OptionException
(fixed in patch)
{quote}

This wasn't a problem with my patch, right?  That was an issue of the mahout 
script in trunk itself?  

{quote}
Using -core in the context of a dev build should work properly, but leaving out 
-core will cause the script to error unless run in the context of a release - 
this is the way it should work, right?
{quote}

What is the -core option for?  I've never used it, how does it work?

{quote}
Also added a help message for the 'run' argument.
{quote}

Where did you add that?

{quote}
Does executing './mahout run --help' hang for anyone else or is it something 
specific to my environment? (didn't track this one down)
{quote}

I didn't have the --help option in there; you added it. Do you know where it's 
hanging?

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>

[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-23 Thread Drew Farris (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837243#action_12837243
 ] 

Drew Farris commented on MAHOUT-301:


bq. BTW. How is hadoop execution done using shell script ? i.e

It looks like something like the following would do the trick:

{code}
/bin/mahout -core org.apache.hadoop.util.RunJar \
  /path/to/mahout-examples-0.3-SNAPSHOT.job \
  org.apache.mahout.driver.MahoutDriver TestClassifier
{code}

We could probably provide a 'runjob' case that appends 
'org.apache.hadoop.util.RunJar examples/target/mahout-examples-0.3-SNAPSHOT.job 
org.apache.mahout.driver.MahoutDriver', but perhaps this could be used in every 
case where 'run' is called?



> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-23 Thread Drew Farris (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837234#action_12837234
 ] 

Drew Farris commented on MAHOUT-301:


bq. including the job jar is much cleaner than adding all deps. Plus there is 
nothing more to configure to execute it on top of hadoop.. 

The job files work fine with 'hadoop jar', but putting the job files on the 
classpath will not automatically include the dependencies they contain (e.g. 
commons-cli2) on the classpath: the dependencies need to be added separately 
(see the ClassNotFoundException case described above).

bq. BTW. How is hadoop execution done using shell script ?

If the HADOOP_CONF_DIR is set, it should be picked up by the jobs, but I don't 
think that means jar/jobfile execution works properly. I suspect this needs 
modifications to make that possible.


> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-22 Thread Robin Anil (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837120#action_12837120
 ] 

Robin Anil commented on MAHOUT-301:
---

Including the job jar is much cleaner than adding all deps. Plus there is 
nothing more to configure to execute it on top of Hadoop.

BTW, how is Hadoop execution done using the shell script? I.e.:

hadoop jar mahout-examples-0.3.job o.a.m...DictionaryVectorizer --input . 
args

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-22 Thread Robin Anil (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836962#action_12836962
 ] 

Robin Anil commented on MAHOUT-301:
---

The help comments are missing from the bin/mahout script. Scroll up in that 
file and you will see a pretty-printed help string. Just add the MahoutDriver 
description and possibly a wiki link there. Otherwise it looks good to commit. 
I have not checked the full functionality yet. If anyone else wants to take a 
look, please do so quickly.

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-22 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836952#action_12836952
 ] 

Jake Mannix commented on MAHOUT-301:


Oh, I forgot to finish my sentence which began "run as follows..."

Once you've got default property files in your $MAHOUT_CONF_DIR, you can run 
like so:

{code}
$MAHOUT_HOME/bin/mahout run wikToSeq
{code}

and that's it.  If you want to override the options in your wikToSeq.props 
file, just pass them in on that same command line above, and they override as 
desired.
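
The override behaviour can be sketched as follows. This is a minimal illustration of the idea described in the issue's javadoc (defaults from a props file become "--key value" pairs, and later command-line arguments replace them), not the patch's actual implementation. It assumes every override arrives as a "--key value" pair; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical illustration, not part of the MAHOUT-301 patch.
public class DefaultArgsSketch {

  /** Turns defaults into "--key value" pairs, replaced by any matching override. */
  public static String[] mergeArgs(Properties defaults, String[] overrides) {
    Map<String, String> merged = new LinkedHashMap<>();
    // Properties has no defined iteration order, so sort the names for stable output.
    for (String name : new TreeSet<>(defaults.stringPropertyNames())) {
      merged.put("--" + name, defaults.getProperty(name));
    }
    // Assumption: overrides come strictly in "--key value" pairs.
    for (int i = 0; i + 1 < overrides.length; i += 2) {
      merged.put(overrides[i], overrides[i + 1]);
    }
    List<String> args = new ArrayList<>();
    for (Map.Entry<String, String> entry : merged.entrySet()) {
      args.add(entry.getKey());
      args.add(entry.getValue());
    }
    return args.toArray(new String[0]);
  }

  public static void main(String[] argv) {
    Properties defaults = new Properties();
    defaults.setProperty("input", "/path/to/my/input");
    defaults.setProperty("output", "/path/to/my/output");
    String[] merged = mergeArgs(defaults, new String[] {"--output", "/tmp/other"});
    // prints: --input /path/to/my/input --output /tmp/other
    System.out.println(String.join(" ", merged));
  }
}
```

Flag-style options that take no value would need extra handling, which this sketch leaves out.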

If this can be tested out and debugged, this patch is ready for committing, and 
significantly improves the command line experience.

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch
>
>



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-20 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836328#action_12836328
 ] 

Jake Mannix commented on MAHOUT-301:


This patch modifies the mahout shell script to add the "run" command, which 
invokes this driver class. 

It also more nicely takes shortName definitions from either 
core/src/main/resources/driver.classes.props or the "-cf configFile" location, 
and runs the class specified by shortName using props specified in 
core/src/main/resources/shortName.props or whatever is "-df defaultpropsFile".

Also takes options in the file of the form "DsomeOpt = optionVal" and passes 
those into the program as "-DsomeOpt=optionVal" as well.
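
A small sketch of that mapping, under the assumption that any key beginning with a capital "D" is meant as a -D style option and everything else is a long option (the patch itself may distinguish them differently). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical illustration, not part of the MAHOUT-301 patch.
public class DOptionSketch {

  /** Converts props-file text into program arguments. */
  public static List<String> toArgs(String propsText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propsText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    List<String> args = new ArrayList<>();
    // Sort the names so the argument order is deterministic.
    for (String name : new TreeSet<>(props.stringPropertyNames())) {
      String value = props.getProperty(name);
      if (name.startsWith("D")) {
        // Assumption: a leading "D" marks a -Dkey=value style option.
        args.add("-" + name + "=" + value);
      } else {
        args.add("--" + name);
        args.add(value);
      }
    }
    return args;
  }

  public static void main(String[] argv) {
    // prints: [-DsomeOpt=optionVal, --input, /path/to/my/input]
    System.out.println(toArgs("DsomeOpt = optionVal\ninput = /path/to/my/input\n"));
  }
}
```

Note that java.util.Properties strips the whitespace around "=", so "DsomeOpt = optionVal" in the file yields the key "DsomeOpt" and the value "optionVal".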

Not sure how well it works on Hadoop yet.  But the command line seems to work 
for the one class I've got a props file for (TestClassifier). 

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301.patch, MAHOUT-301.patch
>
>
> Snippet from javadoc gives the idea:
> {code}
> /**
>  * General-purpose driver class for Mahout programs.  Utilizes org.apache.hadoop.util.ProgramDriver to run
>  * main methods of other classes, but first loads up default properties from a properties file.
>  *
>  * Usage: run on Hadoop like so:
>  *
>  * $HADOOP_HOME/bin/hadoop -jar path/to/job org.apache.mahout.driver.MahoutDriver [classes.props file] shortJobName \
>  *   [default.props file for this class] [over-ride options, all specified in long form: --input, --jarFile, etc]
>  *
>  * TODO: set the Main-Class to just be MahoutDriver, so that this option isn't needed?
>  *
>  * (note: using the current shell script, this could be modified to be just
>  * $MAHOUT_HOME/bin/mahout [classes.props file] shortJobName [default.props file] [over-ride options]
>  * )
>  *
>  * Works like this: by default, the file "core/src/main/resources/driver.classes.props" is loaded, which
>  * defines a mapping between short names like "VectorDumper" and fully qualified class names.  This file may
>  * instead be overridden on the command line by having the first argument be some string of the form *classes.props.
>  *
>  * The next argument to the Driver is supposed to be the short name of the class to be run (as defined in the
>  * driver.classes.props file).  After this, if the next argument ends in ".props" / ".properties", it is taken to
>  * be the file to use as the default properties file for this execution, and key-value pairs are built up from that:
>  * if the file contains
>  *
>  * input=/path/to/my/input
>  * output=/path/to/my/output
>  *
>  * Then the class which will be run will have its main called with
>  *
>  *   main(new String[] { "--input", "/path/to/my/input", "--output", "/path/to/my/output" });
>  *
>  * After all the "default" properties are loaded from the file, any further command-line arguments are taken in,
>  * and over-ride the defaults.
>  */
> {code}
> Could be cleaned up, as it's kinda ugly with the whole "file named in 
> .props", but gives the idea.  Really helps cut down on repetitive long 
> command lines, and lets defaults be put in props files instead of locked into 
> the code.
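
The positional handling described in the javadoc above can be sketched roughly as follows. This is an illustration only, not the MahoutDriver code: the class DriverArgs and its field names are hypothetical, and the sketch only classifies the arguments without loading any files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical holder for the driver's positional arguments; not the real MahoutDriver. */
final class DriverArgs {
  // Default mapping file named in the javadoc; replaced if the first argument ends in "classes.props".
  String classesFile = "core/src/main/resources/driver.classes.props";
  String shortName;
  String defaultsFile; // stays null when no props file is given on the command line
  final List<String> overrides = new ArrayList<>();

  static DriverArgs parse(String[] args) {
    DriverArgs d = new DriverArgs();
    int i = 0;
    // Optional first argument of the form *classes.props
    if (i < args.length && args[i].endsWith("classes.props")) {
      d.classesFile = args[i++];
    }
    // Required short name of the class to run
    if (i >= args.length) {
      throw new IllegalArgumentException("missing short job name");
    }
    d.shortName = args[i++];
    // Optional default properties file for this execution
    if (i < args.length && (args[i].endsWith(".props") || args[i].endsWith(".properties"))) {
      d.defaultsFile = args[i++];
    }
    // Everything left over-rides the defaults
    d.overrides.addAll(Arrays.asList(args).subList(i, args.length));
    return d;
  }

  public static void main(String[] unused) {
    DriverArgs d = parse(new String[] {"VectorDumper", "dump.props", "--input", "/tmp/in"});
    System.out.println(d.shortName + " " + d.defaultsFile + " " + d.overrides);
  }
}
```

Detecting the files purely by suffix is what makes the scheme "kinda ugly": a short job name or option value that happens to end in ".props" would be misclassified, which is one reason explicit flags are an appealing alternative.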

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-20 Thread Robin Anil (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836278#action_12836278
 ] 

Robin Anil commented on MAHOUT-301:
---

Looks great. In parallel, we need to convert all main classes to extend 
AbstractJob and clean up the stuff there at MAHOUT-294



> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301.patch



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-20 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836271#action_12836271
 ] 

Jake Mannix commented on MAHOUT-301:


So this current patch will take -conf / -Dprop=value type stuff and pass it 
directly on into the program in the usual way, with the only difference being 
that these arguments could also be in a properties file, as long as they're 
using the exact same form, which would make for ugly props files as is:

if you wanted to not have to type:

  $MAHOUT_HOME/bin/mahout myClassShortName -DmyProp=value

You would currently need to have, in your props file:

DmyProp = value

which looks kinda silly, but would work.  Oh wait, no it wouldn't, it would end 
up with a command line which would do " -DmyProp value" not "-DmyProp=value".  
To get the latter, we'd need an even uglier thing with the current patch:

"DmyProp=value"=

which would get interpolated into -DmyProp=value on the internal command line.  
Super ugly.

I've got a modified version of this I can upload in a bit which takes care of 
the short-name/long-name arguments thing by a bit of a kludge, with props files 
which would look like this:

i | input = foo/path

which is to be interpreted as: if on the command line, the user says "-i 
bar/path" OR "--input baz/path", they override the "foo/path" in the props 
file.  If the line in the props file has no "|" separating two options, it's 
assumed to be prepended with "-".  

Still doesn't remove the ugliness of -Dprop=value though. Not sure how best 
to handle that one.  What kind of props file syntax would tell it "take these 
key-value pairs and do '-key value' and do these other ones as '-Dkey=value'"?  
I guess just having the 'D' there would be a good signal?   It could then just 
take

i | input = foo/path
DmyProp = propValue

and translate that into a command line like: progName -i foo/path 
-DmyProp=propValue 

That would work and be not completely horribly ugly.  Not great though.
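
To make the proposed syntax concrete, here is a small sketch of how such props lines could be merged with the user's command line. It is only one possible reading of the scheme above, not the modified patch: the class AliasedDefaults and the method merge are invented names, and the rule that a -D default is dropped when the user passes the same -Dkey= on the command line is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the "short | long = value" props syntax; not the actual patch. */
final class AliasedDefaults {

  /** Returns the defaults that were not overridden, followed by the user's own arguments. */
  static List<String> merge(List<String> propsLines, List<String> cliArgs) {
    List<String> result = new ArrayList<>();
    for (String line : propsLines) {
      int eq = line.indexOf('=');
      if (eq < 0) {
        continue; // not a "key = value" line
      }
      String key = line.substring(0, eq).trim();
      String value = line.substring(eq + 1).trim();
      if (key.isEmpty()) {
        continue;
      }
      int bar = key.indexOf('|');
      if (bar >= 0) {
        // "i | input = foo/path": either -i or --input on the command line wins
        String shortFlag = "-" + key.substring(0, bar).trim();
        String longFlag = "--" + key.substring(bar + 1).trim();
        if (!cliArgs.contains(shortFlag) && !cliArgs.contains(longFlag)) {
          result.add(shortFlag);
          result.add(value);
        }
      } else if (key.length() > 1 && key.startsWith("D")) {
        // "DmyProp = propValue" becomes the single token "-DmyProp=propValue"
        String prefix = "-" + key + "=";
        boolean overridden = false;
        for (String arg : cliArgs) {
          if (arg.startsWith(prefix)) {
            overridden = true;
          }
        }
        if (!overridden) {
          result.add(prefix + value);
        }
      } else {
        // no "|": the key is assumed to be prepended with "-"
        String flag = "-" + key;
        if (!cliArgs.contains(flag)) {
          result.add(flag);
          result.add(value);
        }
      }
    }
    result.addAll(cliArgs);
    return result;
  }

  public static void main(String[] unused) {
    List<String> props = Arrays.asList("i | input = foo/path", "DmyProp = propValue");
    System.out.println(merge(props, Arrays.asList("--input", "baz/path")));
  }
}
```

Dropping an overridden default entirely, rather than passing both values along, sidesteps the question of which of "-i" and "--input" the downstream parser would prefer.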

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301.patch



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-20 Thread Drew Farris (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836268#action_12836268
 ] 

Drew Farris commented on MAHOUT-301:


{blockquote}
What does GenericOptionsParser do if you have a command line input like this:

programName --input foo.txt -i bar.txt

where --input is the long argument name for -i as short name? Which one wins? 
Is it deterministic? 
{blockquote}

In most cases it really depends on the implementation; sometimes 
GenericOptionsParser isn't even being used. In Mahout's case it's likely to be 
commons-cli2 that's actually doing the parsing, and I don't know how it would 
behave in this case. I'll take a look.

GenericOptionsParser simply handles things like -conf and -Dprop=value that 
control hadoop configurations, job settings and the like, and then hands back 
the rest to the caller.  In many cases in Mahout, GenericOptionsParser 
isn't used at all, which reduces the control one has over a job's behavior.  
iirc, Sean and Robin have made some progress towards eliminating these cases 
with the AbstractJob class. 
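
For readers who have not used it, the division of labour described above can be approximated in plain Java as below. This is not Hadoop's implementation: the class GenericArgsSplit is a made-up stand-in that only recognises "-conf <file>" and "-Dprop=value", whereas the real GenericOptionsParser supports further generic options and exposes the leftover arguments through getRemainingArgs().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for the split that GenericOptionsParser performs. */
final class GenericArgsSplit {
  final Map<String, String> properties = new LinkedHashMap<>(); // from -Dprop=value
  final List<String> confFiles = new ArrayList<>();             // from -conf <file>
  final List<String> remaining = new ArrayList<>();             // handed back to the caller

  static GenericArgsSplit split(String[] args) {
    GenericArgsSplit s = new GenericArgsSplit();
    for (int i = 0; i < args.length; i++) {
      String arg = args[i];
      int eq = arg.indexOf('=');
      if (arg.equals("-conf") && i + 1 < args.length) {
        s.confFiles.add(args[++i]); // consume the file name as well
      } else if (arg.startsWith("-D") && eq > 2) {
        s.properties.put(arg.substring(2, eq), arg.substring(eq + 1));
      } else {
        s.remaining.add(arg); // job-specific option, left for e.g. commons-cli2
      }
    }
    return s;
  }

  public static void main(String[] unused) {
    GenericArgsSplit s = split(new String[] {"-Dmy.prop=4", "--input", "foo.txt", "-i", "bar.txt"});
    System.out.println(s.properties + " " + s.remaining);
  }
}
```

Note that both "--input foo.txt" and "-i bar.txt" end up in the remaining arguments here, so which one wins is decided by whatever parses those leftovers, not by the generic-option step.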

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301.patch



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-20 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836247#action_12836247
 ] 

Ted Dunning commented on MAHOUT-301:



This also helps non-command-line usage, actually.  I can imagine a workflow 
solution where setting all parameters on every step gets onerous.


> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301.patch



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-20 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836231#action_12836231
 ] 

Jake Mannix commented on MAHOUT-301:


The TODO refers to an issue that I think is there, but am not sure about: what 
does GenericOptionsParser do if you have a command line input like this:

programName --input foo.txt -i bar.txt 

where --input is the long argument name for -i as short name?  Which one wins?  
Is it deterministic?  


> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301.patch



[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-20 Thread Drew Farris (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836209#action_12836209
 ] 

Drew Farris commented on MAHOUT-301:


This is pretty nice. It gets to the point where relying on shell history or 
ad-hoc mechanisms to manage command lines kills me, and this is a nice solution.

I've quickly skimmed the patch but I haven't tried it out. I see the TODO in 
there regarding short vs. long arguments. Do you have any thoughts on how to 
support single-dash arguments? Things like the arguments supported by the 
[GenericOptionsParser|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html]
 could be set in the properties file too.



> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
>  Issue Type: New Feature
>  Components: Utils
>Affects Versions: 0.3
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301.patch