Hi,

I'm trying to get a grip on the Mahout command-line options, and I keep getting caught either in a gross misunderstanding or in Java errors. Any help would be greatly appreciated.

I've created some hand-built data which I expect to be noisy, but still hoped to run through my workflow before improving my data quality.

"id","brace","target"
000040045,0194,1
000006445,0149,1
000033554,0013,1
...

My understanding is that my workflow should be as follows:
1: Use "trainAdaptiveLogistic" with scored data to create a model (here called PC.model)
2: Use "validateAdaptiveLogistic" to test how good the model is on a holdout data set which has been scored
3: Use "runAdaptiveLogistic" on some unscored data (i.e. no third column) to find out new things

Firstly ... Is that a valid workflow?

runAdaptiveLogistic appears to expect scored data as well. At least, it fails if I give it only unscored data (i.e. the "target" column is absent).

If not, how do I productionise a model?
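
In the meantime, one workaround I could try, assuming the NullPointerException shown further down really comes from the missing "target" column, is to pad the unscored file with a placeholder target column before handing it to runAdaptiveLogistic. This is only a minimal sketch in Python: the function name pad_with_placeholder_target and the placeholder value "1" are my own assumptions, not anything provided by Mahout, and I have not confirmed that Mahout ignores the placeholder when scoring.

```python
import csv
import io


def pad_with_placeholder_target(rows, target_name="target", placeholder="1"):
    """Append a placeholder target column to a header row plus data rows.

    Hypothetical helper (not part of Mahout). The placeholder value is an
    assumption; it is only there so that the "target" column exists.
    """
    rows = [list(row) for row in rows]
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    if target_name in header:
        # The file is already scored, so leave it untouched.
        return rows
    padded = [header + [target_name]]
    padded.extend(row + [placeholder] for row in data)
    return padded


# Tiny in-memory example using two rows from my sample data.
unscored = io.StringIO('"id","brace"\n000040045,0194\n000006445,0149\n')
padded = pad_with_placeholder_target(csv.reader(unscored))

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(padded)
print(out.getvalue(), end="")
```

The values are handled as strings throughout, so the leading zeros in "id" and "brace" are preserved.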

Note: I got the flow to work (at least with scored data for all three steps) with mahout-0.7 and mahout-0.8, but as I thought the "run" step should work differently, I also tried mahout-0.9. There, the second step fails as well.


[cloudera@localhost ]$ mahout trainAdaptiveLogistic \
--passes 100 \
--input ./PCtrain \
--features 50 \
--output ./PC.model \
--target target \
--categories 2 \
--predictors brace \
--types t

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/mahout/mahout-examples-0.8-cdh5.0.0-job.jar
14/05/26 13:56:19 WARN driver.MahoutDriver: No trainAdaptiveLogistic.props found on classpath, will use command-line arguments only
50
target ~

0.000000000 0.051644057 0.000000000 0.000000000 0.000000000 0.023763329 0.000000000 0.000000000 -0.054034312 -0.000000000 0.000000000 0.021475032 0.028820276 0.000000000 0.033145160 0.000000000 0.000000000 0.000000000 0.000000000 -0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.051755156 0.000000000 -0.000000000 -0.000000001 0.000000000 -0.053815953 0.030166157 0.000000000 0.000000000 -0.073127179 0.000000000 -0.000000000 0.000000000 0.000000000 -0.000000000 0.000000000 0.000000000 -0.108047988 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 -0.000000000
14/05/26 13:56:36 INFO driver.MahoutDriver: Program took 17784 ms (Minutes: 0.2964)

[cloudera@localhost]$ mahout validateAdaptiveLogistic \
--input ./PCtest \
--model ./PC.model \
--auc \
--confusion
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/mahout/mahout-examples-0.8-cdh5.0.0-job.jar
14/05/26 13:56:53 WARN driver.MahoutDriver: No validateAdaptiveLogistic.props found on classpath, will use command-line arguments only

Log-likelihood:Min=-0.78, Max=-0.61, Mean=-0.68, Median=-0.69

AUC = 0.65

=======================================================
Confusion Matrix
-------------------------------------------------------
a        b        <--Classified as
182      0         |  182       a     = 1
0        18        |  18        b     = 2



Entropy Matrix: [[-0.7, -0.4], [-0.7, -0.3]]
14/05/26 13:56:54 INFO driver.MahoutDriver: Program took 1125 ms (Minutes: 0.018766666666666668)

[cloudera@localhost]$ mahout runAdaptiveLogistic \
--input ./PCrun \
--model ./PC.model \
--idcolumn id \
--output ./PC.out
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/mahout/mahout-examples-0.8-cdh5.0.0-job.jar
14/05/26 13:57:09 WARN driver.MahoutDriver: No runAdaptiveLogistic.props found on classpath, will use command-line arguments only
Exception in thread "main" java.lang.NullPointerException
    at org.apache.mahout.classifier.sgd.CsvRecordFactory.firstLine(CsvRecordFactory.java:176)
    at org.apache.mahout.classifier.sgd.RunAdaptiveLogistic.mainToOutput(RunAdaptiveLogistic.java:83)
    at org.apache.mahout.classifier.sgd.RunAdaptiveLogistic.main(RunAdaptiveLogistic.java:54)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

[cloudera@localhost]$ mahout runAdaptiveLogistic \
--input ./PCtest \
--model ./PC.model \
--idcolumn id \
--output ./PC.out
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/mahout/mahout-examples-0.8-cdh5.0.0-job.jar
14/05/26 13:57:35 WARN driver.MahoutDriver: No runAdaptiveLogistic.props found on classpath, will use command-line arguments only
100 records processed
200 records processed
200 records processed totally.
14/05/26 13:57:36 INFO driver.MahoutDriver: Program took 943 ms (Minutes: 0.015716666666666667)

Thanks,
Duncan.
