Just briefly ...
It looks like org/apache/mahout/classifier/sgd/CsvRecordFactory.java
throws a NullPointerException at line 197 when the input has no target column:
196: // record target column and establish dictionary for decoding target
197: target = vars.get(targetName);
Handling a null result from vars.get(targetName) at that point, rather than
letting it raise the exception, would appear to let this run and classify
new data.
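To make the idea concrete, here is a minimal sketch of a null-tolerant lookup. It is not the Mahout source: the class TargetLookup, the methods indexColumns and findTargetIndex, the NO_TARGET sentinel, and the plain Map<String, Integer> standing in for vars are all assumptions made for illustration. It also assumes that target is a primitive int, so that assigning a missing (null) map entry to it is what raises the NullPointerException.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the header handling in CsvRecordFactory;
// names and structure are illustrative, not Mahout's actual code.
public class TargetLookup {

    // Assumed sentinel meaning "the header has no target column".
    public static final int NO_TARGET = -1;

    // Map each column name in the CSV header to its position.
    public static Map<String, Integer> indexColumns(List<String> header) {
        Map<String, Integer> vars = new HashMap<>();
        for (int i = 0; i < header.size(); i++) {
            vars.put(header.get(i), i);
        }
        return vars;
    }

    // Look up the target column without unboxing a null Integer.
    public static int findTargetIndex(Map<String, Integer> vars, String targetName) {
        Integer index = vars.get(targetName); // null when the column is absent
        return index == null ? NO_TARGET : index;
    }

    public static void main(String[] args) {
        Map<String, Integer> scored = indexColumns(Arrays.asList("id", "brace", "target"));
        Map<String, Integer> unscored = indexColumns(Arrays.asList("id", "brace"));
        System.out.println(findTargetIndex(scored, "target"));   // prints 2
        System.out.println(findTargetIndex(unscored, "target")); // prints -1
    }
}
```

Any code that later reads the target value would also need to check for the sentinel before using it, which this sketch does not cover.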
On 26/05/2014 22:07, Duncan Lawie wrote:
Hi,
I'm trying to get a grip on the mahout command line options, and
getting caught either in gross misunderstanding or Java errors. Help
greatly appreciated.
I've created some hand-built data which I expect to be noisy, but
still hoped to run through my workflow before improving my data quality.
"id","brace","target"
000040045,0194,1
000006445,0149,1
000033554,0013,1
...
My understanding is that my workflow should be as follows
1: Use "trainAdaptiveLogistic" with scored data to create a model
(here called PC.model)
2: Use "validateAdaptiveLogistic" to test how good the model is on a
holdout data set which has been scored
3: Use "runAdaptiveLogistic" on some unscored data (i.e. no third
column) to find out new things
Firstly ... Is that a valid workflow?
runAdaptiveLogistic appears to expect scored data as well. At least,
it fails if I give it only unscored data (i.e. the "target" column is
absent).
If not, how do I productionise a model?
(Note: I got the flow to work, at least with scored data for all
three steps, with mahout-0.7 and mahout-0.8, but as I thought the "run"
step should work differently I tried mahout-0.9. There, the second step
also fails.)
[cloudera@localhost ]$ mahout trainAdaptiveLogistic \
--passes 100 \
--input ./PCtrain \
--features 50 \
--output ./PC.model \
--target target \
--categories 2 \
--predictors brace \
--types t
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using
/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/bin/hadoop
and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB:
/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/mahout/mahout-examples-0.8-cdh5.0.0-job.jar
14/05/26 13:56:19 WARN driver.MahoutDriver: No
trainAdaptiveLogistic.props found on classpath, will use command-line
arguments only
50
target ~
0.000000000 0.051644057 0.000000000 0.000000000
0.000000000 0.023763329 0.000000000 0.000000000
-0.054034312 -0.000000000 0.000000000 0.021475032
0.028820276 0.000000000 0.033145160 0.000000000
0.000000000 0.000000000 0.000000000 -0.000000000
0.000000000 0.000000000 0.000000000 0.000000000
0.000000000 0.000000000 0.051755156 0.000000000
-0.000000000 -0.000000001 0.000000000 -0.053815953
0.030166157 0.000000000 0.000000000 -0.073127179
0.000000000 -0.000000000 0.000000000 0.000000000
-0.000000000 0.000000000 0.000000000 -0.108047988
0.000000000 0.000000000 0.000000000 0.000000000
0.000000000 -0.000000000
14/05/26 13:56:36 INFO driver.MahoutDriver: Program took 17784 ms
(Minutes: 0.2964)
[cloudera@localhost]$ mahout validateAdaptiveLogistic \
--input ./PCtest \
--model ./PC.model \
--auc \
--confusion
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using
/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/bin/hadoop
and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB:
/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/mahout/mahout-examples-0.8-cdh5.0.0-job.jar
14/05/26 13:56:53 WARN driver.MahoutDriver: No
validateAdaptiveLogistic.props found on classpath, will use
command-line arguments only
Log-likelihood:Min=-0.78, Max=-0.61, Mean=-0.68, Median=-0.69
AUC = 0.65
=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       <--Classified as
182     0        |  182     a = 1
0       18       |  18      b = 2
Entropy Matrix: [[-0.7, -0.4], [-0.7, -0.3]]
14/05/26 13:56:54 INFO driver.MahoutDriver: Program took 1125 ms
(Minutes: 0.018766666666666668)
[cloudera@localhost]$ mahout runAdaptiveLogistic \
--input ./PCrun \
--model ./PC.model \
--idcolumn id \
--output ./PC.out
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using
/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/bin/hadoop
and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB:
/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/mahout/mahout-examples-0.8-cdh5.0.0-job.jar
14/05/26 13:57:09 WARN driver.MahoutDriver: No
runAdaptiveLogistic.props found on classpath, will use command-line
arguments only
Exception in thread "main" java.lang.NullPointerException
    at org.apache.mahout.classifier.sgd.CsvRecordFactory.firstLine(CsvRecordFactory.java:176)
    at org.apache.mahout.classifier.sgd.RunAdaptiveLogistic.mainToOutput(RunAdaptiveLogistic.java:83)
    at org.apache.mahout.classifier.sgd.RunAdaptiveLogistic.main(RunAdaptiveLogistic.java:54)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
[cloudera@localhost]$ mahout runAdaptiveLogistic \
--input ./PCtest \
--model ./PC.model \
--idcolumn id \
--output ./PC.out
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using
/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/bin/hadoop
and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB:
/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/mahout/mahout-examples-0.8-cdh5.0.0-job.jar
14/05/26 13:57:35 WARN driver.MahoutDriver: No
runAdaptiveLogistic.props found on classpath, will use command-line
arguments only
100 records processed
200 records processed
200 records processed totally.
14/05/26 13:57:36 INFO driver.MahoutDriver: Program took 943 ms
(Minutes: 0.015716666666666667)
Thanks,
Duncan.