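Hi Sugato,

as far as I can tell, TrainLogistic cannot read data from HDFS: the stack trace you posted shows it opening the input with java.io.FileInputStream, which only sees the local filesystem. The simplest workaround is to copy the data to local disk via hadoop fs -copyToLocal and point TrainLogistic's --input at that local copy.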
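A rough sketch of that workaround, in case it helps. The helper name train_from_hdfs and the local path /tmp/airline_2008.csv are made up for illustration, and I am assuming that both the hadoop and mahout launcher scripts are on your PATH; the TrainLogistic options are simply copied from the command you posted.

```shell
# Hypothetical helper: copy a file out of HDFS, then train on the local copy.
#   $1 = path in HDFS, $2 = destination on the local filesystem
train_from_hdfs() {
  # Stop here if the copy fails, so TrainLogistic never sees a missing file.
  hadoop fs -copyToLocal "$1" "$2" || return 1

  # Assumes the "mahout" launcher is on PATH; options taken from the thread.
  mahout org.apache.mahout.classifier.sgd.TrainLogistic \
    --passes 10 --rate 5 --lambda 0.001 \
    --input "$2" --features 21 \
    --output ./airline_2008.model \
    --target CRSDepTime --categories 2 \
    --predictors ArrDelay DayOfWeek --types word numeric
}
```

You would call it as train_from_hdfs airline/2008.csv /tmp/airline_2008.csv, where the first argument is resolved against your HDFS home directory and the second is an ordinary local path.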
Best,
Sebastian

On 03.07.2013 05:39, Sugato Samanta wrote:
> Hello,
>
> Can you please help? I am not able to read data from hadoop while using the
> package *org.apache.mahout.classifier.sgd.TrainLogistic* but i am able to
> read from native file system. Is there a way to read data from hadoop and
> do logistic regression?
>
> Thanks,
> Sugato
>
>
> On Mon, Jul 1, 2013 at 11:28 AM, Sugato Samanta <[email protected]> wrote:
>
>> Hello Sebastian,
>>
>> Thank you for replying. I was able to do logistic regression when i keep
>> the file on linux directory. However when i try to read the file from hdfs,
>> it throws me an error. i looked into the package
>> *org.apache.mahout.classifier.sgd.TrainLogistic* in detail and realized
>> that it is only able to read input data from native file system. Is there a
>> way out to read the data from hdfs? Please find the error message which i
>> was trying to send via attachment.
>>
>> [root@INFADDAD19 ~]# $MAHOUT_HOME
>> org.apache.mahout.classifier.sgd.TrainLogistic --passes 10 --rate 5
>> --lambda 0.001 --input airline/2008.csv --features 21 --output
>> ./airline_2008.model --target CRSDepTime --categories 2 --predictors
>> ArrDelay DayOfWeek --types word numeric
>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>> Running on hadoop, using /usr/lib/hadoop/bin/hadoop and
>> HADOOP_CONF_DIR=/etc/hadoop/conf
>> MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.2.0-job.jar
>> 13/07/01 01:55:18 WARN driver.MahoutDriver: No
>> org.apache.mahout.classifier.sgd.TrainLogistic.props found on classpath,
>> will use command-line arguments only
>> Exception in thread "main" java.io.FileNotFoundException: airline/2008.csv (No such file or directory)
>>     at java.io.FileInputStream.open(Native Method)
>>     at java.io.FileInputStream.<init>(FileInputStream.java:120)
>>     at org.apache.mahout.classifier.sgd.TrainLogistic.open(TrainLogistic.java:316)
>>     at org.apache.mahout.classifier.sgd.TrainLogistic.mainToOutput(TrainLogistic.java:75)
>>     at org.apache.mahout.classifier.sgd.TrainLogistic.main(TrainLogistic.java:64)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
>>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>>
>> Thanks for your help.
>>
>> Regards,
>> Sugato
>>
>>
>> On Mon, Jul 1, 2013 at 9:33 AM, Sebastian Schelter <[email protected]> wrote:
>>
>>> Hello Sugato,
>>>
>>> attachments don't work on this list, unfortunately. Have you tried
>>> copying the data from HDFS to the local filesystem of the machine that
>>> runs logistic regression?
>>>
>>> Best,
>>> Sebastian
>>>
>>> On 01.07.2013 06:00, Sugato Samanta wrote:
>>>> Hello,
>>>>
>>>> I am trying to do a logistic regression using Mahout. I am facing errors
>>>> while reading data from Hadoop (HDFS). After checking the
>>>> org.apache.mahout.classifier.sgd.TrainLogistic package i have learnt that
>>>> this is supposed to read input from Linux OS. Can you please help me to get
>>>> the data read from HDFS? The credentials used by me:
>>>>
>>>> OS: Linux Red Hat 5.0 (Cloudera)
>>>> Mahout version: 0.7
>>>> Hadoop Version: 2.0.0-cdh4.2.0
>>>>
>>>> $MAHOUT_HOME org.apache.mahout.classifier.sgd.TrainLogistic --passes 10
>>>> --rate 5 --lambda 0.001 --input /data01/final_data.csv --features 21
>>>> --output ./airline.model --target CRSDepTime --categories 2 --predictors
>>>> ArrDelay DayOfWeek --types word numeric
>>>>
>>>> Error message has been attached. Thank you for your help.
>>>>
>>>> Regards,
>>>> Sugato
