Sugato,

Some Hadoop distributions support access to the cluster via NFS [1]. That allows programs like TrainLogistic, which read only from the local file system, to read data from a Hadoop cluster.

Sebastian's suggestion works fine for other Hadoop distributions if you don't mind the copy.

I work for the company referenced, so keep that in mind.

[1] http://www.mapr.com

On Wed, Jul 3, 2013 at 7:56 AM, Sugato Samanta <[email protected]> wrote:

> Hi Sebastian,
>
> Thank you for your answer.
>
> Regards,
> Sugato
>
>
> On Wed, Jul 3, 2013 at 7:19 PM, Sebastian Schelter <[email protected]> wrote:
>
> > Hi Sugato,
> >
> > I don't think that TrainLogistic is able to read data from HDFS. The
> > simplest way is to copy the data to local disk via hadoop fs
> > -copyToLocal and point TrainLogistic to that local copy.
> >
> > Best,
> > Sebastian
> >
> > On 03.07.2013 05:39, Sugato Samanta wrote:
> > > Hello,
> > >
> > > Can you please help? I am not able to read data from hadoop while using the
> > > package *org.apache.mahout.classifier.sgd.TrainLogistic* but i am able to
> > > read from native file system. Is there a way to read data from hadoop and
> > > do logistic regression?
> > >
> > > Thanks,
> > > Sugato
> > >
> > >
> > > On Mon, Jul 1, 2013 at 11:28 AM, Sugato Samanta <[email protected]> wrote:
> > >
> > >> Hello Sebastian,
> > >>
> > >> Thank you for replying. I was able to do logistic regression when i keep
> > >> the file on linux directory. However when i try to read the file from hdfs,
> > >> it throws me an error. i looked into the package
> > >> *org.apache.mahout.classifier.sgd.TrainLogistic* in detail and realized
> > >> that it is only able to read input data from native file system. Is there a
> > >> way out to read the data from hdfs? Please find the error message which i
> > >> was trying to send via attachment.
> > >>
> > >> [root@INFADDAD19 ~]# $MAHOUT_HOME
> > >> org.apache.mahout.classifier.sgd.TrainLogistic --passes 10 --rate 5
> > >> --lambda 0.001 --input airline/2008.csv --features 21 --output
> > >> ./airline_2008.model --target CRSDepTime --categories 2 --predictors
> > >> ArrDelay DayOfWeek --types word numeric
> > >> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> > >> Running on hadoop, using /usr/lib/hadoop/bin/hadoop and
> > >> HADOOP_CONF_DIR=/etc/hadoop/conf
> > >> MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.2.0-job.jar
> > >> 13/07/01 01:55:18 WARN driver.MahoutDriver: No
> > >> org.apache.mahout.classifier.sgd.TrainLogistic.props found on classpath,
> > >> will use command-line arguments only
> > >> Exception in thread "main" java.io.FileNotFoundException: airline/2008.csv
> > >> (No such file or directory)
> > >>     at java.io.FileInputStream.open(Native Method)
> > >>     at java.io.FileInputStream.<init>(FileInputStream.java:120)
> > >>     at org.apache.mahout.classifier.sgd.TrainLogistic.open(TrainLogistic.java:316)
> > >>     at org.apache.mahout.classifier.sgd.TrainLogistic.mainToOutput(TrainLogistic.java:75)
> > >>     at org.apache.mahout.classifier.sgd.TrainLogistic.main(TrainLogistic.java:64)
> > >>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >>     at java.lang.reflect.Method.invoke(Method.java:597)
> > >>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> > >>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
> > >>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > >>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >>     at java.lang.reflect.Method.invoke(Method.java:597)
> > >>     at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> > >>
> > >> Thanks for your help.
> > >>
> > >> Regards,
> > >> Sugato
> > >>
> > >>
> > >> On Mon, Jul 1, 2013 at 9:33 AM, Sebastian Schelter <[email protected]> wrote:
> > >>
> > >>> Hello Sugato,
> > >>>
> > >>> attachments don't work on this list, unfortunately. Have you tried
> > >>> copying the data from HDFS to the local filesystem of the machine that
> > >>> runs logistic regression?
> > >>>
> > >>> Best,
> > >>> Sebastian
> > >>>
> > >>> On 01.07.2013 06:00, Sugato Samanta wrote:
> > >>>> Hello,
> > >>>>
> > >>>> I am trying to do a logistic regression using Mahout. I am facing errors
> > >>>> while reading data from Hadoop (HDFS). After checking the
> > >>>> org.apache.mahout.classifier.sgd.TrainLogistic package i have learnt that
> > >>>> this is supposed to read input from Linux OS. Can you please help me to get
> > >>>> the data read from HDFS? The credentials used by me:
> > >>>>
> > >>>> OS: Linux Red Hat 5.0 (Cloudera)
> > >>>> Mahout version: 0.7
> > >>>> Hadoop Version: 2.0.0-cdh4.2.0
> > >>>>
> > >>>> $MAHOUT_HOME org.apache.mahout.classifier.sgd.TrainLogistic --passes 10
> > >>>> --rate 5 --lambda 0.001 --input /data01/final_data.csv --features 21
> > >>>> --output ./airline.model --target CRSDepTime --categories 2 --predictors
> > >>>> ArrDelay DayOfWeek --types word numeric
> > >>>>
> > >>>> Error message has been attached. Thank you for your help.
> > >>>>
> > >>>> Regards,
> > >>>> Sugato
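
Sebastian's copy-to-local suggestion from this thread could be scripted roughly as follows. This is only a sketch under assumptions: it supposes the `hadoop` and `mahout` launchers are on the PATH, it reuses the input path and training options from the failing command quoted in the thread, and the names HDFS_INPUT, LOCAL_DIR, DRY_RUN, run and stage_and_train are made up for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: stage a CSV from HDFS onto local disk, then train on the local copy.
# HDFS_INPUT, LOCAL_DIR and DRY_RUN are hypothetical names used only in this example.
set -eu

HDFS_INPUT="${HDFS_INPUT:-airline/2008.csv}"   # HDFS path taken from the thread
LOCAL_DIR="${LOCAL_DIR:-/tmp/mahout-input}"    # assumed local scratch directory
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

stage_and_train() {
  # Local file name = last component of the HDFS path.
  local_input="$LOCAL_DIR/${HDFS_INPUT##*/}"

  run mkdir -p "$LOCAL_DIR"
  # Copy the data out of HDFS so that a plain local file open can find it.
  run hadoop fs -copyToLocal "$HDFS_INPUT" "$local_input"
  # Same options as in the thread; only --input now points at the local copy.
  run mahout org.apache.mahout.classifier.sgd.TrainLogistic \
    --passes 10 --rate 5 --lambda 0.001 \
    --input "$local_input" --features 21 \
    --output ./airline_2008.model \
    --target CRSDepTime --categories 2 \
    --predictors ArrDelay DayOfWeek --types word numeric
}

stage_and_train
```

Setting DRY_RUN=0 would execute the commands instead of printing them. On a cluster mounted over NFS the copy step should be unnecessary, since --input could point at the mounted path directly.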
