Sugato,

Some Hadoop distributions support access to the cluster via NFS [1].  That
allows programs like trainlogistic to read data directly from a Hadoop cluster.
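
As a rough sketch of what that looks like: once the cluster is mounted over
NFS, --input is just an ordinary file path. The mount point /mnt/cluster is
hypothetical, `mahout` is assumed to be on the PATH, and the flags are the ones
from Sugato's command quoted below.

```shell
# Sketch only: train directly on a file exposed through an NFS mount.
NFS_ROOT="${NFS_ROOT:-/mnt/cluster}"   # hypothetical mount point (assumption)
INPUT="$NFS_ROOT/airline/2008.csv"     # same relative path as in the thread

train_from_mount() {
  # Refuse to start if the file is not visible through the mount.
  if [ ! -r "$1" ]; then
    echo "cannot read $1 (is the cluster mounted?)" >&2
    return 1
  fi
  mahout org.apache.mahout.classifier.sgd.TrainLogistic \
    --passes 10 --rate 5 --lambda 0.001 \
    --input "$1" --features 21 --output ./airline_2008.model \
    --target CRSDepTime --categories 2 \
    --predictors ArrDelay DayOfWeek --types word numeric
}

train_from_mount "$INPUT" || echo "skipped training"
```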

Sebastian's suggestion works fine for other Hadoop distributions if you
don't mind the copy.
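
A minimal sketch of that copy-then-train route, reusing the flags from Sugato's
command quoted below. The local path /tmp/2008.csv is a made-up example,
`mahout` is assumed to be on the PATH, and the script only prints the commands
unless DRY_RUN=0 is set.

```shell
# Sketch only: copy the CSV out of HDFS, then train on the local copy.
set -eu

HDFS_INPUT="${HDFS_INPUT:-airline/2008.csv}"   # path inside HDFS (from the thread)
LOCAL_INPUT="${LOCAL_INPUT:-/tmp/2008.csv}"    # hypothetical local destination
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print only, 0 = execute

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

copy_and_train() {
  # The stack trace shows TrainLogistic opening --input with
  # java.io.FileInputStream, so it needs a file on the local file system.
  run hadoop fs -copyToLocal "$HDFS_INPUT" "$LOCAL_INPUT"
  run mahout org.apache.mahout.classifier.sgd.TrainLogistic \
    --passes 10 --rate 5 --lambda 0.001 \
    --input "$LOCAL_INPUT" --features 21 --output ./airline_2008.model \
    --target CRSDepTime --categories 2 \
    --predictors ArrDelay DayOfWeek --types word numeric
}

copy_and_train
```

Review the printed commands first, then rerun with DRY_RUN=0 to execute them.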

I work for the company referenced, so keep that in mind.

[1] http://www.mapr.com


On Wed, Jul 3, 2013 at 7:56 AM, Sugato Samanta <[email protected]> wrote:

> Hi Sebastian,
>
> Thank you for your answer.
>
> Regards,
> Sugato
>
>
> On Wed, Jul 3, 2013 at 7:19 PM, Sebastian Schelter
> <[email protected]> wrote:
>
> > Hi Sugato,
> >
> > I don't think that TrainLogistic is able to read data from HDFS. The
> > simplest way is to copy the data to local disk via hadoop fs
> > -copyToLocal and point TrainLogistic to that local copy.
> >
> > Best,
> > Sebastian
> >
> > On 03.07.2013 05:39, Sugato Samanta wrote:
> > > Hello,
> > >
> > > Can you please help? I am not able to read data from Hadoop while using
> > > the class *org.apache.mahout.classifier.sgd.TrainLogistic*, but I am
> > > able to read from the native file system. Is there a way to read data
> > > from Hadoop and do logistic regression?
> > >
> > > Thanks,
> > > Sugato
> > >
> > >
> > > On Mon, Jul 1, 2013 at 11:28 AM, Sugato Samanta <[email protected]>
> > > wrote:
> > >
> > >> Hello Sebastian,
> > >>
> > >> Thank you for replying. I was able to do logistic regression when I keep
> > >> the file in a Linux directory. However, when I try to read the file from
> > >> HDFS, it throws an error. I looked into the class
> > >> *org.apache.mahout.classifier.sgd.TrainLogistic* in detail and realized
> > >> that it can only read input data from the native file system. Is there
> > >> a way to read the data from HDFS? Please find below the error message
> > >> which I was trying to send via attachment.
> > >>
> > >> [root@INFADDAD19 ~]# $MAHOUT_HOME
> > >> org.apache.mahout.classifier.sgd.TrainLogistic --passes 10 --rate 5
> > >> --lambda 0.001 --input airline/2008.csv --features 21 --output
> > >> ./airline_2008.model --target CRSDepTime --categories 2 --predictors
> > >> ArrDelay DayOfWeek --types word numeric
> > >> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> > >> Running on hadoop, using /usr/lib/hadoop/bin/hadoop and
> > >> HADOOP_CONF_DIR=/etc/hadoop/conf
> > >> MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.2.0-job.jar
> > >> 13/07/01 01:55:18 WARN driver.MahoutDriver: No
> > >> org.apache.mahout.classifier.sgd.TrainLogistic.props found on classpath,
> > >> will use command-line arguments only
> > >> Exception in thread "main" java.io.FileNotFoundException: airline/2008.csv
> > >> (No such file or directory)
> > >>         at java.io.FileInputStream.open(Native Method)
> > >>         at java.io.FileInputStream.<init>(FileInputStream.java:120)
> > >>         at org.apache.mahout.classifier.sgd.TrainLogistic.open(TrainLogistic.java:316)
> > >>         at org.apache.mahout.classifier.sgd.TrainLogistic.mainToOutput(TrainLogistic.java:75)
> > >>         at org.apache.mahout.classifier.sgd.TrainLogistic.main(TrainLogistic.java:64)
> > >>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >>         at java.lang.reflect.Method.invoke(Method.java:597)
> > >>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> > >>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
> > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > >>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >>         at java.lang.reflect.Method.invoke(Method.java:597)
> > >>         at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> > >>
> > >> Thanks for your help.
> > >>
> > >> Regards,
> > >> Sugato
> > >>
> > >>
> > >>
> > >> On Mon, Jul 1, 2013 at 9:33 AM, Sebastian Schelter <
> > >> [email protected]> wrote:
> > >>
> > >>> Hello Sugato,
> > >>>
> > >>> attachments don't work on this list, unfortunately. Have you tried
> > >>> copying the data from HDFS to the local filesystem of the machine that
> > >>> runs logistic regression?
> > >>>
> > >>> Best,
> > >>> Sebastian
> > >>>
> > >>> On 01.07.2013 06:00, Sugato Samanta wrote:
> > >>>> Hello,
> > >>>>
> > >>>> I am trying to do a logistic regression using Mahout. I am facing errors
> > >>>> while reading data from Hadoop (HDFS). After checking the
> > >>>> org.apache.mahout.classifier.sgd.TrainLogistic class, I have learnt that
> > >>>> it is supposed to read input from the local Linux file system. Can you
> > >>>> please help me get the data read from HDFS? My setup:
> > >>>>
> > >>>> OS: Linux Red Hat 5.0 (Cloudera)
> > >>>> Mahout version: 0.7
> > >>>> Hadoop Version: 2.0.0-cdh4.2.0
> > >>>>
> > >>>> $MAHOUT_HOME org.apache.mahout.classifier.sgd.TrainLogistic --passes 10
> > >>>> --rate 5 --lambda 0.001 --input /data01/final_data.csv --features 21
> > >>>> --output ./airline.model --target CRSDepTime --categories 2 --predictors
> > >>>> ArrDelay DayOfWeek --types word numeric
> > >>>>
> > >>>> Error message has been attached. Thank you for your help.
> > >>>>
> > >>>> Regards,
> > >>>> Sugato
> > >>>>
> > >>>
> > >>>
> > >>
> > >
> >
> >
>
