Hi Sugato,

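I don't think that TrainLogistic is able to read data from HDFS; your
stack trace shows it opening the input with a plain
java.io.FileInputStream, which only sees the local filesystem. The
simplest way is to copy the data to local disk via hadoop fs
-copyToLocal and point TrainLogistic's --input at that local copy.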
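
A rough sketch of those two steps, reusing the options from your earlier
command. The local destination /tmp/airline_2008.csv, the DRY_RUN switch,
the run helper and a mahout launcher on the PATH are my assumptions, so
adjust them to your setup. By default the script only prints the commands;
set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch: stage the HDFS file on local disk, then train on the local copy.
HDFS_INPUT="airline/2008.csv"          # HDFS path taken from the thread
LOCAL_INPUT="/tmp/airline_2008.csv"    # hypothetical local destination
DRY_RUN="${DRY_RUN:-1}"                # 1 = only print the commands

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: copy the input out of HDFS.
# Step 2: train against the local copy (assumes `mahout` is on the PATH).
run hadoop fs -copyToLocal "$HDFS_INPUT" "$LOCAL_INPUT" &&
run mahout org.apache.mahout.classifier.sgd.TrainLogistic \
  --passes 10 --rate 5 --lambda 0.001 \
  --input "$LOCAL_INPUT" --features 21 \
  --output ./airline_2008.model \
  --target CRSDepTime --categories 2 \
  --predictors ArrDelay DayOfWeek --types word numeric
```

The only change on the TrainLogistic side is that --input now names a
local path instead of an HDFS one.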

Best,
Sebastian

On 03.07.2013 05:39, Sugato Samanta wrote:
> Hello,
> 
> Can you please help? I am not able to read data from Hadoop while using the
> class org.apache.mahout.classifier.sgd.TrainLogistic, but I am able to
> read from the native file system. Is there a way to read data from Hadoop and
> do logistic regression?
> 
> Thanks,
> Sugato
> 
> 
> On Mon, Jul 1, 2013 at 11:28 AM, Sugato Samanta wrote:
> 
>> Hello Sebastian,
>>
>> Thank you for replying. I was able to do logistic regression when I keep
>> the file in a Linux directory. However, when I try to read the file from
>> HDFS, it throws an error. I looked into the class
>> org.apache.mahout.classifier.sgd.TrainLogistic in detail and realized
>> that it is only able to read input data from the native file system. Is
>> there a way to read the data from HDFS? Please find below the error
>> message which I was trying to send via attachment.
>>
>> [root@INFADDAD19 ~]# $MAHOUT_HOME
>> org.apache.mahout.classifier.sgd.TrainLogistic --passes 10 --rate 5
>> --lambda 0.001 --input airline/2008.csv --features 21 --output
>> ./airline_2008.model --target CRSDepTime --categories 2 --predictors
>> ArrDelay DayOfWeek --types word numeric
>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>> Running on hadoop, using /usr/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
>> MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.2.0-job.jar
>> 13/07/01 01:55:18 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainLogistic.props found on classpath, will use command-line arguments only
>> Exception in thread "main" java.io.FileNotFoundException: airline/2008.csv (No such file or directory)
>>         at java.io.FileInputStream.open(Native Method)
>>         at java.io.FileInputStream.<init>(FileInputStream.java:120)
>>         at org.apache.mahout.classifier.sgd.TrainLogistic.open(TrainLogistic.java:316)
>>         at org.apache.mahout.classifier.sgd.TrainLogistic.mainToOutput(TrainLogistic.java:75)
>>         at org.apache.mahout.classifier.sgd.TrainLogistic.main(TrainLogistic.java:64)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
>>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>>
>> Thanks for your help.
>>
>> Regards,
>> Sugato
>>
>>
>>
>> On Mon, Jul 1, 2013 at 9:33 AM, Sebastian Schelter wrote:
>>
>>> Hello Sugato,
>>>
>>> attachments don't work on this list, unfortunately. Have you tried
>>> copying the data from HDFS to the local filesystem of the machine that
>>> runs logistic regression?
>>>
>>> Best,
>>> Sebastian
>>>
>>> On 01.07.2013 06:00, Sugato Samanta wrote:
>>>> Hello,
>>>>
>>>> I am trying to run a logistic regression using Mahout. I am facing errors
>>>> while reading data from Hadoop (HDFS). After checking the
>>>> org.apache.mahout.classifier.sgd.TrainLogistic class, I have learnt that
>>>> it is supposed to read input from the local Linux file system. Can you
>>>> please help me get the data read from HDFS? My environment:
>>>>
>>>> OS: Linux Red Hat 5.0 (Cloudera)
>>>> Mahout version: 0.7
>>>> Hadoop Version: 2.0.0-cdh4.2.0
>>>>
>>>> $MAHOUT_HOME org.apache.mahout.classifier.sgd.TrainLogistic --passes 10
>>>> --rate 5 --lambda 0.001 --input /data01/final_data.csv --features 21
>>>> --output ./airline.model --target CRSDepTime --categories 2 --predictors
>>>> ArrDelay DayOfWeek --types word numeric
>>>>
>>>> Error message has been attached. Thank you for your help.
>>>>
>>>> Regards,
>>>> Sugato
>>>>
>>>
>>>
>>
> 
