Can you share the error?
On Dec 7, 2013 8:49 AM, "Haider" <[email protected]> wrote:

> Hi All
>
>     Thanks for your suggestions.
> But in my case I have thousands of small files and I want to read them one
> by one. I think it is only possible by using listdir().
> As per Nitin's comment I tried to install Pydoop, but it is throwing a
> strange error and I am not finding any information on Pydoop on Google.
>
> thanks
> Haider
>
>
>
>
> On Sat, Dec 7, 2013 at 8:19 AM, Yigitbasi, Nezih
> <[email protected]> wrote:
>
> > Haider,
> > You can use TextLoader to read a file in HDFS line by line, and then you
> > can pass those lines to your python UDF. Something like the following
> > should work:
> >
> > x = load '/tmp/my_file_on_hdfs' using TextLoader() as (line:chararray);
> > y = foreach x generate my_udf(line);
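> >
> > On the Python side, the UDF might look roughly like the sketch below
> > (the name my_udf and the body are placeholders for whatever per-line
> > processing you need; you would register the file in the Pig script,
> > e.g. with "register 'my_udf.py' using jython as myfuncs;"):

```python
# my_udf.py -- hypothetical Jython UDF for Pig.
# The body is a placeholder: it splits the line on whitespace and returns
# the field count as a string, standing in for real record parsing.
def my_udf(line):
    fields = line.split()
    return '%d' % len(fields)
```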
> >
> > -----Original Message-----
> > From: Haider [mailto:[email protected]]
> > Sent: Thursday, December 5, 2013 10:12 PM
> > To: [email protected]
> > Subject: Re: listdir() Python function is not working on Hadoop
> >
> > I am trying to read from HDFS, not from the local file system, so would
> > it be possible through listdir()? Or is there any way to read HDFS files
> > one by one and pass them to one function?
> >
> >
> >
> >
> > On Fri, Dec 6, 2013 at 4:20 AM, Yigitbasi, Nezih
> > <[email protected]> wrote:
> >
> > > I can call listdir to read from local filesystem in a python UDF. Did
> > > you implement your function as a proper UDF?
> > > ________________________________________
> > > From: Haider [[email protected]]
> > > Sent: Monday, December 02, 2013 5:22 AM
> > > To: [email protected]
> > > Subject: listdir() Python function is not working on Hadoop
> > >
> > > Hi all
> > >
> > >    Is there anyone who has successfully used the listdir() function to
> > > retrieve files one by one from HDFS using a Python script?
> > >
> > >
> > >  if __name__ == '__main__':
> > >
> > >     for filename in os.listdir("/user/hdmaster/XML2"):
> > >     print filename
> > >
> > > ERROR streaming.StreamJob: Job not successful. Error: # of failed Map
> > > Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask:
> > > task_201312020139_0025_m_000000
> > > 13/12/02 05:20:50 INFO streaming.StreamJob: killJob...
> > >
> > > My intention is to take files one by one to parse.
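> > >
> > > Note that os.listdir() only sees the local filesystem of the machine
> > > the script runs on, so it cannot enumerate an HDFS directory such as
> > > /user/hdmaster/XML2. One workaround is to shell out to the hadoop CLI
> > > and parse its listing; a rough sketch (the helper names list_hdfs_dir
> > > and parse_ls_output are my own, and it assumes `hadoop` is on PATH):

```python
import subprocess

def parse_ls_output(text):
    # Each data line of `hadoop fs -ls` ends with the file path; skip the
    # "Found N items" header and blank lines, keep the last field.
    paths = []
    for line in text.splitlines():
        if line.startswith('Found') or not line.strip():
            continue
        paths.append(line.split()[-1])
    return paths

def list_hdfs_dir(path):
    # os.listdir() cannot see HDFS, so ask the hadoop CLI instead.
    out = subprocess.check_output(['hadoop', 'fs', '-ls', path])
    return parse_ls_output(out.decode('utf-8'))
```

> > > Each returned path could then be fed to the parsing function one at
> > > a time, which is what the loop over os.listdir() was trying to do.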
> > >
> > > Any help or suggestion on this would be very helpful to me.
> > >
> > > Thanks
> > > Haider
> > >
> >
>
