Can you share the error?

On Dec 7, 2013 8:49 AM, "Haider" <[email protected]> wrote:
> Hi All,
>
> Thanks for your suggestions.
>
> But in my case I have thousands of small files and I want to read them
> one by one. I think that is only possible by using listdir(). As per
> Nitin's comment I tried to install Pydoop, but it is throwing a strange
> error and I am not finding any information on Pydoop on Google.
>
> Thanks
> Haider
>
> On Sat, Dec 7, 2013 at 8:19 AM, Yigitbasi, Nezih
> <[email protected]> wrote:
>
> > Haider,
> > You can use TextLoader to read a file in HDFS line by line, and then
> > you can pass those lines to your Python UDF. Something like the
> > following should work:
> >
> > x = load '/tmp/my_file_on_hdfs' using TextLoader() as (line:chararray);
> > y = foreach x generate my_udf(line);
> >
> > -----Original Message-----
> > From: Haider [mailto:[email protected]]
> > Sent: Thursday, December 5, 2013 10:12 PM
> > To: [email protected]
> > Subject: Re: listdir() python function is not working on hadoop
> >
> > I am trying to read from HDFS, not from the local file system, so
> > would it be possible through listdir()? Or is there any way to read
> > HDFS files one by one and pass them to a function?
> >
> > On Fri, Dec 6, 2013 at 4:20 AM, Yigitbasi, Nezih
> > <[email protected]> wrote:
> >
> > > I can call listdir() to read from the local filesystem in a Python
> > > UDF. Did you implement your function as a proper UDF?
> > > ________________________________________
> > > From: Haider [[email protected]]
> > > Sent: Monday, December 02, 2013 5:22 AM
> > > To: [email protected]
> > > Subject: listdir() python function is not working on hadoop
> > >
> > > Hi all,
> > >
> > > Has anyone successfully used the listdir() function to retrieve
> > > files one by one from HDFS using a Python script?
> > >
> > > if __name__ == '__main__':
> > >     for filename in os.listdir("/user/hdmaster/XML2"):
> > >         print filename
> > >
> > > ERROR streaming.StreamJob: Job not successful. Error: # of failed
> > > Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask:
> > > task_201312020139_0025_m_000000
> > > 13/12/02 05:20:50 INFO streaming.StreamJob: killJob...
> > >
> > > My intention is to take the files one by one and parse them.
> > >
> > > Any help or suggestion on this would be much appreciated.
> > >
> > > Thanks
> > > Haider
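The failure above is expected: os.listdir() only sees the local filesystem of whichever node runs the streaming task, so an HDFS path such as /user/hdmaster/XML2 does not exist there. One common workaround, if Pydoop is not an option, is to shell out to the `hadoop fs -ls` command and parse its output. The sketch below assumes the standard ls output format of that era (8 whitespace-separated fields per entry, preceded by a "Found N items" header); the helper names `parse_hdfs_ls` and `list_hdfs` are illustrative, not part of any library:

```python
# Sketch: list an HDFS directory from Python via the hadoop CLI,
# since os.listdir() cannot see HDFS paths.
import subprocess

def parse_hdfs_ls(output):
    """Extract file paths from `hadoop fs -ls` output.

    Real entries have 8 whitespace-separated fields (permissions,
    replication, owner, group, size, date, time, path); the
    "Found N items" header line is skipped.
    """
    paths = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 8 and not line.startswith("Found"):
            paths.append(fields[-1])
    return paths

def list_hdfs(path):
    """List an HDFS directory by shelling out to the hadoop CLI."""
    out = subprocess.check_output(["hadoop", "fs", "-ls", path])
    return parse_hdfs_ls(out.decode())

# Usage (on a machine with the hadoop CLI configured):
#   for filename in list_hdfs("/user/hdmaster/XML2"):
#       print(filename)
```

The paths returned by `list_hdfs` can then be read one by one (e.g. with `hadoop fs -cat`) and passed to a parsing function, which matches the one-file-at-a-time intent of the original script.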
