python setup.py build --> giving error:

Packaging Java classes
sh: 1: jar: not found
error: Error packaging java component.
Command: jar -cf build/lib.linux-i686-2.7/pydoop/pydoop_1_1_2.jar -C build/temp.linux-i686-2.7/pipes-1.1.2 ./it

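For context, "sh: 1: jar: not found" means the shell could not find a `jar` executable on PATH. The `jar` tool ships with the JDK, so the build step that packages the Java classes cannot run without it. Below is a minimal sketch for checking this before re-running the build. The helper name `find_on_path` is made up for this illustration and is not part of Pydoop.

```python
import os


def find_on_path(tool):
    """Return the full path of `tool` if it is an executable on PATH, else None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        if not directory:
            # Skip empty PATH entries rather than probing the current directory.
            continue
        candidate = os.path.join(directory, tool)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    jar_path = find_on_path("jar")
    if jar_path is None:
        print("jar not found on PATH; install a JDK or add its bin directory to PATH")
    else:
        print("jar found at " + jar_path)
```

If the check reports that `jar` is missing, installing a JDK (or adding an existing JDK's bin directory to PATH) and re-running `python setup.py build` would be the next thing to try.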
On Sat, Dec 7, 2013 at 12:00 PM, Nitin Pawar <[email protected]> wrote:

> Can you share the error?
> On Dec 7, 2013 8:49 AM, "Haider" <[email protected]> wrote:
>
> > Hi All
> >
> > Thanks for your suggestions.
> > But in my case I have thousands of small files and I want to read them
> > one by one. I think it is only possible by using listdir().
> > As per Nitin's comment I tried to install Pydoop, but it is throwing me
> > some strange error and I am not finding any information on Pydoop on
> > Google.
> >
> > thanks
> > Haider
> >
> > On Sat, Dec 7, 2013 at 8:19 AM, Yigitbasi, Nezih
> > <[email protected]> wrote:
> >
> > > Haider,
> > > You can use TextLoader to read a file in HDFS line by line, and then
> > > you can pass those lines to your python UDF. Something like the
> > > following should work:
> > >
> > > x = load '/tmp/my_file_on_hdfs' using TextLoader() as (line:chararray);
> > > y = foreach x generate my_udf(line);
> > >
> > > -----Original Message-----
> > > From: Haider [mailto:[email protected]]
> > > Sent: Thursday, December 5, 2013 10:12 PM
> > > To: [email protected]
> > > Subject: Re: listdir() python function is not wokring on hadoop
> > >
> > > I am trying to read from HDFS, not from the local file system, so
> > > would it be possible through listdir? Or is there any way to read
> > > HDFS files one by one and pass each to one function?
> > >
> > > On Fri, Dec 6, 2013 at 4:20 AM, Yigitbasi, Nezih
> > > <[email protected]> wrote:
> > >
> > > > I can call listdir to read from local filesystem in a python UDF. Did
> > > > you implement your function as a proper UDF?
> > > > ________________________________________
> > > > From: Haider [[email protected]]
> > > > Sent: Monday, December 02, 2013 5:22 AM
> > > > To: [email protected]
> > > > Subject: listdir() python function is not wokring on hadoop
> > > >
> > > > Hi all
> > > >
> > > > Is there anyone who has successfully used the listdir() function to
> > > > retrieve files one by one from HDFS using a python script?
> > > >
> > > > if __name__ == '__main__':
> > > >     for filename in os.listdir("/user/hdmaster/XML2"):
> > > >         print filename
> > > >
> > > > ERROR streaming.StreamJob: Job not successful. Error: # of failed Map
> > > > Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask:
> > > > task_201312020139_0025_m_000000
> > > > 13/12/02 05:20:50 INFO streaming.StreamJob: killJob...
> > > >
> > > > My intention is to take files one by one to parse.
> > > >
> > > > Any help or suggestion on this would be very helpful to me.
> > > >
> > > > Thanks
> > > > Haider
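
On the original question: os.listdir() only sees the local file system of the machine the script runs on, which matches Nezih's remark that it works for local paths. One way to enumerate an HDFS directory from Python without Pydoop is to shell out to `hadoop fs -ls` and parse what it prints. The following is a minimal sketch, assuming the `hadoop` command is on PATH and prints the usual eight-column listing. The names `parse_hdfs_ls` and `list_hdfs_files` and the sample listing are made up for illustration and are not from the thread.

```python
import subprocess


def parse_hdfs_ls(output):
    """Extract regular-file paths from the text printed by `hadoop fs -ls`."""
    paths = []
    for line in output.splitlines():
        # Assumed columns: permissions, replication, owner, group, size,
        # date, time, path. Splitting at most 7 times keeps a path that
        # contains spaces in one piece.
        fields = line.split(None, 7)
        if len(fields) < 8:
            # The "Found N items" header and blank lines have fewer fields.
            continue
        if fields[0].startswith("d"):
            # Skip directories; only files are of interest here.
            continue
        paths.append(fields[7])
    return paths


def list_hdfs_files(hdfs_dir):
    """Run `hadoop fs -ls` on hdfs_dir and return the file paths it reports."""
    raw = subprocess.check_output(["hadoop", "fs", "-ls", hdfs_dir])
    return parse_hdfs_ls(raw.decode("utf-8"))


# Illustrative listing only; owners, sizes and dates are invented.
SAMPLE = """Found 3 items
-rw-r--r--   1 hdmaster supergroup       1024 2013-12-02 05:10 /user/hdmaster/XML2/a.xml
-rw-r--r--   1 hdmaster supergroup       2048 2013-12-02 05:11 /user/hdmaster/XML2/b.xml
drwxr-xr-x   - hdmaster supergroup          0 2013-12-02 05:12 /user/hdmaster/XML2/old
"""

if __name__ == "__main__":
    for path in parse_hdfs_ls(SAMPLE):
        print(path)
```

With a working Hadoop client, `list_hdfs_files("/user/hdmaster/XML2")` would return the paths, and each file's contents could then be read with `hadoop fs -cat <path>` in the same way. Whether this is practical from inside a streaming map task depends on the cluster setup, so it is probably best tried from a driver script first.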
