Re: python streaming error

2013-01-15 Thread Simone Leo
Hello, you can use the Pydoop HDFS API to work with HDFS files:

>>> import pydoop.hdfs as hdfs
>>> with hdfs.open('hdfs://localhost:8020/user/myuser/filename') as f:
...     for line in f:
...         do_something(line)

As you can see, the API is very similar to that of ordinary Python file

Re: python streaming error

2013-01-14 Thread Andy Isaacson
Oh, another link I should have included!
http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/

-andy

On Mon, Jan 14, 2013 at 2:19 PM, Andy Isaacson wrote:
> Hadoop Streaming does not magically teach Python open() how to read
> from "hdfs://" URLs. You'll need to use a li

Re: python streaming error

2013-01-14 Thread Andy Isaacson
Hadoop Streaming does not magically teach Python open() how to read from "hdfs://" URLs. You'll need to use a library or fork a "hdfs dfs -cat" to read the file for you.

A few links that may help:
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
http://stackove
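
The second option mentioned above, forking "hdfs dfs -cat", might look roughly like the sketch below. The helper name hdfs_cat and its cat_cmd parameter are hypothetical and exist only for illustration. The sketch assumes the hdfs command is installed and on the PATH of the node running the task, and that the file is UTF-8 text.

```python
import subprocess


def hdfs_cat(path, cat_cmd=("hdfs", "dfs", "-cat")):
    """Yield the lines of `path` by forking `cat_cmd path`.

    hdfs_cat and cat_cmd are hypothetical names for this sketch;
    the default assumes the `hdfs` CLI is installed and on PATH.
    """
    proc = subprocess.Popen(list(cat_cmd) + [path], stdout=subprocess.PIPE)
    try:
        for raw in proc.stdout:
            # Decode each line and drop its trailing newline.
            yield raw.decode("utf-8").rstrip("\n")
    finally:
        # Always release the pipe and reap the child process.
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError("%s exited with status %d" % (cat_cmd[0], returncode))
```

Inside a mapper, the loop would then iterate over hdfs_cat(file) instead of calling open(file). Forking one process per input record is slow, so a library may be the better fit when there are many files.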

Re:Re:Re: Re: python streaming error

2013-01-13 Thread springring
Sorry, the error persists even after I modify the code to "offset,filename = line.strip().split("\t")"

At 2013-01-14 09:27:10, springring wrote:
> hi,
> I found the key point: it is not the hostname, which is right.
> Just change "offset,filename = line.split("\t")" to
> "offset,filename = line.strip().s

Re:Re: Re: python streaming error

2013-01-13 Thread springring
hi,
I found the key point: it is not the hostname, which is right. Just change "offset,filename = line.split("\t")" to "offset,filename = line.strip().split("\t")" and now it passes.

At 2013-01-12 16:58:29, "Nitin Pawar" wrote:
> computedb-13 is not a valid host name
>
> may be if you have local hadoop the
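
The difference between the two calls comes from the trailing newline that every line read from sys.stdin carries. A minimal sketch of the effect follows; parse_record is a hypothetical helper name and the sample record is made up for illustration.

```python
def parse_record(line):
    """Split one tab-separated streaming record into (offset, filename)."""
    # strip() removes the trailing "\n" before splitting on the tab.
    offset, filename = line.strip().split("\t")
    return offset, filename


sample = "0\tpart-00000\n"  # made-up record, shaped like a line from sys.stdin

# Without strip(), the newline stays attached to the filename.
unstripped = sample.split("\t")
stripped = parse_record(sample)
```

A filename that still ends in a newline produces a path that does not exist, which is one plausible reason the unstripped version fails when the file is opened.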

Re: Re: python streaming error

2013-01-12 Thread Nitin Pawar
computedb-13 is not a valid host name. If you have a local Hadoop, you can refer to it with hdfs://localhost:9100/ or hdfs://127.0.0.1:9100. If it is on another machine, just try the IP address of that machine.

On Sat, Jan 12, 2013 at 12:55 AM, springring wrote:
> hi,
>
> I modif

Re:Re: python streaming error

2013-01-12 Thread springring
hi,
I modified the file as below, but there is still an error:

#!/bin/env python

import sys

for line in sys.stdin:
    offset,filename = line.split("\t")
    file = "hdfs://computeb-13:9100/user/hdfs/catalog3/" + filename
    print line
    print filename
    pri

Re: python streaming error

2013-01-12 Thread Nitin Pawar
Is this the correct path for writing onto HDFS? "hdfs://user/hdfs/catalog3." I don't see the namenode info in the path. Can this cause any issue? Just making a guess: something like hdfs://host:port/path

On Sat, Jan 12, 2013 at 12:30 AM, springring wrote:
> hdfs://user/hdfs/catalog3/

-- Niti
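
One way to see why the missing namenode matters: a generic URL parser treats whatever follows "hdfs://" as the authority (host and port). The sketch below uses Python 3's standard urllib.parse, not any Hadoop API, so it only illustrates how such a URI is structured. describe_hdfs_uri is a hypothetical helper name, and localhost:9100 is just an example host and port.

```python
from urllib.parse import urlparse


def describe_hdfs_uri(uri):
    """Return (host, port, path) as a generic URL parser sees them."""
    parts = urlparse(uri)
    return parts.hostname, parts.port, parts.path


# Path from the original post: "user" lands in the host position.
no_namenode = describe_hdfs_uri("hdfs://user/hdfs/catalog3/")

# The hdfs://host:port/path form, with an example host and port.
with_namenode = describe_hdfs_uri("hdfs://localhost:9100/user/hdfs/catalog3/")
```

In the first URI the parser reads "user" as the host and "/hdfs/catalog3/" as the path, which is probably not what was intended.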

python streaming error

2013-01-12 Thread springring
Hi,
When I run the code below as a streaming job, the job reports error N/A and is killed. Running it step by step, I find that it fails at "file_obj = open(file)". When I run the same code outside of Hadoop, everything is OK.

#!/bin/env python

import sys

for line in sys.stdin:
    offset,file