Hello,
you can use the Pydoop HDFS API to work with HDFS files:
>>> import pydoop.hdfs as hdfs
>>> with hdfs.open('hdfs://localhost:8020/user/myuser/filename') as f:
...     for line in f:
...         do_something(line)
As you can see, the API is very similar to that of ordinary Python file objects.
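If I remember the Pydoop API right, there are also one-shot helpers for reading or writing a whole file in a single call (the paths here are just examples):

>>> text = hdfs.load('hdfs://localhost:8020/user/myuser/filename')
>>> hdfs.dump(text, 'hdfs://localhost:8020/user/myuser/copy')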
Oh, another link I should have included!
http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/
-andy
On Mon, Jan 14, 2013 at 2:19 PM, Andy Isaacson wrote:
Hadoop Streaming does not magically teach Python open() how to read
from "hdfs://" URLs. You'll need to use a library or fork a "hdfs dfs
-cat" to read the file for you.
A few links that may help:
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
http://stackove
Sorry,
the error keeps on, even when I modify the code to
"offset,filename = line.strip().split("\t")"
At 2013-01-14 09:27:10,springring wrote:
hi,
I found the key point; it's not the hostname, that part is right.
Just change "offset,filename = line.split("\t")" to
"offset,filename = line.strip().split("\t")"
and now it passes.
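You can see why in the interpreter: the tab split leaves the trailing newline on the filename, and that newline ends up inside the hdfs:// path ("part-00000" is just an example value):

>>> line = "0\tpart-00000\n"
>>> line.split("\t")
['0', 'part-00000\n']
>>> line.strip().split("\t")
['0', 'part-00000']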
At 2013-01-12 16:58:29,"Nitin Pawar" wrote:
computedb-13 is not a valid host name.
Maybe if you have a local hadoop you can refer to it with
hdfs://localhost:9100/ or hdfs://127.0.0.1:9100.
If it's on another machine, then just try the IP address of that machine.
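So in your script that line would become something like (assuming the port from your post):

file = "hdfs://localhost:9100/user/hdfs/catalog3/" + filename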
On Sat, Jan 12, 2013 at 12:55 AM, springring wrote:
hi,
I modified the file as below; there is still an error.
#!/bin/env python

import sys

for line in sys.stdin:
    offset,filename = line.split("\t")
    file = "hdfs://computeb-13:9100/user/hdfs/catalog3/" + filename
    print line
    print filename
    print file
Is this the correct path for writing onto hdfs:
"hdfs://user/hdfs/catalog3"?
I don't see the namenode info in the path. Can this cause any issue? Just
making a guess; it should be
something like hdfs://host:port/path
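A quick way to see the problem: a URL parser takes whatever follows "hdfs://" as the host, so in your path "user" would be treated as the namenode host (stdlib check, just to illustrate):

>>> from urlparse import urlparse
>>> urlparse("hdfs://user/hdfs/catalog3/").netloc
'user'
>>> urlparse("hdfs://host:9100/user/hdfs/catalog3/").netloc
'host:9100'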
On Sat, Jan 12, 2013 at 12:30 AM, springring wrote:
> hdfs://user/hdfs/catalog3/
--
Nitin Pawar
Hi,
When I run the code below as a streaming job, it errors (status N/A) and is
killed. Running it step by step, I find it fails at
"file_obj = open(file)". When I run the same code outside of hadoop,
everything is ok.
#!/bin/env python

import sys

for line in sys.stdin:
    offset,file