Instead of 'scraping' this way, consider using a library such as Pydoop (http://pydoop.sourceforge.net), which provides Pythonic APIs for interacting with Hadoop components. Several other Python frameworks for Hadoop are covered at http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/, for example.
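For instance, a minimal sketch of the Pydoop route (untested here, and assuming Pydoop is installed and your HADOOP_HOME/HADOOP_CONF_DIR point at the cluster configuration):

    import pydoop.hdfs as hdfs

    # hdfs.ls() returns a list of path strings, one per entry,
    # much like what `bin/hadoop dfs -ls` prints.
    files = hdfs.ls("/hdfs/query/path")

    # Counting files is then plain Python; no output scraping needed.
    print(len(files))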
On Sun, Feb 17, 2013 at 4:17 AM, jamal sasha <[email protected]> wrote:
> Hi,
>
> This might be more of a Python-centric question, but I was wondering if
> anyone has tried it out...
>
> I am trying to run a few Hadoop commands from a Python program.
>
> For example, from the command line you can do:
>
>     bin/hadoop dfs -ls /hdfs/query/path
>
> and it returns all the files in the HDFS query path, very similar to Unix.
>
> Now I am trying to do this from Python and do some manipulation with the
> result:
>
>     exec_str = "path/to/hadoop/bin/hadoop dfs -ls " + query_path
>     os.system(exec_str)
>
> Now I am trying to grab this output to do some manipulation with it,
> for example to count the number of files.
> I looked into the subprocess module, but these are not native shell
> commands, so I am not sure whether I can apply those concepts.
> How do I solve this?
>
> Thanks

--
Harsh J
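If you do want to capture the CLI output directly, the subprocess module handles any external executable, not just native shell commands. A minimal sketch, reusing the hypothetical hadoop path from the question:

    import subprocess

    query_path = "/hdfs/query/path"  # hypothetical path; substitute your own

    # Pass the command as an argument list so no shell is involved.
    p = subprocess.Popen(
        ["path/to/hadoop/bin/hadoop", "dfs", "-ls", query_path],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()

    # `dfs -ls` prints a "Found N items" header, then one line per entry.
    entries = [line for line in out.decode().splitlines()
               if not line.startswith("Found")]
    print(len(entries))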
