I was stuck with a similar issue before and couldn't come up with a more
viable alternative than this: if the output of the hadoop command is not
too big, you can read it into your Python script and process it there.
I use the following snippet to clean the output of ls and store it
in a Python list for further processing.
In your case you can call len() on that list to get the file count.
fscommand = "hadoop dfs -ls /path/in/%s/*/ 2> /dev/null"%("hdfs")
hadoop_cmd=commands.getoutput(fscommand)
lines = hadoop_cmd.split("\n")[1:]
strlines =[map(lambda a:a.strip(),line.split(' ')[-3:]) for line in lines]
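
If you would rather not rely on the commands module (it is Python 2 only and
deprecated), subprocess can capture the same output. A minimal sketch,
assuming Python 2.7+ and that the hadoop binary is on your PATH; query_path
here is just an illustrative placeholder, not something from your setup:

import subprocess

query_path = "/hdfs/query/path"  # hypothetical example path
# check_output runs the hadoop CLI and returns its stdout as a string
output = subprocess.check_output(["hadoop", "dfs", "-ls", query_path])
# The first line is the "Found N items" summary; the rest are file entries
entries = output.strip().split("\n")[1:]
print "file count:", len(entries)

Counting the entries of that list answers the "how many files" part of
your question directly.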
On Sun, Feb 17, 2013 at 4:17 AM, jamal sasha <[email protected]> wrote:
> Hi,
>
> This might be more of a Python-centric question, but I was wondering if
> anyone has tried it out...
>
> I am trying to run a few hadoop commands from a Python program...
>
> For example if from command line, you do:
>
> bin/hadoop dfs -ls /hdfs/query/path
>
> it returns all the files in the hdfs query path..
> So it's very similar to unix.
>
>
> Now I am trying to do basically this from Python.. and do some
> manipulation on the output.
>
> exec_str = "path/to/hadoop/bin/hadoop dfs -ls " + query_path
> os.system(exec_str)
>
> Now, I am trying to grab this output to do some manipulation on it.
> For example.. count the number of files?
> I looked into the subprocess module, but these are not native shell
> commands, hence I'm not sure whether I can apply those concepts.
> How to solve this?
>
> Thanks
>
>
>
--
regards,
Anuj Maurice