Re: Streaming job hanging

2012-02-24 Thread Sameer Farooqui
Hi Mohit,

Can you provide some more info about the job you're trying to run? What
version of Hadoop are you using? What language is the Hadoop streaming job
written in? Have you been able to run any Hadoop streaming jobs
successfully on this cluster? I'm wondering whether all Hadoop streaming
jobs fail, or just this one.

Instead of running this on a file with possibly 551 blocks, can you try
running it on a small file with just 1 or 2 blocks and see whether it
completes successfully?
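
A related check that needs no cluster at all is to push a tiny sample
through the map, sort, and reduce steps locally. The sketch below assumes a
word-count style job; the mapper and reducer functions here are hypothetical
stand-ins, not the scripts from your job, and the sample lines are made up
for illustration. If the logic misbehaves on three lines of input, the
problem is in the script rather than in Hadoop.

```python
import itertools


def mapper(lines):
    """Emit (word, 1) pairs, much as a streaming mapper prints 'word<TAB>1'."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reducer(pairs):
    """Sum the counts per word; expects pairs already sorted by key."""
    for word, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort (shuffle) -> reduce on a small in-memory sample."""
    return dict(reducer(sorted(mapper(lines))))


# Hypothetical sample input, standing in for a small test file.
sample = ["the quick brown fox", "the lazy dog", "the end"]
counts = run_local(sample)
print(counts)
```

The sorted() call plays the role of the shuffle phase, which is what
guarantees the reducer sees all values for one key together.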

When I ran a Hadoop streaming job with Python, on a few small files (1-2
MB), the job ran pretty quickly in 77 seconds (for the Map+Reduce phases):


packageJobJar: [/home/hduser/mapper.py, /home/hduser/reducer.py,
/mnt/hadoop/tmp/hadoop-unjar5368493284653516019/] []
/tmp/streamjob8122180536767888261.jar tmpDir=null
11/09/06 23:38:04 INFO mapred.FileInputFormat: Total input paths to process
: 3
11/09/06 23:38:05 INFO streaming.StreamJob: getLocalDirs():
[/mnt/hadoop/tmp/mapred/local]
11/09/06 23:38:05 INFO streaming.StreamJob: Running job:
job_201109062238_0001
11/09/06 23:38:05 INFO streaming.StreamJob: To kill this job, run:
11/09/06 23:38:05 INFO streaming.StreamJob:
/usr/local/hadoop/bin/../bin/hadoop job
 -Dmapred.job.tracker=localhost:54311 -kill job_201109062238_0001
11/09/06 23:38:05 INFO streaming.StreamJob: Tracking URL:
http://localhost:50030/jobdetails.jsp?jobid=job_201109062238_0001
11/09/06 23:38:06 INFO streaming.StreamJob:  map 0%  reduce 0%
11/09/06 23:38:26 INFO streaming.StreamJob:  map 32%  reduce 0%
11/09/06 23:38:29 INFO streaming.StreamJob:  map 39%  reduce 0%
11/09/06 23:38:32 INFO streaming.StreamJob:  map 48%  reduce 0%
11/09/06 23:38:35 INFO streaming.StreamJob:  map 50%  reduce 0%
11/09/06 23:38:50 INFO streaming.StreamJob:  map 75%  reduce 0%
11/09/06 23:38:53 INFO streaming.StreamJob:  map 100%  reduce 0%
11/09/06 23:38:56 INFO streaming.StreamJob:  map 100%  reduce 17%
11/09/06 23:39:08 INFO streaming.StreamJob:  map 100%  reduce 67%
11/09/06 23:39:12 INFO streaming.StreamJob:  map 100%  reduce 76%
11/09/06 23:39:14 INFO streaming.StreamJob:  map 100%  reduce 86%
11/09/06 23:39:17 INFO streaming.StreamJob:  map 100%  reduce 96%
11/09/06 23:39:23 INFO streaming.StreamJob:  map 100%  reduce 100%
11/09/06 23:39:29 INFO streaming.StreamJob: Job complete:
job_201109062238_0001
11/09/06 23:39:29 INFO streaming.StreamJob: Output:
/hduser/wordcount_python-output
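
For comparing runs, the progress lines in output like the above can be
parsed to get the elapsed time between the first and last progress report.
This is only a sketch: the regular expression and the parse_progress helper
are my own, written against the line format shown in this log, and three of
the progress lines are copied into the string for illustration.

```python
import re
from datetime import datetime

# Three progress lines copied from the job output above.
log = """\
11/09/06 23:38:06 INFO streaming.StreamJob:  map 0%  reduce 0%
11/09/06 23:38:53 INFO streaming.StreamJob:  map 100%  reduce 0%
11/09/06 23:39:23 INFO streaming.StreamJob:  map 100%  reduce 100%
"""

# Assumed line format: 'yy/MM/dd HH:mm:ss INFO streaming.StreamJob: map N% reduce M%'
PROGRESS = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO streaming\.StreamJob:"
    r"\s+map (\d+)%\s+reduce (\d+)%"
)


def parse_progress(text):
    """Return a list of (timestamp, map_percent, reduce_percent) tuples."""
    rows = []
    for line in text.splitlines():
        match = PROGRESS.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
            rows.append((stamp, int(match.group(2)), int(match.group(3))))
    return rows


rows = parse_progress(log)
elapsed = (rows[-1][0] - rows[0][0]).total_seconds()
print(f"map 0% to reduce 100%: {elapsed:.0f} seconds")
```

A job whose last parsed row stays at map 0% for many minutes is the
pattern to look for when a run appears to hang.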


--
Sameer Farooqui
Systems Architect / Hortonworks




On Wed, Feb 22, 2012 at 8:38 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 Streaming job just seems to be hanging

 12/02/22 17:35:50 INFO streaming.StreamJob: map 0% reduce 0%

 -

 On the admin page I see that it created 551 input splits. Could someone
 suggest a way to find out what might be causing it to hang? I increased
 io.sort.mb to 200 MB.

 I am using 5 data nodes with 12 CPU, 96G RAM.



Streaming job hanging

2012-02-22 Thread Mohit Anchlia
Streaming job just seems to be hanging

12/02/22 17:35:50 INFO streaming.StreamJob: map 0% reduce 0%

-

On the admin page I see that it created 551 input splits. Could someone
suggest a way to find out what might be causing it to hang? I increased
io.sort.mb to 200 MB.

I am using 5 data nodes with 12 CPU, 96G RAM.