Seems like an HDFS issue: as you said, it cannot retrieve certain blocks from certain DNs. Can you check the health of all the DNs? And probably also bump the log4j level to DEBUG.
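A couple of starting points, assuming a Hadoop 1.x-style setup (commands and property names may differ on your version): `hadoop dfsadmin -report` and `hadoop fsck / -blocks -locations` will show dead or unhealthy DNs and under-replicated blocks, and a one-line log4j tweak on each DN turns on DEBUG logging — a sketch, not a drop-in config:

```
# conf/log4j.properties on each DataNode (sketch; restart the DN afterwards,
# or use `hadoop daemonlog -setlevel <host:port> <class> DEBUG` for a live change)
log4j.logger.org.apache.hadoop.hdfs.server.datanode=DEBUG
```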
Johnny

On Fri, Apr 12, 2013 at 12:06 PM, Mua Ban <[email protected]> wrote:
> Thank you very much Cheolsoo,
>
> I am running the script once more right now and I see 7 failed reducers at
> the moment on the job tracker GUI. I browse these failed reducers and I
> found the task logs. From these 7 failed reducers, some have type 1 task
> log, the rest have type 2 task log as I show below.
>
> They seem related to some connection issue among nodes in the cluster. Do
> you know any parameters I should configure to figure out the actual
> problem?
>
> Thank you,
> -Mua
>
> ---------------------------------------
> *Type 1 task log*
>
> 3-04-12 13:42:24,960 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Scheduled 5 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-12 13:42:25,259 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-12 13:42:25,271 INFO org.apache.hadoop.mapred.ReduceTask:
> Initiating in-memory merge with 610 segments...
> 2013-04-12 13:42:25,273 INFO org.apache.hadoop.mapred.Merger: Merging 610
> sorted segments
> 2013-04-12 13:42:25,275 INFO org.apache.hadoop.mapred.Merger: Down to the
> last merge-pass, with 610 segments left of total size: 96922927 bytes
> 2013-04-12 13:42:27,348 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Merge of the 610 files in-memory
> complete.
> Local file is
> /hdfs/sp/filesystem/mapred/local/taskTracker/vul/jobcache/job_201304081613_0049/attempt_201304081613_0049_r_000009_0/output/map_6.out
> of size 96921713
> 2013-04-12 13:42:27,349 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Thread waiting: Thread for merging
> on-disk files
> 2013-04-12 13:42:30,263 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-12 13:42:35,267 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Scheduled 2 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-12 13:42:38,145 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
> obsolete output of KILLED map-task: 'attempt_201304081613_0049_m_000584_0'
> 2013-04-12 13:42:44,150 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
> obsolete output of KILLED map-task: 'attempt_201304081613_0049_m_000557_0'
> 2013-04-12 13:42:55,283 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-12 13:43:05,164 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
> obsolete output of KILLED map-task: 'attempt_201304081613_0049_m_000604_0'
> 2013-04-12 13:43:06,036 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-12 13:43:11,169 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
> obsolete output of KILLED map-task: 'attempt_201304081613_0049_m_000597_1'
> 2013-04-12 13:43:21,040 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Need another 5 map output(s) where 0
> is already in progress
> 2013-04-12 13:43:21,040 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Scheduled 0 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-12 13:44:21,042 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Need another 5 map output(s) where 0
> is already in progress
> 2013-04-12 13:44:21,043 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-12 13:44:29,222 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
> obsolete output of KILLED map-task: 'attempt_201304081613_0049_m_000576_0'
> 2013-04-12 13:45:21,333 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Need another 4 map output(s) where 0
> is already in progress
> 2013-04-12 13:45:21,333 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Scheduled 0 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-12 13:46:01,334 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-12 13:46:06,341 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-12 13:46:21,350 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Need another 2 map output(s) where 0
> is already in progress
> 2013-04-12 13:46:21,350 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Scheduled 0 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-12 13:46:41,301 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
> obsolete output of KILLED map-task: 'attempt_201304081613_0049_m_000616_1'
> 2013-04-12 13:46:41,351 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0049_r_000009_0 Scheduled 2 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-12 13:46:42,301 INFO org.apache.hadoop.mapred.ReduceTask:
> GetMapEventsThread exiting
> 2013-04-12 13:46:42,302 INFO org.apache.hadoop.mapred.ReduceTask:
> getMapsEventsThread joined.
> 2013-04-12 13:46:42,302 INFO org.apache.hadoop.mapred.ReduceTask: Closed
> ram manager
> 2013-04-12 13:46:42,302 INFO org.apache.hadoop.mapred.ReduceTask:
> Interleaved on-disk merge complete: 1 files left.
> 2013-04-12 13:46:42,302 INFO org.apache.hadoop.mapred.ReduceTask: In-memory
> merge complete: 11 files left.
> 2013-04-12 13:46:42,303 INFO org.apache.hadoop.mapred.Merger: Merging 11
> sorted segments
> 2013-04-12 13:46:42,303 INFO org.apache.hadoop.mapred.Merger: Down to the
> last merge-pass, with 11 segments left of total size: 3152550 bytes
> 2013-04-12 13:46:42,393 INFO org.apache.hadoop.mapred.ReduceTask: Merged 11
> segments, 3152550 bytes to disk to satisfy reduce memory limit
> 2013-04-12 13:46:42,394 INFO org.apache.hadoop.mapred.ReduceTask: Merging 2
> files, 100074247 bytes from disk
> 2013-04-12 13:46:42,395 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0
> segments, 0 bytes from memory into reduce
> 2013-04-12 13:46:42,395 INFO org.apache.hadoop.mapred.Merger: Merging 2
> sorted segments
> 2013-04-12 13:46:42,398 INFO org.apache.hadoop.mapred.Merger: Down to the
> last merge-pass, with 2 segments left of total size: 100074239 bytes
> 2013-04-12 13:57:45,872 WARN org.apache.hadoop.hdfs.DFSClient:
> DFSOutputStream ResponseProcessor exception for block
> blk_-199210310173610155_28360java.net.SocketTimeoutException: 69000 millis
> timeout while waiting for channel to be ready for read.
> ch : java.nio.channels.SocketChannel[connected local=/10.6.25.33:47987 remote=/10.6.25.33:49197]
> at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> at java.io.DataInputStream.readFully(DataInputStream.java:189)
> at java.io.DataInputStream.readLong(DataInputStream.java:410)
> at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2967)
>
> 2013-04-12 14:00:00,777 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_-199210310173610155_28360 bad datanode[0]
> 10.6.25.33:49197
> 2013-04-12 14:00:00,866 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_-199210310173610155_28360 in pipeline
> 10.6.25.33:49197, 10.6.25.141:39369, 10.6.25.31:54563: bad datanode
> 10.6.25.33:49197
> 2013-04-12 14:04:55,904 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream 10.6.25.33:49197 java.io.IOException: Bad connect
> ack with firstBadLink as 10.6.25.32:53741
> 2013-04-12 14:04:55,904 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_-8936348770928346278_28365
> 2013-04-12 14:04:55,907 INFO org.apache.hadoop.hdfs.DFSClient: Excluding
> datanode 10.6.25.32:53741
> 2013-04-12 14:06:07,789 WARN org.apache.hadoop.hdfs.DFSClient:
> DFSOutputStream ResponseProcessor exception for block
> blk_8322203915584568195_28367java.io.IOException: Bad response 1 for block
> blk_8322203915584568195_28367 from datanode 10.6.25.31:54563
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2977)
>
> 2013-04-12 14:06:25,735 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_8322203915584568195_28367 bad datanode[2]
> 10.6.25.31:54563
> 2013-04-12 14:06:25,735 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_8322203915584568195_28367 in pipeline
> 10.6.25.33:49197, 10.6.25.145:48897, 10.6.25.31:54563: bad datanode
> 10.6.25.31:54563
> 2013-04-12 14:06:45,112 WARN org.apache.hadoop.mapred.Task: Parent died.
> Exiting attempt_201304081613_0049_r_000009_0
>
> ----------------------------------------------------------
>
> *Type 2 task log*
>
> 14_1'
> 2013-04-11 15:24:33,168 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
> obsolete output of KILLED map-task: 'attempt_201304081613_0046_m_000609_1'
> 2013-04-11 15:24:39,172 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
> obsolete output of KILLED map-task: 'attempt_201304081613_0046_m_000599_0'
> 2013-04-11 15:25:01,179 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0046_r_000010_0 Need another 5 map output(s) where 0
> is already in progress
> 2013-04-11 15:25:01,179 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0046_r_000010_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-11 15:25:24,203 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
> obsolete output of KILLED map-task: 'attempt_201304081613_0046_m_000577_0'
> 2013-04-11 15:25:51,529 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0046_r_000010_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-11 15:26:00,227 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
> obsolete output of KILLED map-task: 'attempt_201304081613_0046_m_000556_1'
> 2013-04-11 15:26:01,558 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0046_r_000010_0 Need another 3 map output(s) where 0
> is already in progress
> 2013-04-11 15:26:01,558 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0046_r_000010_0 Scheduled 2 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-11 15:26:06,235 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
> obsolete output of KILLED
> map-task: 'attempt_201304081613_0046_m_000576_0'
> 2013-04-11 15:26:06,235 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
> obsolete output of KILLED map-task: 'attempt_201304081613_0046_m_000560_1'
> 2013-04-11 15:26:06,603 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201304081613_0046_r_000010_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
> 2013-04-11 15:26:07,236 INFO org.apache.hadoop.mapred.ReduceTask:
> GetMapEventsThread exiting
> 2013-04-11 15:26:07,236 INFO org.apache.hadoop.mapred.ReduceTask:
> getMapsEventsThread joined.
> 2013-04-11 15:26:07,236 INFO org.apache.hadoop.mapred.ReduceTask: Closed
> ram manager
> 2013-04-11 15:26:07,236 INFO org.apache.hadoop.mapred.ReduceTask:
> Interleaved on-disk merge complete: 1 files left.
> 2013-04-11 15:26:07,236 INFO org.apache.hadoop.mapred.ReduceTask: In-memory
> merge complete: 109 files left.
> 2013-04-11 15:26:07,238 INFO org.apache.hadoop.mapred.Merger: Merging 109
> sorted segments
> 2013-04-11 15:26:07,238 INFO org.apache.hadoop.mapred.Merger: Down to the
> last merge-pass, with 109 segments left of total size: 23323822 bytes
> 2013-04-11 15:26:07,528 INFO org.apache.hadoop.mapred.ReduceTask: Merged
> 109 segments, 23323822 bytes to disk to satisfy reduce memory limit
> 2013-04-11 15:26:07,528 INFO org.apache.hadoop.mapred.ReduceTask: Merging 2
> files, 120503030 bytes from disk
> 2013-04-11 15:26:07,529 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0
> segments, 0 bytes from memory into reduce
> 2013-04-11 15:26:07,529 INFO org.apache.hadoop.mapred.Merger: Merging 2
> sorted segments
> 2013-04-11 15:26:07,531 INFO org.apache.hadoop.mapred.Merger: Down to the
> last merge-pass, with 2 segments left of total size: 120503022 bytes
> 2013-04-11 15:28:34,121 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream 10.6.25.144:40971 java.io.IOException: Bad connect
> ack with firstBadLink as 10.6.25.32:53741
> 2013-04-11 15:28:34,121 INFO
> org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_423162934996020555_28131
> 2013-04-11 15:28:34,123 INFO org.apache.hadoop.hdfs.DFSClient: Excluding
> datanode 10.6.25.32:53741
> 2013-04-11 15:58:06,150 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream 10.6.25.144:40971 java.io.IOException: Bad connect
> ack with firstBadLink as 10.6.25.31:54563
> 2013-04-11 15:58:06,150 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_-7269648817395135125_28148
> 2013-04-11 15:58:06,152 INFO org.apache.hadoop.hdfs.DFSClient: Excluding
> datanode 10.6.25.31:54563
> 2013-04-11 16:06:10,175 WARN org.apache.hadoop.mapred.Task: Parent died.
> Exiting attempt_201304081613_0046_r_000010_0
>
>
> On Fri, Apr 12, 2013 at 2:25 PM, Cheolsoo Park <[email protected]> wrote:
>
> > Hi Mua,
> >
> > I guess you misunderstood me. The pig_****.log file is not a task log.
> >
> > You should look for task logs on data nodes where your task tracker ran.
> > Here is some explanation regarding various log files in Hadoop and where
> > to find them:
> >
> > http://blog.cloudera.com/blog/2009/09/apache-hadoop-log-files-where-to-find-them-in-cdh-and-what-info-they-contain/
> >
> > Thanks,
> > Cheolsoo
> >
> >
> > On Fri, Apr 12, 2013 at 10:27 AM, Mua Ban <[email protected]> wrote:
> >
> > > Thank you very much for your reply.
> > >
> > > Below is the stack log in the pig_****.log file
> > >
> > > Can you please give me some suggestions?
> > >
> > > -Mua
> > > ------------------
> > > Backend error message
> > > ---------------------
> > > Task attempt_201304081613_0048_r_000001_0 failed to report status for
> > > 601 seconds. Killing!
> > >
> > > Pig Stack Trace
> > > ---------------
> > > ERROR 2997: Unable to recreate exception from backed error: Task
> > > attempt_201304081613_0048_r_000001_0 failed to report status for 601
> > > seconds. Killing!
> > >
> > > org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable
> > > to recreate exception from backed error: Task
> > > attempt_201304081613_0048_r_000001_0 failed to report status for 601
> > > seconds. Killing!
> > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:217)
> > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:152)
> > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:383)
> > > at org.apache.pig.PigServer.launchPlan(PigServer.java:1270)
> > > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1255)
> > > at org.apache.pig.PigServer.execute(PigServer.java:1245)
> > > at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
> > > at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
> > > at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
> > > at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> > > at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
> > > at org.apache.pig.Main.run(Main.java:555)
> > > at org.apache.pig.Main.main(Main.java:111)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
> > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
> > > at java.lang.reflect.Method.invoke(Method.java:611)
> > > at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > >
> > > ================================================================================
> > >
> > >
> > > On Fri, Apr 12, 2013 at 11:29 AM, Cheolsoo Park <[email protected]
> > > wrote:
> > >
> > > > Did you look at task logs to see why those tasks failed? Since it's a
> > > > back-end error, the console output doesn't tell you much. Task logs
> > > > should have a stack trace that shows why it failed, and you can go
> > > > from there.
> > > >
> > > >
> > > > On Fri, Apr 12, 2013 at 8:18 AM, Mua Ban <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am very new to PIG/Hadoop, I just started writing my first PIG
> > > > > script a couple days ago. I ran into this problem.
> > > > >
> > > > > My cluster has 9 nodes. I have to join two data sets big and small,
> > > > > each is collected for 4 weeks. I first take two subsets of my data
> > > > > set (which is for the first week of data), let's call them B1 and S1
> > > > > for big and small data sets of the first week. The entire data sets
> > > > > of 4 weeks is B4 and S4.
> > > > >
> > > > > I ran my script on my cluster to join B1 and S1 and everything is
> > > > > fine. I got my joined data. However, when I ran my script to join B4
> > > > > and S4, the script failed. B4 is 39GB, S4 is 300MB. B4 is skewed,
> > > > > some id appears more frequently than others. I tried both 'using
> > > > > skewed' and 'using replicated' modes for the join operation (by
> > > > > appending them to the end of the below join clause), they both fail.
> > > > >
> > > > > Here is my script and I think it is very simple:
> > > > >
> > > > > big = load 'bigdir/' using PigStorage(',') as (id:chararray,
> > > > > data:chararray);
> > > > > small = load 'smalldir/' using PigStorage(',') as
> > > > > (t1:double,t2:double,data:chararray,id:chararray);
> > > > > J = JOIN big by id LEFT OUTER, small by id;
> > > > > store J into 'outputdir' using PigStorage(',');
> > > > >
> > > > > On the web UI of the tracker, I see that the job has 40 reducers (I
> > > > > guess since the total data is about 40GB, and each 1GB will need one
> > > > > reducer by default of PIG and Hadoop settings, so this is normal).
> > > > > If I use 'parallel 80' in the join operation above, then I see 80
> > > > > reducers, and the join operation still failed.
> > > > >
> > > > > I checked file mapred-default.xml and found this:
> > > > > <name>mapred.reduce.tasks</name>
> > > > > <value>1</value>
> > > > >
> > > > > If I set the value of parallel in the join operation, it should
> > > > > override this, right?
> > > > >
> > > > > On the tracker GUI, I see that for different runs, the number of
> > > > > completed reducers changes from 4 to 10 (out of 40 total reducers).
> > > > > The tracker GUI shows the reason for the failed reducers: "Task
> > > > > attempt_201304081613_0046_r_000006_0 failed to report status for 600
> > > > > seconds. Killing!"
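(On the PARALLEL question quoted above: yes — a PARALLEL clause on an operator overrides mapred.reduce.tasks for the job Pig generates from it. As a sketch, not a verified fix for this cluster, the same join could be written with an explicit reducer count and a longer task timeout, since the reducers here are being killed at the 600-second default; the values 80 and 30 minutes are purely illustrative:)

```pig
-- Sketch only: same join as in the thread, with an explicit PARALLEL and a
-- larger task timeout. mapred.task.timeout is in milliseconds; the default
-- 600000 produces "failed to report status for 600 seconds. Killing!".
set mapred.task.timeout 1800000;

big   = load 'bigdir/' using PigStorage(',') as (id:chararray, data:chararray);
small = load 'smalldir/' using PigStorage(',') as
        (t1:double, t2:double, data:chararray, id:chararray);
J = JOIN big BY id LEFT OUTER, small BY id PARALLEL 80;
store J into 'outputdir' using PigStorage(',');
```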
> > > > >
> > > > > *Could you please help?*
> > > > > Thank you very much,
> > > > > -Mua
> > > > >
> > > > > --------------------------------------------------------------------------------------------------------------
> > > > > Here is the error report from the console screen where I ran this
> > > > > script:
> > > > >
> > > > > job_201304081613_0032 616 0 230 12 32 0 0 0 big MAP_ONLY
> > > > > job_201304081613_0033 705 1 21 6 6 234 2 34 234 SAMPLER
> > > > >
> > > > > Failed Jobs:
> > > > > JobId Alias Feature Message Outputs
> > > > > job_201304081613_0034 small SKEWED_JOIN Message: Job failed!
> > > > > Error - # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1.
> > > > > LastFailedTask: task_201304081613_0034_r_000012
> > > > >
> > > > > Input(s):
> > > > > Successfully read 364285458 records (39528533645 bytes) from:
> > > > > "hdfs://d0521b01:24990/user/abc/big/"
> > > > > Failed to read data from "hdfs://d0521b01:24990/user/abc/small/"
> > > > >
> > > > > Output(s):
> > > > >
> > > > > Counters:
> > > > > Total records written : 0
> > > > > Total bytes written : 0
> > > > > Spillable Memory Manager spill count : 0
> > > > > Total bags proactively spilled: 0
> > > > > Total records proactively spilled: 0
> > > > >
> > > > > Job DAG:
> > > > > job_201304081613_0032 -> job_201304081613_0033,
> > > > > job_201304081613_0033 -> job_201304081613_0034,
> > > > > job_201304081613_0034 -> null,
> > > > > null
> > > > >
> > > > > 2013-04-10 20:11:23,815 [main] WARN
> > > > > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > > > - Encountered Warning REDUCER_COUNT_LOW 1 time(s).
> > > > > 2013-04-10 20:11:23,815 [main] INFO
> > > > > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > > > - Some jobs have failed! Stop running all dependent jobs
> > > > > 2013-04-10 20:11:23,815 [main] ERROR org.apache.pig.tools.grunt.GruntParser
> > > > > - ERROR 2997: Encountered IOException. java.io.IOException: Error
> > > > > Recovery for block blk_312487981794332936_26563 failed because
> > > > > recovery from primary datanode 10.6.25.31:54563 failed 6 times.
> > > > > Pipeline was 10.6.25.31:54563. Aborting...
> > > > > Details at logfile: /homes/abc/pig-flatten/scripts/pig_1365627648226.log
> > > > > 2013-04-10 20:11:23,818 [main] ERROR org.apache.pig.tools.grunt.GruntParser
> > > > > - ERROR 2244: Job failed, hadoop does not return any error message
> > > > > Details at logfile: /homes/abc/pig-flatten/scripts/pig_1365627648226.log
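(A footnote on the "40 reducers for ~40GB" observation in the original question: by default Pig estimates one reducer per gigabyte of input, via pig.exec.reducers.bytes.per.reducer, capped at pig.exec.reducers.max, default 999; an explicit PARALLEL replaces that estimate. A small sketch of the arithmetic — the default values here are assumptions about this cluster's Pig version:)

```python
import math

def estimate_reducers(input_bytes,
                      bytes_per_reducer=1_000_000_000,  # pig.exec.reducers.bytes.per.reducer default
                      max_reducers=999):                # pig.exec.reducers.max default
    """Mirror Pig's default reducer estimate: ceil(bytes / bytes_per_reducer), capped."""
    return max(1, min(max_reducers, math.ceil(input_bytes / bytes_per_reducer)))

# ~39 GB of input, the byte count reported for the failing B4 join above
print(estimate_reducers(39_528_533_645))  # → 40
```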
