Hi,

Sounds like your mappers are overloaded. Can you try the following?

1. You can set mapred.max.split.size to a smaller value, so that more mappers can be launched and each one does less work; or
2. You can set mapred.task.timeout to a larger value. The default is 600 seconds (note that the property itself is specified in milliseconds, i.e. 600000).
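For example, at the top of your script (a minimal sketch; the split size here is only an illustration, tune it to your data):

    SET mapred.max.split.size 134217728;  -- 128 MB splits, so more (smaller) map tasks are launched
    SET mapred.task.timeout 1200000;      -- 20 minutes, up from the default 600000 (10 minutes)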
Thanks,
Cheolsoo

On Mon, May 13, 2013 at 8:03 PM, Praveen Bysani <praveen.ii...@gmail.com> wrote:
> Hi,
>
> I have a very weird issue with my Pig script. Following is its content:
>
> REGISTER /home/hadoopuser/Workspace/lib/piggybank.jar;
> REGISTER /home/hadoopuser/Workspace/lib/datafu.jar;
> REGISTER /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/hbase/hbase-0.94.2-cdh4.2.1-security.jar;
> REGISTER /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/zookeeper/zookeeper-3.4.5-cdh4.2.1.jar;
> SET default_parallel 15;
>
> records = LOAD 'hbase://dm-re' USING
>     org.apache.pig.backend.hadoop.hbase.HBaseStorage('v:ctm v:src', '-caching 5000 -gt 1366098805& -lt 1366102543&')
>     as (time:chararray, company:chararray);
>
> records_iso = FOREACH records GENERATE
>     org.apache.pig.piggybank.evaluation.datetime.convert.CustomFormatToISO(time, 'yyyy-MM-dd HH:mm:ss Z') as iso_time;
> records_group = GROUP records_iso ALL;
> result = FOREACH records_group GENERATE MAX(records_iso.iso_time) as maxtime;
> DUMP result;
>
> When I try to run this script on a cluster of 5 nodes with 20 map slots, most of the map tasks fail with the following error after 10 minutes of initializing:
>
> Task attempt <id> failed to report status for 600 seconds. Killing!
>
> I tried decreasing the caching size to under 100 or so (on the intuition that fetching and processing a larger cache might take more time), but the issue remains. However, if I restrict the rows loaded (using -lt and -gt) such that the number of map tasks is <= 2, the job finishes successfully. When the number of tasks is > 2, it is always the case that 2-4 tasks complete and the rest fail with the above error. I have attached the task tracker log for this attempt; I don't see any errors in it except for some ZooKeeper connection warnings. I checked manually from that node, and 'hbase zkcli' connects without any issue, so I assume ZooKeeper is configured properly.
>
> I don't really know where to begin debugging this problem, and it would be great if someone could provide assistance. Some cluster configurations that I think may be relevant here:
>
> dfs.block.size = 1 GB
> io.sort.mb = 1 GB
> HRegion size = 1 GB
>
> The HBase table is close to 250 GB in size. I have observed 100% CPU usage by the mapred user on a node while a task is executing. I am not sure what to optimize in this case for the job to complete; it would be good if someone could throw some light in this direction.
>
> PS: All the nodes in the cluster are configured on EBS-backed Amazon EC2 instances.
>
> --
> Regards,
> Praveen Bysani
> http://www.praveenbysani.com