Hi, I tried different things; finally, changing io.sort.mb to a smaller value helped resolve this issue.
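In case it helps others hitting the same timeout: the buffer can be lowered per-job from the Pig script itself, rather than cluster-wide in mapred-site.xml. A minimal sketch, assuming a 256 MB buffer (an illustrative value, not necessarily the one I used; the point is to stay well under the map task heap):

```pig
-- sketch: lower the map-side sort buffer for this job only
-- (256 is an assumed value, in MB; the original 1 GB setting
-- left little heap for the actual map work)
SET io.sort.mb 256;
```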
On 15 May 2013 06:29, Cheolsoo Park <piaozhe...@gmail.com> wrote:

> Hi,
>
> Sounds like your mappers are overloaded. Can you try the following?
>
> 1. You can set mapred.max.split.size to a smaller value, so more mappers
> can be launched.
>
> or
>
> 2. You can set mapred.task.timeout to a larger value. The default value is
> 600 seconds.
>
> Thanks,
> Cheolsoo
>
> On Mon, May 13, 2013 at 8:03 PM, Praveen Bysani <praveen.ii...@gmail.com> wrote:
>
> > Hi,
> >
> > I have a very weird issue with my Pig script. Following is its content:
> >
> > REGISTER /home/hadoopuser/Workspace/lib/piggybank.jar;
> > REGISTER /home/hadoopuser/Workspace/lib/datafu.jar;
> > REGISTER /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/hbase/hbase-0.94.2-cdh4.2.1-security.jar;
> > REGISTER /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/zookeeper/zookeeper-3.4.5-cdh4.2.1.jar;
> > SET default_parallel 15;
> >
> > records = LOAD 'hbase://dm-re' USING
> >     org.apache.pig.backend.hadoop.hbase.HBaseStorage('v:ctm v:src',
> >         '-caching 5000 -gt 1366098805& -lt 1366102543&')
> >     AS (time:chararray, company:chararray);
> >
> > records_iso = FOREACH records GENERATE
> >     org.apache.pig.piggybank.evaluation.datetime.convert.CustomFormatToISO(
> >         time, 'yyyy-MM-dd HH:mm:ss Z') AS iso_time;
> > records_group = GROUP records_iso ALL;
> > result = FOREACH records_group GENERATE MAX(records_iso.iso_time) AS maxtime;
> > DUMP result;
> >
> > When I try to run this script on a cluster of 5 nodes with 20 map slots,
> > most of the map tasks fail with the following error after 10 minutes of
> > initializing:
> >
> > Task attempt <id> failed to report status for 600 seconds. Killing!
> >
> > I tried decreasing the caching size to less than 100 or so (on the
> > intuition that fetching and processing a larger cache was taking more
> > time), but the issue remains the same. However, if I load the rows
> > (using -lt and -gt) such that the number of map tasks is <= 2, the job
> > finishes successfully. When the number of tasks is > 2, it is always the
> > case that 2-4 tasks complete and the rest all fail with the above
> > mentioned error. I attach the task tracker log hereby for this attempt.
> > I don't see any error except for some ZooKeeper connection warnings. I
> > manually checked from that node, and running 'hbase zkcli' connects
> > without any issue; hence, I assume ZooKeeper is configured properly.
> >
> > I don't really understand where to debug this problem. It would be great
> > if someone could provide assistance. Some configurations of the cluster
> > which I think may be relevant here:
> >
> > dfs.block.size = 1 GB
> > io.sort.mb = 1 GB
> > HRegion size = 1 GB
> >
> > and the size of the HBase table is close to 250 GB. I have observed 100%
> > CPU usage by the mapred user on the node while the task is under
> > execution. I am not really sure what to optimize in this case for the
> > job to complete. It would be good if someone could throw some light in
> > this direction.
> >
> > PS: All the nodes in the cluster are configured on EBS-backed Amazon EC2
> > instances.
> >
> > --
> > Regards,
> > Praveen Bysani
> > http://www.praveenbysani.com

--
Regards,
Praveen Bysani
http://www.praveenbysani.com
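For reference, Cheolsoo's two suggestions in the quoted reply can also be applied per-job from the Pig script. A sketch with assumed values (note that mapred.task.timeout is specified in milliseconds, so the 600-second default corresponds to 600000):

```pig
-- sketch of the two suggested knobs; both values are assumptions
SET mapred.max.split.size 134217728;  -- 128 MB splits => more, smaller mappers
SET mapred.task.timeout   1800000;    -- 30 minutes, in milliseconds
```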