
[jira] [Commented] (MAPREDUCE-6076) Zero map split input length combine with none zero map split input length will cause MR1 job hung.

2015-04-01 Thread Robert Kanter (JIRA)


[ https://issues.apache.org/jira/browse/MAPREDUCE-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391777#comment-14391777 ]

Robert Kanter commented on MAPREDUCE-6076:
--

+1

 Zero map split input length combine with none zero  map split input length 
 will cause MR1 job hung. 
 

             Key: MAPREDUCE-6076
             URL: https://issues.apache.org/jira/browse/MAPREDUCE-6076
         Project: Hadoop Map/Reduce
      Issue Type: Bug
      Components: mrv1
        Reporter: zhihai xu
        Assignee: zhihai xu
     Attachments: MAPREDUCE-6076.branch-1.000.patch


 When a job mixes map splits whose input length is zero with map splits whose
 input length is non-zero, the MR1 job can hang.
 This can happen with HBase input splits (TableSplit). The input length of an
 HBase split is zero for unknown regions and non-zero for known regions, as the
 following code shows:
 {code}
 // TableSplit.java
 public long getLength() {
   return length;
 }

 // RegionSizeCalculator.java
 public long getRegionSize(byte[] regionId) {
   Long size = sizeMap.get(regionId);
   if (size == null) {
     LOG.debug("Unknown region:" + Arrays.toString(regionId));
     return 0;
   } else {
     return size;
   }
 }
 {code}
 The TableSplit length comes from RegionSizeCalculator.getRegionSize.
 The job hangs in MR1 for the following reason: if the map tasks with
 zero-length splits are scheduled and complete before all of the map tasks with
 non-zero-length splits have been scheduled, then scheduling a new map task in
 JobInProgress.java fails the TaskTracker resource check:
 {code}
 // findNewMapTask
 // Check to ensure this TaskTracker has enough resources to
 // run tasks from this job
 long outSize = resourceEstimator.getEstimatedMapOutputSize();
 long availSpace = tts.getResourceStatus().getAvailableSpace();
 if (availSpace < outSize) {
   LOG.warn("No room for map task. Node " + tts.getHost() +
            " has " + availSpace +
            " bytes free; but we expect map to take " + outSize);
   return -1; // see if a different TIP might work better.
 }
 {code}
 The resource estimate is calculated here:
 {code}
 // in ResourceEstimator.java
 protected synchronized long getEstimatedTotalMapOutputSize() {
   if (completedMapsUpdates < threshholdToUse) {
     return 0;
   } else {
     long inputSize = job.getInputLength() + job.desiredMaps();
     // add desiredMaps() so that randomwriter case doesn't blow up
     // the multiplication might lead to overflow, casting it with
     // double prevents it
     long estimate = Math.round(((double) inputSize *
         completedMapsOutputSize * 2.0) / completedMapsInputSize);
     if (LOG.isDebugEnabled()) {
       LOG.debug("estimate total map output will be " + estimate);
     }
     return estimate;
   }
 }

 protected synchronized void updateWithCompletedTask(TaskStatus ts,
     TaskInProgress tip) {
   // -1 indicates error, which we don't average in.
   if (tip.isMapTask() && ts.getOutputSize() != -1) {
     completedMapsUpdates++;
     completedMapsInputSize += (tip.getMapInputSize() + 1);
     completedMapsOutputSize += ts.getOutputSize();
     if (LOG.isDebugEnabled()) {
       LOG.debug("completedMapsUpdates:" + completedMapsUpdates + "  " +
                 "completedMapsInputSize:" + completedMapsInputSize + "  " +
                 "completedMapsOutputSize:" + completedMapsOutputSize);
     }
   }
 }
 {code}
 In this calculation, completedMapsInputSize is a very small number while
 inputSize * completedMapsOutputSize is a very large one.
 For example, with completedMapsInputSize = 1 byte, inputSize = 100 MB and
 completedMapsOutputSize = 100 MB, the formula above gives a total estimate of
 100 MB * 100 MB * 2 / 1, roughly 20,000 TB, which is far more than the disk
 space of most TaskTrackers.
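
 To make the arithmetic concrete, the following is a minimal, standalone sketch
 that plugs the example numbers into the formula quoted from
 getEstimatedTotalMapOutputSize(). The class name EstimateBlowUpDemo and the
 method signature are hypothetical and exist only for this illustration; MB and
 TB are taken as binary units (2^20 and 2^40 bytes), which is an assumption.

```java
// Illustration only: not Hadoop code. It reproduces the quoted formula
// with plain long arguments so the numbers can be checked by hand.
class EstimateBlowUpDemo {

    // Same expression as in the quoted getEstimatedTotalMapOutputSize().
    static long estimateTotalMapOutput(long inputSize,
                                       long completedMapsOutputSize,
                                       long completedMapsInputSize) {
        return Math.round(((double) inputSize * completedMapsOutputSize * 2.0)
                / completedMapsInputSize);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;          // assumption: 1 MB = 2^20 bytes
        long tb = mb * mb;                // 2^40 bytes

        long inputSize = 100 * mb;
        long completedMapsOutputSize = 100 * mb;
        // One completed map with a zero-length split contributes 0 + 1.
        long completedMapsInputSize = 1;

        long estimate = estimateTotalMapOutput(
                inputSize, completedMapsOutputSize, completedMapsInputSize);
        System.out.println("estimated total map output: "
                + (estimate / tb) + " TB");  // prints 20000 TB
    }
}
```

 With these inputs the quoted formula evaluates to 20,000 TB for the whole job,
 so the resource check against the TaskTracker's free space cannot pass on
 ordinary hardware.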
 So I think that a map split input length of 0 means the split input length is
 unknown, and that it is reasonable to use the map output size as the input
 size for the calculation in ResourceEstimator. I will upload a fix based on
 this approach.
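
 The idea can be sketched as follows. This is not the attached patch, only a
 simplified model of the approach described above: the class
 ResourceEstimatorSketch, its constructor flag and its method signatures are
 hypothetical, and the threshold check from the real ResourceEstimator is
 reduced to "at least one completed map".

```java
// Illustration only: a simplified model of the estimator, not Hadoop code.
class ResourceEstimatorSketch {
    private final boolean treatZeroInputAsUnknown; // hypothetical switch
    private long completedMapsInputSize;
    private long completedMapsOutputSize;
    private int completedMapsUpdates;

    ResourceEstimatorSketch(boolean treatZeroInputAsUnknown) {
        this.treatZeroInputAsUnknown = treatZeroInputAsUnknown;
    }

    // Records one completed map; an output size of -1 indicates an error.
    void updateWithCompletedMap(long mapInputSize, long mapOutputSize) {
        if (mapOutputSize == -1) {
            return;
        }
        long effectiveInput = mapInputSize;
        if (treatZeroInputAsUnknown && mapInputSize == 0) {
            // Unknown input length: fall back to the map's output size.
            effectiveInput = mapOutputSize;
        }
        completedMapsUpdates++;
        completedMapsInputSize += effectiveInput + 1;
        completedMapsOutputSize += mapOutputSize;
    }

    // Same expression as the quoted total-output estimate.
    long estimateTotalMapOutput(long jobInputSize) {
        if (completedMapsUpdates == 0) {
            return 0;
        }
        return Math.round(((double) jobInputSize * completedMapsOutputSize * 2.0)
                / completedMapsInputSize);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        ResourceEstimatorSketch original = new ResourceEstimatorSketch(false);
        ResourceEstimatorSketch adjusted = new ResourceEstimatorSketch(true);
        // One completed map: zero-length split, 100 MB of output.
        original.updateWithCompletedMap(0, 100 * mb);
        adjusted.updateWithCompletedMap(0, 100 * mb);
        System.out.println("original estimate (bytes): "
                + original.estimateTotalMapOutput(100 * mb));
        System.out.println("adjusted estimate (bytes): "
                + adjusted.estimateTotalMapOutput(100 * mb));
    }
}
```

 In this model the adjusted estimate for the example stays just under 200 MB,
 because the denominator now reflects the amount of data the completed map
 actually produced instead of the placeholder value 1.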



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MAPREDUCE-6076) Zero map split input length combine with none zero map split input length will cause MR1 job hung.

2015-01-23 Thread zhihai xu (JIRA)


[ https://issues.apache.org/jira/browse/MAPREDUCE-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289766#comment-14289766 ]

zhihai xu commented on MAPREDUCE-6076:
--

Hi [~rchiang],
Thanks for the review.
[~rkanter], could you also help review and commit the patch?
Thanks,
zhihai


[jira] [Commented] (MAPREDUCE-6076) Zero map split input length combine with none zero map split input length will cause MR1 job hung.

2015-01-23 Thread Hadoop QA (JIRA)


[ https://issues.apache.org/jira/browse/MAPREDUCE-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289776#comment-14289776 ]

Hadoop QA commented on MAPREDUCE-6076:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12666981/MAPREDUCE-6076.branch-1.000.patch
against trunk revision 24aa462.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5117//console

This message is automatically generated.


[jira] [Commented] (MAPREDUCE-6076) Zero map split input length combine with none zero map split input length will cause MR1 job hung.

2015-01-23 Thread Ray Chiang (JIRA)


[ https://issues.apache.org/jira/browse/MAPREDUCE-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289763#comment-14289763 ]

Ray Chiang commented on MAPREDUCE-6076:
---

+1 (non-binding).  Code looks fine to me and the test passes in my tree.


[jira] [Commented] (MAPREDUCE-6076) Zero map split input length combine with none zero map split input length will cause MR1 job hung.

2014-09-05 Thread zhihai xu (JIRA)


[ https://issues.apache.org/jira/browse/MAPREDUCE-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124281#comment-14124281 ]

zhihai xu commented on MAPREDUCE-6076:
--

A patch, MAPREDUCE-6076.branch-1.000.patch, has been uploaded for review.


[jira] [Commented] (MAPREDUCE-6076) Zero map split input length combine with none zero map split input length will cause MR1 job hung.

2014-09-05 Thread Hadoop QA (JIRA)


[ https://issues.apache.org/jira/browse/MAPREDUCE-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124286#comment-14124286 ]

Hadoop QA commented on MAPREDUCE-6076:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12666981/MAPREDUCE-6076.branch-1.000.patch
against trunk revision e6420fe.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4859//console

This message is automatically generated.

