[jira] [Updated] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase

2015-05-08 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-4882:
-
  Resolution: Duplicate
   Fix Version/s: 2.6.0
Target Version/s:   (was: )
  Status: Resolved  (was: Patch Available)

Fixed in MAPREDUCE-6063. Sorry Jerry; didn't see this.

 Error in estimating the length of the output file in Spill Phase
 

 Key: MAPREDUCE-4882
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4882
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.2, 1.0.3
 Environment: Any Environment
Reporter: Lijie Xu
Assignee: Jerry Chen
  Labels: BB2015-05-TBR, patch
 Fix For: 2.6.0

 Attachments: MAPREDUCE-4882.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 The sortAndSpill() method in MapTask.java miscalculates the estimated 
 length of the output file. 
 When bufend < bufstart, the long size should be (bufvoid - bufstart) + bufend, 
 not (bufvoid - bufend) + bufstart.
 Here is the original code in MapTask.java:
   private void sortAndSpill() throws IOException, ClassNotFoundException,
                                      InterruptedException {
     //approximate the length of the output file to be the length of the
     //buffer + header lengths for the partitions
     long size = (bufend >= bufstart
         ? bufend - bufstart
         : (bufvoid - bufend) + bufstart) +
         partitions * APPROX_HEADER_LENGTH;
     FSDataOutputStream out = null;
 --
 I ran a test on TeraSort. A snippet from the mapper's log follows:
 MapTask: Spilling map output: record full = true
 MapTask: bufstart = 157286200; bufend = 10485460; bufvoid = 199229440
 MapTask: kvstart = 262142; kvend = 131069; length = 655360
 MapTask: Finished spill 3
 In this case, the spilled bytes should be (199229440 - 157286200) + 10485460 = 
 52428700 (about 52 MB), because the number of spilled records is 524287 and each 
 record takes 100 B.
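
 The arithmetic in the report can be reproduced with a small standalone sketch.
 This is not the MapTask source: the class and method names are hypothetical,
 the partitions * APPROX_HEADER_LENGTH term is left out so that only the buffer
 bytes are compared, and the record count assumes that kvstart and kvend index
 a circular array of the logged length.

```java
// Hypothetical sketch for illustration; names are not taken from MapTask.java.
final class SpillLengthSketch {

    /** Buffer bytes as computed by the quoted code (header term omitted). */
    static long originalLength(long bufstart, long bufend, long bufvoid) {
        return bufend >= bufstart
            ? bufend - bufstart
            : (bufvoid - bufend) + bufstart;
    }

    /** Buffer bytes with the wrapped branch proposed in the report. */
    static long proposedLength(long bufstart, long bufend, long bufvoid) {
        return bufend >= bufstart
            ? bufend - bufstart
            : (bufvoid - bufstart) + bufend;
    }

    /** Records between kvstart and kvend, assuming a circular index of the given length. */
    static long recordCount(long kvstart, long kvend, long length) {
        return kvend > kvstart ? kvend - kvstart : length + kvend - kvstart;
    }

    public static void main(String[] args) {
        // Values copied from the mapper log in the report.
        long bufstart = 157286200L, bufend = 10485460L, bufvoid = 199229440L;
        System.out.println("original: " + originalLength(bufstart, bufend, bufvoid));
        System.out.println("proposed: " + proposedLength(bufstart, bufend, bufvoid));
        // 100 bytes per record is the figure given in the report.
        System.out.println("records * 100: " + recordCount(262142L, 131069L, 655360L) * 100L);
    }
}
```

 With the logged values, the original expression gives 346030180, which is
 larger than bufvoid itself, while the proposed expression gives 52428700 and
 matches 524287 records of 100 bytes each.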



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4882:

Labels: BB2015-05-TBR patch  (was: patch)



[jira] [Updated] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase

2013-01-26 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen updated MAPREDUCE-4882:
--

Attachment: MAPREDUCE-4882.patch


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase

2013-01-26 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen updated MAPREDUCE-4882:
--

Target Version/s: trunk  (was: 0.20.2, 1.0.3)
  Status: Patch Available  (was: Open)

Patch for fixing the problem attached.

Changed (bufvoid - bufend) + bufstart to (bufvoid - bufstart) + bufend, and 
added a test case that detects an invalid size estimate: when 
bufend < bufstart, (bufvoid - bufend) + bufstart is greater than bufvoid.

Please kindly help review the patch.
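
The kind of check described above can be sketched as follows. This is not the
test from the attached patch; the class and method names are hypothetical, and
the only assumption is that the bytes held in the buffer can never exceed
bufvoid.

```java
// Hypothetical sketch of a bounds check; not the test case from the patch.
final class SpillEstimateCheck {

    /** Wrapped-case length as in the original code. */
    static long originalWrapped(long bufstart, long bufend, long bufvoid) {
        return (bufvoid - bufend) + bufstart;
    }

    /** Wrapped-case length as changed by the patch description. */
    static long patchedWrapped(long bufstart, long bufend, long bufvoid) {
        return (bufvoid - bufstart) + bufend;
    }

    /** An estimate of buffered bytes is plausible only if it lies in [0, bufvoid]. */
    static boolean isPlausible(long length, long bufvoid) {
        return length >= 0 && length <= bufvoid;
    }

    public static void main(String[] args) {
        long bufvoid = 1000L;
        // Every pair below has bufend < bufstart, i.e. the data wraps around.
        long[][] wrapped = { {900L, 100L}, {500L, 499L}, {999L, 0L} };
        for (long[] p : wrapped) {
            long before = originalWrapped(p[0], p[1], bufvoid);
            long after = patchedWrapped(p[0], p[1], bufvoid);
            System.out.println("bufstart=" + p[0] + " bufend=" + p[1]
                + " original plausible=" + isPlausible(before, bufvoid)
                + " patched plausible=" + isPlausible(after, bufvoid));
        }
    }
}
```

For each wrapped pair the original expression exceeds bufvoid and fails the
check, while the patched expression stays within bounds.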
