[jira] [Commented] (TEZ-2358) Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task

2015-04-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513802#comment-14513802
 ] 

TezQA commented on TEZ-2358:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728348/TEZ-2358.3.patch
  against master revision 2935ef4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/540//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/540//console

This message is automatically generated.

 Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task
 -

 Key: TEZ-2358
 URL: https://issues.apache.org/jira/browse/TEZ-2358
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Gopal V
Assignee: Rajesh Balamohan
 Attachments: TEZ-2358.1.patch, TEZ-2358.2.patch, TEZ-2358.3.patch, 
 TEZ-2358.4.patch, syslog_attempt_1429683757595_0141_1_01_000143_0.syslog.bz2


 The Tez MergeManager code assumes that the src-task-id is unique between 
 merge operations; this causes confusion when two merge sequences 
 have to process output from the same src-task-id.
 {code}
 private TezRawKeyValueIterator finalMerge(Configuration job, FileSystem fs,
     List<MapOutput> inMemoryMapOutputs,
     List<FileChunk> onDiskMapOutputs
 ...
   if (inMemoryMapOutputs.size() > 0) {
     int srcTaskId =
         inMemoryMapOutputs.get(0).getAttemptIdentifier().getInputIdentifier().getInputIndex();
 ...
     // must spill to disk, but can't retain in-mem for intermediate merge
     final Path outputPath =
         mapOutputFile.getInputFileForWrite(srcTaskId,
             inMemToDiskBytes).suffix(Constants.MERGED_OUTPUT_PREFIX);
 ...
 {code}
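 To make the failure mode concrete, here is a minimal standalone sketch (hypothetical helper names, not Tez code) of why deriving the merge output path from srcTaskId alone collides under pipelined shuffle, while deriving it from the (srcTaskId, spillId) pair would not:
 {code}
 // Standalone illustration only: with pipelined shuffle a single source task
 // can deliver several spills, so a path built from srcTaskId alone repeats,
 // while a path built from (srcTaskId, spillId) stays unique.
 import java.util.LinkedHashSet;
 import java.util.Set;
 
 public class SpillPathSketch {
 
   // Mirrors the shape of the paths seen in the log: ..._spill_<srcTaskId>.out
   static String pathFromSrcTaskIdOnly(int srcTaskId) {
     return "attempt_x_spill_" + srcTaskId + ".out";
   }
 
   // Hypothetical alternative: also fold in the pipelined spill index.
   static String pathFromSrcTaskIdAndSpill(int srcTaskId, int spillId) {
     return "attempt_x_spill_" + srcTaskId + "_" + spillId + ".out";
   }
 
   public static void main(String[] args) {
     int srcTaskId = 404;            // same source task...
     int[] pipelinedSpills = {0, 1}; // ...delivering two pipelined chunks
 
     Set<String> bySrcOnly = new LinkedHashSet<>();
     Set<String> bySrcAndSpill = new LinkedHashSet<>();
     for (int spill : pipelinedSpills) {
       bySrcOnly.add(pathFromSrcTaskIdOnly(srcTaskId));
       bySrcAndSpill.add(pathFromSrcTaskIdAndSpill(srcTaskId, spill));
     }
     // Only one distinct path when srcTaskId alone is used, so the second
     // merge reuses (and can clobber) the first merge's file.
     System.out.println("distinct paths by srcTaskId only:      " + bySrcOnly.size());
     System.out.println("distinct paths by srcTaskId + spillId: " + bySrcAndSpill.size());
   }
 }
 {code}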
 This reuse of srcTaskId, or a related scenario, results in the following FileChunks 
 list, which contains identically named paths with different lengths.
 {code}
 2015-04-23 03:28:50,983 INFO [MemtoDiskMerger [Map_1]] 
 orderedgrouped.MergeManager: Initiating in-memory merge with 6 segments...
 2015-04-23 03:28:50,987 INFO [MemtoDiskMerger [Map_1]] impl.TezMerger: 
 Merging 6 sorted segments
 2015-04-23 03:28:50,988 INFO [MemtoDiskMerger [Map_1]] impl.TezMerger: Down 
 to the last merge-pass, with 6 segments left of total size: 1165944755 bytes
 2015-04-23 03:28:58,495 INFO [MemtoDiskMerger [Map_1]] 
 orderedgrouped.MergeManager: attempt_1429683757595_0141_1_01_000143_0_10027 
 Merge of the 6 files in-memory complete. Local file is 
 /grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out.merged
  of size 785583965
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: finalMerge called with 0 in-memory map-outputs 
 and 5 on-disk map-outputs
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 365232290 += 
 365232290for/grid/4/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_1023.out
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 730529899 += 
 365297609for/grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 1095828683 += 
 365298784for/grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out
 {code}
 The multiple instances of 404.out indicate that we pulled two pipelined 
 chunks of the same shuffle src id, once into memory and twice onto disk.
 {code}
 2015-04-23 03:28:08,256 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent: 
 attempt_1429683757595_0141_1_00_000404_0_10009_0, runDuration: 0]
 2015-04-23 03:28:08,270 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent: 
 attempt_1429683757595_0141_1_00_000404_0_10009_1, runDuration: 0]
 {code}

[jira] [Commented] (TEZ-2358) Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task

2015-04-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513830#comment-14513830
 ] 

TezQA commented on TEZ-2358:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728354/TEZ-2358.4.patch
  against master revision 2935ef4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/541//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/541//console

This message is automatically generated.

[jira] [Commented] (TEZ-2358) Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task

2015-04-27 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514797#comment-14514797
 ] 

Gopal V commented on TEZ-2358:
--

[~hitesh]: marking this as a blocker for 0.7.x, because it causes task failures 
for long-running jobs.

[~rajesh.balamohan]: Patch LGTM - +1

[jira] [Commented] (TEZ-2358) Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task

2015-04-27 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513650#comment-14513650
 ] 

Gopal V commented on TEZ-2358:
--

The patch was again tested at 10 TB scale over the weekend - there appear to be 
no collisions in naming.

I looked at the logs and noticed that some earlier tasks did succeed despite the 
duplicate naming, because there were only a few spills, which ended up split 
across the disks and did not collide in paths.

But for the sake of preventing future breakage, it would help to trigger an 
error when someone violates the no-duplicate rule for onDiskMapOutputs 
(i.e. no two file chunks submitted for merging can start at the same 
offset of the same file). A rough sketch of such a guard follows below.
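
Here is a rough sketch of what such a guard could look like (hypothetical stand-in types, not the actual patch): reject any on-disk map output whose (path, startOffset) pair has already been registered for merging.
{code}
// Illustrative only; the Chunk type is a stand-in for the relevant FileChunk fields.
import java.util.HashSet;
import java.util.Set;

public class OnDiskChunkGuard {

  static final class Chunk {
    final String path;
    final long startOffset;
    Chunk(String path, long startOffset) {
      this.path = path;
      this.startOffset = startOffset;
    }
  }

  private final Set<String> seen = new HashSet<>();

  // Throws if two chunks queued for merging start at the same offset of the
  // same file, which is the no-duplicate rule described above.
  void register(Chunk chunk) {
    String key = chunk.path + "@" + chunk.startOffset;
    if (!seen.add(key)) {
      throw new IllegalArgumentException(
          "Duplicate on-disk map output for merge: " + key);
    }
  }

  public static void main(String[] args) {
    OnDiskChunkGuard guard = new OnDiskChunkGuard();
    guard.register(new Chunk("spill_404.out", 0L));
    guard.register(new Chunk("spill_404.out", 365297609L)); // fine: same file, different offset
    try {
      guard.register(new Chunk("spill_404.out", 0L));       // same file AND same offset
    } catch (IllegalArgumentException expected) {
      System.out.println("rejected: " + expected.getMessage());
    }
  }
}
{code}
Keying the check on the (path, offset) pair rather than the path alone keeps the DISK_DIRECT case below legal, since two reducers then index different offsets of the same file.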

My original pre-conditions were wrong for the case where auto-reducer parallelism 
kicks in: there we want to merge off a DISK_DIRECT input across two reducers, 
which means different index points into the same DISK_DIRECT file.

[jira] [Commented] (TEZ-2358) Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task

2015-04-27 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513742#comment-14513742
 ] 

Gopal V commented on TEZ-2358:
--

[~rajesh.balamohan]: minor nit on the checkArgument (not ... or not ...) pattern - it 
gets complex to add another condition later.
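
Purely for illustration (the flag names and the condition are hypothetical, not the patch's actual code), the style difference the nit is about:
{code}
import com.google.common.base.Preconditions;

public class CheckArgumentStyle {

  // "(not ... or not ...)" in a single checkArgument: adding a third condition
  // means reworking the whole boolean expression and its message.
  static void validateCombined(boolean localFetch, boolean sharedFetch) {
    Preconditions.checkArgument(!(localFetch || sharedFetch),
        "local/shared fetch not expected here");
  }

  // One check per condition: a new condition is just one more line.
  static void validateSeparately(boolean localFetch, boolean sharedFetch) {
    Preconditions.checkArgument(!localFetch, "local fetch not expected here");
    Preconditions.checkArgument(!sharedFetch, "shared fetch not expected here");
  }

  public static void main(String[] args) {
    validateCombined(false, false);
    validateSeparately(false, false);
  }
}
{code}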

[jira] [Commented] (TEZ-2358) Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task

2015-04-24 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510894#comment-14510894
 ] 

Gopal V commented on TEZ-2358:
--

[~rajesh.balamohan]: The patch is looking good in tests.

The ETL succeeded (Time taken: 14184.052 seconds) with this fix and there were 
no task failures due to full disks - also tested with local.fetch/shared.fetch 
turned on.

FYI, I added a rough precondition that throws an exception if a duplicate name is 
added to FileChunks, which pretty much gave me confidence that this isn't 
hit anymore.

[jira] [Commented] (TEZ-2358) Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task

2015-04-24 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511227#comment-14511227
 ] 

TezQA commented on TEZ-2358:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12727924/TEZ-2358.2.patch
  against master revision 778c1f5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/529//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/529//console

This message is automatically generated.
