[jira] [Updated] (TEZ-2358) Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task

2015-04-27 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2358:
--
Attachment: TEZ-2358.3.patch

Added a preconditions check in MergeManager.closeOnDiskFile(). Since we need to 
consider only filepath and offset, we need to iterate through all items in 
onDiskMapOutputs (as fileChunk includes filepath, offset, length). This is still 
fine, as it won't be expensive and it makes debugging easier.
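
The check described above could look roughly like the sketch below. This is only an illustration, not the code in TEZ-2358.3.patch: `Chunk` is a hypothetical stand-in for Tez's `FileChunk`, `OnDiskRegistry` is a made-up simplified holder for the list, and a plain `IllegalStateException` is used where the patch may use a preconditions helper. It scans the list linearly and rejects an entry whose path and offset both match an existing one, ignoring length.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a file chunk: a path plus offset and length. */
final class Chunk {
    final String path;
    final long offset;
    final long length;

    Chunk(String path, long offset, long length) {
        this.path = path;
        this.offset = offset;
        this.length = length;
    }
}

/** Simplified registry of on-disk map outputs (an assumption, not Tez's MergeManager). */
final class OnDiskRegistry {
    private final List<Chunk> onDiskMapOutputs = new ArrayList<>();

    /** Registers a chunk, failing fast if the same path and offset were already registered. */
    synchronized void closeOnDiskFile(Chunk file) {
        // Length is deliberately ignored, so a lookup on the whole chunk cannot be used;
        // a linear scan compares only path and offset.
        for (Chunk existing : onDiskMapOutputs) {
            if (existing.path.equals(file.path) && existing.offset == file.offset) {
                throw new IllegalStateException(
                    "Duplicate on-disk output: " + file.path + " at offset " + file.offset);
            }
        }
        onDiskMapOutputs.add(file);
    }

    synchronized int size() {
        return onDiskMapOutputs.size();
    }
}
```

The scan is linear in the number of on-disk outputs, which matches the comment's point that the cost is small compared with the value of failing early with a clear message.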

[~gopalv] - Please have a look at the latest patch when you find time.

 Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task
 -

 Key: TEZ-2358
 URL: https://issues.apache.org/jira/browse/TEZ-2358
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Gopal V
Assignee: Rajesh Balamohan
 Attachments: TEZ-2358.1.patch, TEZ-2358.2.patch, TEZ-2358.3.patch, 
 syslog_attempt_1429683757595_0141_1_01_000143_0.syslog.bz2


 The Tez MergeManager code assumes that the src-task-id is unique between 
 merge operations; this causes confusion when two merge sequences 
 have to process output from the same src-task-id.
 {code}
 private TezRawKeyValueIterator finalMerge(Configuration job, FileSystem fs,
     List<MapOutput> inMemoryMapOutputs,
     List<FileChunk> onDiskMapOutputs
 ...
   if (inMemoryMapOutputs.size() > 0) {
     int srcTaskId =
         inMemoryMapOutputs.get(0).getAttemptIdentifier().getInputIdentifier().getInputIndex();
 ...
     // must spill to disk, but can't retain in-mem for intermediate merge
     final Path outputPath =
         mapOutputFile.getInputFileForWrite(srcTaskId,
             inMemToDiskBytes).suffix(
                 Constants.MERGED_OUTPUT_PREFIX);
 ...
 {code}
 This, or some scenario related to it, results in the following FileChunks 
 list, which contains identically named paths with different lengths.
 {code}
 2015-04-23 03:28:50,983 INFO [MemtoDiskMerger [Map_1]] 
 orderedgrouped.MergeManager: Initiating in-memory merge with 6 segments...
 2015-04-23 03:28:50,987 INFO [MemtoDiskMerger [Map_1]] impl.TezMerger: 
 Merging 6 sorted segments
 2015-04-23 03:28:50,988 INFO [MemtoDiskMerger [Map_1]] impl.TezMerger: Down 
 to the last merge-pass, with 6 segments left of total size: 1165944755 bytes
 2015-04-23 03:28:58,495 INFO [MemtoDiskMerger [Map_1]] 
 orderedgrouped.MergeManager: attempt_1429683757595_0141_1_01_000143_0_10027 
 Merge of the 6 files in-memory complete. Local file is 
 /grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out.merged
  of size 785583965
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: finalMerge called with 0 in-memory map-outputs 
 and 5 on-disk map-outputs
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 365232290 += 
 365232290for/grid/4/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_1023.out
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 730529899 += 
 365297609for/grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 1095828683 += 
 365298784for/grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out
 {code}
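
As a rough illustration of the collision suspected here: when a file name is derived only from the source task index, two merge runs that start from the same index produce the same name. The helpers below (`nameBySrcTask`, `nameWithMergeSeq`), the name format and the merge sequence counter are all hypothetical; this is not Tez's path layout and not necessarily the fix taken in the attached patches.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative file naming; names and format are assumptions, not Tez's actual layout. */
final class SpillNames {
    private static final AtomicInteger MERGE_SEQ = new AtomicInteger();

    /** Name derived only from the source task index: repeats across merges. */
    static String nameBySrcTask(String attemptPrefix, int srcTaskId) {
        return attemptPrefix + "_spill_" + srcTaskId + ".out";
    }

    /** Name that also carries a per-process merge sequence number, so it does not repeat. */
    static String nameWithMergeSeq(String attemptPrefix, int srcTaskId) {
        return attemptPrefix + "_spill_" + srcTaskId
            + "_m" + MERGE_SEQ.getAndIncrement() + ".out";
    }
}
```

Calling `nameBySrcTask` twice with the same arguments returns equal strings, which is the situation seen with `spill_404.out` in the log above; `nameWithMergeSeq` returns a different name on every call.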
 The multiple instances of 404.out indicate that we pulled two pipelined 
 chunks of the same shuffle src id, once into memory and twice onto disk.
 {code}
 2015-04-23 03:28:08,256 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent: 
 attempt_1429683757595_0141_1_00_000404_0_10009_0, runDuration: 0]
 2015-04-23 03:28:08,270 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent: 
 attempt_1429683757595_0141_1_00_000404_0_10009_1, runDuration: 0]
 2015-04-23 03:28:08,272 INFO 
 

Success: TEZ-2358 PreCommit Build #540

2015-04-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2358
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/540/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2773 lines...]
[INFO] Final Memory: 73M/933M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728348/TEZ-2358.3.patch
  against master revision 2935ef4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/540//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/540//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
16b4bbd9396f2e0b5bd1dd21e0a5589578247c5b logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #537
Archived 44 artifacts
Archive block size is 32768
Received 6 blocks and 2576597 bytes
Compression is 7.1%
Took 1.4 sec
Description set: TEZ-2358
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Created] (TEZ-2370) Add stages information to RM UI for debugging / visibility on job progress

2015-04-27 Thread Hari Sekhon (JIRA)
Hari Sekhon created TEZ-2370:


 Summary: Add stages information to RM UI for debugging / 
visibility on job progress
 Key: TEZ-2370
 URL: https://issues.apache.org/jira/browse/TEZ-2370
 Project: Apache Tez
  Issue Type: Improvement
  Components: UI
Affects Versions: 0.5.2
 Environment: HDP 2.2.0
Reporter: Hari Sekhon
Priority: Minor


Something that has been bugging me since last year is the difficulty of 
debugging Tez jobs compared to MapReduce jobs.

This is because the Resource Manager / Application Master does not display the job 
stats and stages that we are used to seeing in MapReduce, e.g. Map and Reduce 
task counts and progress. I appreciate that Tez is a more flexible framework 
with a DAG, but it would be nice if it could surface information on the 
different stages and the number of tasks running, completed, failed, killed, 
successful etc., similar to how Spark does. The stage breakdown would be 
useful in understanding what the job is doing at different times, which stage is 
getting stuck or failing, etc.

At the moment the only options are to trawl the logs or hope to have 
console output where some of that information is available, both of which are 
non-ideal when debugging other people's jobs after the fact.

Hari Sekhon
http://www.linkedin.com/in/harisekhon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2358) Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task

2015-04-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513802#comment-14513802
 ] 

TezQA commented on TEZ-2358:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728348/TEZ-2358.3.patch
  against master revision 2935ef4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/540//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/540//console

This message is automatically generated.

 Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task
 -

 Key: TEZ-2358
 URL: https://issues.apache.org/jira/browse/TEZ-2358
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Gopal V
Assignee: Rajesh Balamohan
 Attachments: TEZ-2358.1.patch, TEZ-2358.2.patch, TEZ-2358.3.patch, 
 TEZ-2358.4.patch, syslog_attempt_1429683757595_0141_1_01_000143_0.syslog.bz2



Success: TEZ-2358 PreCommit Build #541

2015-04-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2358
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/541/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2772 lines...]
[INFO] Final Memory: 76M/1274M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728354/TEZ-2358.4.patch
  against master revision 2935ef4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/541//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/541//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
6c74121472d38c3e18d73d3532e9348a11a7079a logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #540
Archived 44 artifacts
Archive block size is 32768
Received 6 blocks and 2550818 bytes
Compression is 7.2%
Took 1.3 sec
Description set: TEZ-2358
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2358) Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task

2015-04-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513830#comment-14513830
 ] 

TezQA commented on TEZ-2358:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728354/TEZ-2358.4.patch
  against master revision 2935ef4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/541//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/541//console

This message is automatically generated.

 Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task
 -

 Key: TEZ-2358
 URL: https://issues.apache.org/jira/browse/TEZ-2358
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Gopal V
Assignee: Rajesh Balamohan
 Attachments: TEZ-2358.1.patch, TEZ-2358.2.patch, TEZ-2358.3.patch, 
 TEZ-2358.4.patch, syslog_attempt_1429683757595_0141_1_01_000143_0.syslog.bz2



[jira] [Commented] (TEZ-2303) ConcurrentModificationException while processing recovery

2015-04-27 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513894#comment-14513894
 ] 

Jeff Zhang commented on TEZ-2303:
-

[~hitesh] I didn't find a way to stop accepting connections from the client after the DAG 
is recovered. I uploaded another patch that takes a different approach.
* Register with the RM after recovery is done, so that the client gets the host/port only 
after the recovery is completed.
* There may still be one potential issue: if recovery fails, it would 
unregister from the RM without having registered first. I am not sure whether this would cause a 
YarnException. 
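
One way to sidestep the second point is to remember whether registration actually happened and skip unregistration otherwise. The sketch below is only an illustration of that ordering: `RmClient` and `DeferredRegistration` are hypothetical names, not the YARN client API and not the code in the uploaded patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical minimal view of a resource-manager client; not the YARN API. */
interface RmClient {
    void register();
    void unregister();
}

/** Registers only after recovery succeeds and skips unregister if registration never happened. */
final class DeferredRegistration {
    private final RmClient rm;
    private final AtomicBoolean registered = new AtomicBoolean(false);

    DeferredRegistration(RmClient rm) {
        this.rm = rm;
    }

    /** Runs recovery first; the RM (and therefore clients) learn the host/port only afterwards. */
    void start(Runnable recovery) {
        recovery.run();      // may throw; in that case register() is never reached
        rm.register();
        registered.set(true);
    }

    /** Safe to call on any shutdown path, including after a failed recovery. */
    void stop() {
        if (registered.compareAndSet(true, false)) {
            rm.unregister();
        }
    }
}
```

With this guard, a failed recovery followed by `stop()` makes no RM call at all, so whether an unregister-before-register raises an exception no longer matters.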

 ConcurrentModificationException while processing recovery
 -

 Key: TEZ-2303
 URL: https://issues.apache.org/jira/browse/TEZ-2303
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jason Lowe
Assignee: Jeff Zhang
 Attachments: TEZ-2303-1.patch, TEZ-2303-2.patch, TEZ-2303-4.patch


 Saw a Tez AM log a few ConcurrentModificationException messages while trying 
 to recover from a previous attempt that crashed.  Exception details to follow.





[jira] [Commented] (TEZ-2372) TestAMRecovery failing in latest build

2015-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514815#comment-14514815
 ] 

Hitesh Shah commented on TEZ-2372:
--

\cc [~zjffdu]

 TestAMRecovery failing in latest build 
 ---

 Key: TEZ-2372
 URL: https://issues.apache.org/jira/browse/TEZ-2372
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah

 https://builds.apache.org/job/Tez-Build/1018/console





[jira] [Created] (TEZ-2372) TestAMRecovery failing in latest build

2015-04-27 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-2372:


 Summary: TestAMRecovery failing in latest build 
 Key: TEZ-2372
 URL: https://issues.apache.org/jira/browse/TEZ-2372
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah


https://builds.apache.org/job/Tez-Build/1018/console





Failed: TEZ-2363 PreCommit Build #550

2015-04-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2363
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/550/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2770 lines...]




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12727922/TEZ-2363.1.patch
  against master revision 21d4e2d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 161 javac 
compiler warnings (more than the master's current 160 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/550//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/550//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/550//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
4e398a5babc65b05d4c5d541ee8a9d840188d4b6 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #549
Archived 45 artifacts
Archive block size is 32768
Received 6 blocks and 2561573 bytes
Compression is 7.1%
Took 0.57 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

Failed: TEZ-993 PreCommit Build #552

2015-04-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-993
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/552/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 20 lines...]
[PreCommit-TEZ-Build] $ /bin/bash /tmp/hudson5966714899514388262.sh
Running in Jenkins mode


==
==
Testing patch for TEZ-993.
==
==


HEAD is now at 21d4e2d TEZ-2342. TestFaultTolerance.testRandomFailingTasks 
fails due to timeout. (Jeff Zhang via hitesh)
error: pathspec 'master' did not match any file(s) known to git.
From https://git-wip-us.apache.org/repos/asf/tez
 * branch            HEAD       -> FETCH_HEAD
Current branch HEAD is up to date.
TEZ-993 patch is being downloaded at Mon Apr 27 19:47:57 UTC 2015 from
http://issues.apache.org/jira/secure/attachment/12695488/TEZ-993-5.patch
The patch does not appear to apply with p0 to p2
PATCH APPLICATION FAILED




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12695488/TEZ-993-5.patch
  against master revision 21d4e2d.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/552//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
7f80fb7451e0d38bad82b83e672093ce2b7d989d logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Commented] (TEZ-993) Remove application logic from RecoveryService

2015-04-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514828#comment-14514828
 ] 

TezQA commented on TEZ-993:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12695488/TEZ-993-5.patch
  against master revision 21d4e2d.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/552//console

This message is automatically generated.

 Remove application logic from RecoveryService
 -

 Key: TEZ-993
 URL: https://issues.apache.org/jira/browse/TEZ-993
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jeff Zhang
 Attachments: TEZ-993-3.patch, TEZ-993-4.patch, TEZ-993-5.patch, 
 Tez-993-2.patch, Tez-993.patch


 Currently, the RecoveryService storage logic knows a lot about the DAG, such as which 
 DAG is a pre-warm DAG and does not need to be stored, which events need special 
 treatment, etc. This kind of logic couples the DAG and the storage more than 
 is probably necessary and can be a source of complications down the road. The 
 storage should ideally just store a sequence of arbitrary records 
 delimited by a marker.
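
A storage layer of that kind might look like the sketch below, which frames opaque byte records without interpreting them. Everything here is an assumption made for illustration: the `RecordLog` class, the marker value, and the choice to follow each marker with a length prefix (so payloads may contain any bytes) are not taken from the RecoveryService code or from the TEZ-993 patches.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Opaque record log: the store frames bytes and knows nothing about their meaning. */
final class RecordLog {
    /** Arbitrary marker value chosen for this sketch. */
    static final int MARKER = 0x7E2A11ED;

    /** Appends one record as MARKER, payload length, payload bytes. */
    static void append(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(MARKER);
        out.writeInt(payload.length);
        out.write(payload);
    }

    /** Reads records until the stream ends; fails if a marker is wrong or a payload is cut short. */
    static List<byte[]> readAll(DataInputStream in) throws IOException {
        List<byte[]> records = new ArrayList<>();
        while (true) {
            int marker;
            try {
                marker = in.readInt();
            } catch (EOFException endOfLog) {
                return records; // stream ended before another marker: stop reading
            }
            if (marker != MARKER) {
                throw new IOException("Corrupt log: expected record marker");
            }
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            records.add(payload);
        }
    }
}
```

Decisions such as skipping pre-warm DAGs or treating certain events specially would then live with the caller that produces the payloads, not in the storage code.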





[jira] [Commented] (TEZ-1019) Re-factor routing of events to use common code path for normal and recovery flow.

2015-04-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514826#comment-14514826
 ] 

TezQA commented on TEZ-1019:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12697692/TEZ-1019-5.patch
  against master revision 21d4e2d.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/551//console

This message is automatically generated.

 Re-factor routing of events to use common code path for normal and recovery 
 flow.
 -

 Key: TEZ-1019
 URL: https://issues.apache.org/jira/browse/TEZ-1019
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Jeff Zhang
 Attachments: TEZ-1019-2.patch, TEZ-1019-3.patch, TEZ-1019-4.patch, 
 TEZ-1019-5.patch, Tez-1019.patch








Failed: TEZ-1019 PreCommit Build #551

2015-04-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-1019
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/551/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 20 lines...]
[PreCommit-TEZ-Build] $ /bin/bash /tmp/hudson7789757465151017088.sh
Running in Jenkins mode


==
==
Testing patch for TEZ-1019.
==
==


HEAD is now at 21d4e2d TEZ-2342. TestFaultTolerance.testRandomFailingTasks 
fails due to timeout. (Jeff Zhang via hitesh)
error: pathspec 'master' did not match any file(s) known to git.
From https://git-wip-us.apache.org/repos/asf/tez
 * branch            HEAD       -> FETCH_HEAD
Current branch HEAD is up to date.
TEZ-1019 patch is being downloaded at Mon Apr 27 19:47:50 UTC 2015 from
http://issues.apache.org/jira/secure/attachment/12697692/TEZ-1019-5.patch
The patch does not appear to apply with p0 to p2
PATCH APPLICATION FAILED




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12697692/TEZ-1019-5.patch
  against master revision 21d4e2d.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/551//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
dcd65c91387ecf0b5d9971fa42f19824ecd6d36b logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Commented] (TEZ-2358) Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task

2015-04-27 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514797#comment-14514797
 ] 

Gopal V commented on TEZ-2358:
--

[~hitesh]: marking this as a blocker for 0.7.x, because it causes task failures 
for long-running jobs.

[~rajesh.balamohan]: Patch LGTM - +1

 Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task
 -

 Key: TEZ-2358
 URL: https://issues.apache.org/jira/browse/TEZ-2358
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Gopal V
Assignee: Rajesh Balamohan
Priority: Blocker
 Attachments: TEZ-2358.1.patch, TEZ-2358.2.patch, TEZ-2358.3.patch, 
 TEZ-2358.4.patch, syslog_attempt_1429683757595_0141_1_01_000143_0.syslog.bz2


 The Tez MergeManager code assumes that the src-task-id is unique between 
 merge operations; this causes confusion when two merge sequences 
 have to process output from the same src-task-id.
 {code}
 private TezRawKeyValueIterator finalMerge(Configuration job, FileSystem fs,
     List<MapOutput> inMemoryMapOutputs,
     List<FileChunk> onDiskMapOutputs
 ...
   if (inMemoryMapOutputs.size() > 0) {
     int srcTaskId =
         inMemoryMapOutputs.get(0).getAttemptIdentifier().getInputIdentifier().getInputIndex();
 ...
     // must spill to disk, but can't retain in-mem for intermediate merge
     final Path outputPath =
         mapOutputFile.getInputFileForWrite(srcTaskId,
             inMemToDiskBytes).suffix(
                 Constants.MERGED_OUTPUT_PREFIX);
 ...
 {code}
 This, or some scenario related to it, results in the following FileChunks 
 list, which contains identically named paths with different lengths.
 {code}
 2015-04-23 03:28:50,983 INFO [MemtoDiskMerger [Map_1]] 
 orderedgrouped.MergeManager: Initiating in-memory merge with 6 segments...
 2015-04-23 03:28:50,987 INFO [MemtoDiskMerger [Map_1]] impl.TezMerger: 
 Merging 6 sorted segments
 2015-04-23 03:28:50,988 INFO [MemtoDiskMerger [Map_1]] impl.TezMerger: Down 
 to the last merge-pass, with 6 segments left of total size: 1165944755 bytes
 2015-04-23 03:28:58,495 INFO [MemtoDiskMerger [Map_1]] 
 orderedgrouped.MergeManager: attempt_1429683757595_0141_1_01_000143_0_10027 
 Merge of the 6 files in-memory complete. Local file is 
 /grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out.merged
  of size 785583965
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: finalMerge called with 0 in-memory map-outputs 
 and 5 on-disk map-outputs
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 365232290 += 
 365232290for/grid/4/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_1023.out
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 730529899 += 
 365297609for/grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 1095828683 += 
 365298784for/grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out
 {code}
 The multiple instances of 404.out indicate that we pulled two pipelined 
 chunks of the same shuffle src id, once into memory and twice onto disk.
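To make the collision concrete, here is a minimal standalone Java sketch. It is not the MergeManager code. It assumes, based only on the file names in the log above, that a spill file name is built from a task-attempt prefix plus the source task index. The class `SpillNameCollision` and the methods `spillName`, `chunkAwareName`, and `distinct` are hypothetical names used for illustration, and the chunk-aware naming is just one possible way to disambiguate, not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: these names are hypothetical and are not part of Tez.
final class SpillNameCollision {

    // Name derived from the source task index alone (the assumption under discussion).
    static String spillName(String attemptPrefix, int srcTaskId) {
        return attemptPrefix + "_spill_" + srcTaskId + ".out";
    }

    // One possible disambiguation: also include the pipelined chunk id.
    static String chunkAwareName(String attemptPrefix, int srcTaskId, int chunkId) {
        return attemptPrefix + "_spill_" + srcTaskId + "_" + chunkId + ".out";
    }

    // Counts how many distinct names the list contains.
    static int distinct(List<String> names) {
        Set<String> unique = new HashSet<>(names);
        return unique.size();
    }

    public static void main(String[] args) {
        String prefix = "attempt_X"; // placeholder prefix, not a real attempt id
        List<String> bySrcOnly = new ArrayList<>();
        List<String> byChunk = new ArrayList<>();
        // Three pipelined chunks (0, 1, 2) from the same source task index 404.
        for (int chunk = 0; chunk < 3; chunk++) {
            bySrcOnly.add(spillName(prefix, 404));
            byChunk.add(chunkAwareName(prefix, 404, chunk));
        }
        System.out.println("distinct names by src id only: " + distinct(bySrcOnly));
        System.out.println("distinct names by src id and chunk: " + distinct(byChunk));
    }
}
```

With names keyed on the source index alone, all three chunks map to one path, which matches the repeated `spill_404.out` entries in the log.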
 {code}
 2015-04-23 03:28:08,256 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent: 
 attempt_1429683757595_0141_1_00_000404_0_10009_0, runDuration: 0]
 2015-04-23 03:28:08,270 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent: 
 attempt_1429683757595_0141_1_00_000404_0_10009_1, runDuration: 0]
 2015-04-23 03:28:08,272 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 

Failed: TEZ-2259 PreCommit Build #547

2015-04-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2259
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/547/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2770 lines...]

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728478/TEZ-2259.4.patch
  against master revision 21d4e2d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in :
 org.apache.tez.test.Tests
org.apache.tez.test.TestTests
org.apache.tez.teTests
org.apache.tez.test.TestDAGRecovery

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/547//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/547//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
11903cfaab5ad5323a5b4f7d0c03e59963a8e89e logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #543
Archived 44 artifacts
Archive block size is 32768
Received 2 blocks and 2705121 bytes
Compression is 2.4%
Took 2.2 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

Success: TEZ-2325 PreCommit Build #549

2015-04-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2325
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/549/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2770 lines...]
[INFO] Final Memory: 77M/1181M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728394/TEZ-2325.4.patch
  against master revision 21d4e2d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/549//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/549//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
9bb668dafec1a33e18412c90b0a387f135d4eec6 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #548
Archived 44 artifacts
Archive block size is 32768
Received 0 blocks and 2752759 bytes
Compression is 0.0%
Took 0.59 sec
Description set: TEZ-2325
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2325) Route status update event directly to the attempt

2015-04-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514914#comment-14514914
 ] 

TezQA commented on TEZ-2325:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728394/TEZ-2325.4.patch
  against master revision 21d4e2d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/549//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/549//console

This message is automatically generated.

 Route status update event directly to the attempt 
 --

 Key: TEZ-2325
 URL: https://issues.apache.org/jira/browse/TEZ-2325
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Prakash Ramachandran
 Attachments: TEZ-2325.1.patch, TEZ-2325.2.patch, TEZ-2325.3.patch, 
 TEZ-2325.4.patch


 Today, all events from the attempt heartbeat are routed to the vertex. The 
 vertex then routes any status update events to the attempt. This is 
 unnecessary and potentially creates out-of-order scenarios. We could route 
 the status update events directly to attempts.
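The proposal can be pictured with a small Java sketch. This is only an illustration under assumed names: `HeartbeatRoutingSketch`, `Kind`, `Target`, `route`, and `routeAll` are hypothetical and do not correspond to Tez classes. It shows a dispatcher that sends status update events straight to the attempt and everything else to the vertex, so that status updates no longer take the extra hop through the vertex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical types, not the Tez event classes.
final class HeartbeatRoutingSketch {

    enum Kind { STATUS_UPDATE, OTHER }

    enum Target { ATTEMPT, VERTEX }

    // Direct routing: status updates go to the attempt, all other events to the vertex.
    static Target route(Kind kind) {
        return kind == Kind.STATUS_UPDATE ? Target.ATTEMPT : Target.VERTEX;
    }

    // Routes every event of one heartbeat, preserving the order in which they arrived.
    static List<Target> routeAll(List<Kind> heartbeatEvents) {
        List<Target> targets = new ArrayList<>();
        for (Kind kind : heartbeatEvents) {
            targets.add(route(kind));
        }
        return targets;
    }

    public static void main(String[] args) {
        List<Kind> heartbeat = Arrays.asList(Kind.OTHER, Kind.STATUS_UPDATE, Kind.OTHER);
        System.out.println(routeAll(heartbeat));
    }
}
```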



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2369) Add a few unit tests for RootInputInitializerManager

2015-04-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514926#comment-14514926
 ] 

TezQA commented on TEZ-2369:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728484/TEZ-2369.2.txt
  against master revision 21d4e2d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/553//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/553//console

This message is automatically generated.

 Add a few unit tests for RootInputInitializerManager
 

 Key: TEZ-2369
 URL: https://issues.apache.org/jira/browse/TEZ-2369
 Project: Apache Tez
  Issue Type: Bug
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: TEZ-2369.1.txt, TEZ-2369.2.txt


 {code}
 -  Integer successfulAttempt = vertexSuccessfulAttemptMap.get(taskId);
 +  Integer successfulAttempt = 
 vertexSuccessfulAttemptMap.get(taskId.getId());
 {code}
 This could cause events to be sent multiple times.
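The one-line change quoted above is easy to misread, so here is a small standalone Java sketch of the pitfall it appears to address. It assumes the map is keyed by the integer task index. `MapKeyTypePitfall` and `TaskId` are hypothetical stand-ins, not the real Tez types. Because `Map.get` accepts any `Object`, passing a key of the wrong type compiles and simply returns `null`, so a caller would conclude that no successful attempt has been recorded.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: hypothetical names, not the Tez classes.
final class MapKeyTypePitfall {

    // Hypothetical stand-in for a task id object wrapping an integer index.
    static final class TaskId {
        private final int id;

        TaskId(int id) {
            this.id = id;
        }

        int getId() {
            return id;
        }
    }

    // Looks up with the wrapper object: compiles, but never matches an Integer key.
    static Integer lookupByObject(Map<Integer, Integer> successfulAttempts, TaskId taskId) {
        return successfulAttempts.get(taskId);
    }

    // Looks up with the integer index, which matches the map's key type.
    static Integer lookupByIndex(Map<Integer, Integer> successfulAttempts, TaskId taskId) {
        return successfulAttempts.get(taskId.getId());
    }

    public static void main(String[] args) {
        Map<Integer, Integer> successfulAttempts = new HashMap<>();
        successfulAttempts.put(7, 0); // task index 7 succeeded on attempt 0
        TaskId taskId = new TaskId(7);
        System.out.println(lookupByObject(successfulAttempts, taskId)); // prints null
        System.out.println(lookupByIndex(successfulAttempts, taskId));  // prints 0
    }
}
```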



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2358) Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task

2015-04-27 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-2358:
-
Priority: Blocker  (was: Major)

 Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task
 -

 Key: TEZ-2358
 URL: https://issues.apache.org/jira/browse/TEZ-2358
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Gopal V
Assignee: Rajesh Balamohan
Priority: Blocker
 Attachments: TEZ-2358.1.patch, TEZ-2358.2.patch, TEZ-2358.3.patch, 
 TEZ-2358.4.patch, syslog_attempt_1429683757595_0141_1_01_000143_0.syslog.bz2


 The Tez MergeManager code assumes that the src-task-id is unique between 
 merge operations. This causes confusion when two merge sequences 
 have to process output from the same src-task-id.
 {code}
 private TezRawKeyValueIterator finalMerge(Configuration job, FileSystem fs,
     List<MapOutput> inMemoryMapOutputs,
     List<FileChunk> onDiskMapOutputs
 ...
   if (inMemoryMapOutputs.size() > 0) {
     int srcTaskId =
         inMemoryMapOutputs.get(0).getAttemptIdentifier().getInputIdentifier().getInputIndex();
 ...
     // must spill to disk, but can't retain in-mem for intermediate merge
     final Path outputPath =
         mapOutputFile.getInputFileForWrite(srcTaskId,
             inMemToDiskBytes).suffix(Constants.MERGED_OUTPUT_PREFIX);
 ...
 {code}
 This, or some related scenario, results in the following list of FileChunks, 
 which contains identically named paths with different lengths.
 {code}
 2015-04-23 03:28:50,983 INFO [MemtoDiskMerger [Map_1]] 
 orderedgrouped.MergeManager: Initiating in-memory merge with 6 segments...
 2015-04-23 03:28:50,987 INFO [MemtoDiskMerger [Map_1]] impl.TezMerger: 
 Merging 6 sorted segments
 2015-04-23 03:28:50,988 INFO [MemtoDiskMerger [Map_1]] impl.TezMerger: Down 
 to the last merge-pass, with 6 segments left of total size: 1165944755 bytes
 2015-04-23 03:28:58,495 INFO [MemtoDiskMerger [Map_1]] 
 orderedgrouped.MergeManager: attempt_1429683757595_0141_1_01_000143_0_10027 
 Merge of the 6 files in-memory complete. Local file is 
 /grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out.merged
  of size 785583965
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: finalMerge called with 0 in-memory map-outputs 
 and 5 on-disk map-outputs
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 365232290 += 
 365232290for/grid/4/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_1023.out
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 730529899 += 
 365297609for/grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 1095828683 += 
 365298784for/grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out
 {code}
 The multiple instances of 404.out indicate that we pulled two pipelined 
 chunks of the same shuffle src id, once into memory and twice onto disk.
 {code}
 2015-04-23 03:28:08,256 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent: 
 attempt_1429683757595_0141_1_00_000404_0_10009_0, runDuration: 0]
 2015-04-23 03:28:08,270 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent: 
 attempt_1429683757595_0141_1_00_000404_0_10009_1, runDuration: 0]
 2015-04-23 03:28:08,272 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent: 
 attempt_1429683757595_0141_1_00_000404_0_10009_2, runDuration: 0]
 {code}
 This will fail depending on 

[jira] [Commented] (TEZ-2259) Push additional data to Timeline for Recovery for better consumption in UI

2015-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514964#comment-14514964
 ] 

Hitesh Shah commented on TEZ-2259:
--

Not sure what is off with the build. Console logs show a successful run:

{code}
[INFO] tez ... SUCCESS [  1.048 s]
[INFO] tez-api ... SUCCESS [ 39.822 s]
[INFO] tez-common  SUCCESS [  3.114 s]
[INFO] tez-runtime-internals . SUCCESS [ 10.713 s]
[INFO] tez-runtime-library ... SUCCESS [01:13 min]
[INFO] tez-mapreduce . SUCCESS [ 25.778 s]
[INFO] tez-examples .. SUCCESS [  0.330 s]
[INFO] tez-dag ... SUCCESS [02:04 min]
[INFO] tez-tests . SUCCESS [24:57 min]
[INFO] tez-ui  SUCCESS [ 14.317 s]
[INFO] tez-plugins ... SUCCESS [  0.033 s]
[INFO] tez-yarn-timeline-history . SUCCESS [ 54.781 s]
[INFO] tez-yarn-timeline-history-with-acls ... SUCCESS [01:03 min]
[INFO] tez-mbeans-resource-calculator  SUCCESS [  0.970 s]
[INFO] tez-dist .. SUCCESS [ 12.728 s]
[INFO] Tez ... SUCCESS [  0.032 s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 32:04 min
[INFO] Finished at: 2015-04-27T20:13:33+00:00
[INFO] Final Memory: 68M/814M
[INFO] 
{code}

{code}
[INFO] --- maven-surefire-plugin:2.14.1:test (default-test) @ tez-tests ---
[INFO] Surefire report directory: 
/home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build@2/tez-tests/target/surefire-reports

---
 T E S T S
---

---
 T E S TTests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
209.311 sec
Running org.apache.tez.test.Tests run: 2, FaTests run: 2, Failures: 0, Errors: 
0, Skipped: 0, Time elapsed: 91.521 sec
Running org.apache.tez.test.TestTests run: 22, FailurTests run: 2, Failures: 0, 
Errors: 0, Skipped: 0, Time elapsed: 143.722 sec
Running org.apache.tez.teTests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time 
elapsed: 79.795 sec
Running org.apache.tez.test.TestAMRecovery
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 267.144 sec
Running org.apache.tez.test.TestTezJobs
Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 211.42 sec
Running org.apache.tez.test.TestDAGRecovery
ests run: 77, Failures: 0, Errors: 0, Skipped: 0
{code}
   - some munging of the output seems to exist. 

Ran tests locally and confirmed no failures. 



 Push additional data to Timeline for Recovery for better consumption in UI
 --

 Key: TEZ-2259
 URL: https://issues.apache.org/jira/browse/TEZ-2259
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: TEZ-2259.1.patch, TEZ-2259.2.patch, TEZ-2259.3.patch, 
 TEZ-2259.4.patch


 Some things I can think of: 
  
- applicationAttemptId in which the dag was submitted
- appAttemptId in which the dag was completed 
 Above provides implicit information on how many app attempts the dag spanned 
 ( and therefore recovered how many times ).
   
- Maybe an implicit event mentioning that the DAG was recovered and in 
 which attempt it was recovered. Possibly add information on what state was 
 recovered?
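The "implicit information" mentioned in the list can be written out as simple arithmetic. The Java sketch below is hypothetical: `DagAttemptSpan`, `attemptsSpanned`, and `recoveries` are illustrative names, and it assumes that app attempt numbers are consecutive integers and that each later attempt recovers the DAG exactly once.

```java
// Illustrative only: hypothetical helper, not part of Tez or the Timeline API.
final class DagAttemptSpan {

    // Number of app attempts the DAG spanned, assuming consecutive attempt numbers.
    static int attemptsSpanned(int submittedInAttempt, int completedInAttempt) {
        if (completedInAttempt < submittedInAttempt) {
            throw new IllegalArgumentException("completion attempt precedes submission attempt");
        }
        return completedInAttempt - submittedInAttempt + 1;
    }

    // Number of recoveries, assuming one recovery per attempt after the first.
    static int recoveries(int submittedInAttempt, int completedInAttempt) {
        return attemptsSpanned(submittedInAttempt, completedInAttempt) - 1;
    }

    public static void main(String[] args) {
        // Submitted in app attempt 1 and completed in app attempt 3.
        System.out.println(attemptsSpanned(1, 3)); // prints 3
        System.out.println(recoveries(1, 3));      // prints 2
    }
}
```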



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2259) Push additional data to Timeline for Recovery for better consumption in UI

2015-04-27 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2259:
-
Attachment: TEZ-2259.branch-06.patch

Attaching a branch-0.6 patch, since the master patch conflicts. 

 Push additional data to Timeline for Recovery for better consumption in UI
 --

 Key: TEZ-2259
 URL: https://issues.apache.org/jira/browse/TEZ-2259
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: TEZ-2259.1.patch, TEZ-2259.2.patch, TEZ-2259.3.patch, 
 TEZ-2259.4.patch, TEZ-2259.branch-06.patch


 Some things I can think of: 
  
- applicationAttemptId in which the dag was submitted
- appAttemptId in which the dag was completed 
 Above provides implicit information on how many app attempts the dag spanned 
 ( and therefore recovered how many times ).
   
- Maybe an implicit event mentioning that the DAG was recovered and in 
 which attempt it was recovered. Possibly add information on what state was 
 recovered?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-04-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515040#comment-14515040
 ] 

Bikas Saha commented on TEZ-391:


[~zjffdu] Can you make a call on whether this is for 0.7.0 or not?
IMO, if this was close to being done then perhaps yes.

 SharedEdge - Support for passing same output from a vertex as input to two 
 different vertices
 -

 Key: TEZ-391
 URL: https://issues.apache.org/jira/browse/TEZ-391
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Rohini Palaniswamy
Assignee: Jeff Zhang
 Attachments: Shared Edge Design.pdf, TEZ-391-WIP-1.patch, 
 TEZ-391-WIP-2.patch, TEZ-391-WIP-3.patch, TEZ-391-WIP-4.patch, 
 TEZ-391-WIP-5.patch, TEZ-391-WIP-6.patch


   We need this for a lot of use cases, such as cases where multi-query is turned off 
 and optimizing unions. Currently those are BROADCAST or ONE-ONE edges, and 
 we write the output multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Success: TEZ-2369 PreCommit Build #553

2015-04-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2369
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/553/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2768 lines...]
[INFO] Final Memory: 69M/917M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728484/TEZ-2369.2.txt
  against master revision 21d4e2d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/553//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/553//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
9cc787fd89de51b13fc38b77319d6f7994b14111 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #549
Archived 44 artifacts
Archive block size is 32768
Received 6 blocks and 2586172 bytes
Compression is 7.1%
Took 0.6 sec
Description set: TEZ-2369
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Updated] (TEZ-2358) Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task

2015-04-27 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2358:
--
Attachment: TEZ-2358.4.patch

Sure, addressing review comments in the latest patch.

 Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task
 -

 Key: TEZ-2358
 URL: https://issues.apache.org/jira/browse/TEZ-2358
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Gopal V
Assignee: Rajesh Balamohan
 Attachments: TEZ-2358.1.patch, TEZ-2358.2.patch, TEZ-2358.3.patch, 
 TEZ-2358.4.patch, syslog_attempt_1429683757595_0141_1_01_000143_0.syslog.bz2


 The Tez MergeManager code assumes that the src-task-id is unique between 
 merge operations. This causes confusion when two merge sequences 
 have to process output from the same src-task-id.
 {code}
 private TezRawKeyValueIterator finalMerge(Configuration job, FileSystem fs,
     List<MapOutput> inMemoryMapOutputs,
     List<FileChunk> onDiskMapOutputs
 ...
   if (inMemoryMapOutputs.size() > 0) {
     int srcTaskId =
         inMemoryMapOutputs.get(0).getAttemptIdentifier().getInputIdentifier().getInputIndex();
 ...
     // must spill to disk, but can't retain in-mem for intermediate merge
     final Path outputPath =
         mapOutputFile.getInputFileForWrite(srcTaskId,
             inMemToDiskBytes).suffix(Constants.MERGED_OUTPUT_PREFIX);
 ...
 {code}
 This, or some related scenario, results in the following list of FileChunks, 
 which contains identically named paths with different lengths.
 {code}
 2015-04-23 03:28:50,983 INFO [MemtoDiskMerger [Map_1]] 
 orderedgrouped.MergeManager: Initiating in-memory merge with 6 segments...
 2015-04-23 03:28:50,987 INFO [MemtoDiskMerger [Map_1]] impl.TezMerger: 
 Merging 6 sorted segments
 2015-04-23 03:28:50,988 INFO [MemtoDiskMerger [Map_1]] impl.TezMerger: Down 
 to the last merge-pass, with 6 segments left of total size: 1165944755 bytes
 2015-04-23 03:28:58,495 INFO [MemtoDiskMerger [Map_1]] 
 orderedgrouped.MergeManager: attempt_1429683757595_0141_1_01_000143_0_10027 
 Merge of the 6 files in-memory complete. Local file is 
 /grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out.merged
  of size 785583965
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: finalMerge called with 0 in-memory map-outputs 
 and 5 on-disk map-outputs
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 365232290 += 
 365232290for/grid/4/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_1023.out
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 730529899 += 
 365297609for/grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 1095828683 += 
 365298784for/grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out
 {code}
 The multiple instances of 404.out indicate that we pulled two pipelined 
 chunks of the same shuffle src id, once into memory and twice onto disk.
 {code}
 2015-04-23 03:28:08,256 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent: 
 attempt_1429683757595_0141_1_00_000404_0_10009_0, runDuration: 0]
 2015-04-23 03:28:08,270 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent: 
 attempt_1429683757595_0141_1_00_000404_0_10009_1, runDuration: 0]
 2015-04-23 03:28:08,272 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent: 
 attempt_1429683757595_0141_1_00_000404_0_10009_2, runDuration: 

Failed: TEZ-1752 PreCommit Build #539

2015-04-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-1752
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/539/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 1991 lines...]




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728349/TEZ-1752.2.patch
  against master revision 2935ef4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.runtime.task.TestTaskExecution

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/539//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/539//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
5b797557a09dfb5b9d9eb69d330cf57cdea75535 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #537
Archived 44 artifacts
Archive block size is 32768
Received 6 blocks and 2493908 bytes
Compression is 7.3%
Took 1.4 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
2 tests failed.
REGRESSION:  
org.apache.tez.runtime.task.TestTaskExecution.testHeartbeatShouldDie

Error Message:
test timed out after 5000 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 5000 milliseconds
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
at java.util.concurrent.FutureTask.get(FutureTask.java:187)
at 
org.apache.tez.runtime.task.TestTaskExecution.testHeartbeatShouldDie(TestTaskExecution.java:317)


REGRESSION:  
org.apache.tez.runtime.task.TestTaskExecution.testHeartbeatException

Error Message:
test timed out after 5000 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 5000 milliseconds
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
at java.util.concurrent.FutureTask.get(FutureTask.java:187)
at 
org.apache.tez.runtime.task.TestTaskExecution.testHeartbeatException(TestTaskExecution.java:278)




[jira] [Commented] (TEZ-1752) Inputs / Outputs in the Runtime library should be interruptable

2015-04-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513760#comment-14513760
 ] 

TezQA commented on TEZ-1752:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728349/TEZ-1752.2.patch
  against master revision 2935ef4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.runtime.task.TestTaskExecution

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/539//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/539//console

This message is automatically generated.

 Inputs / Outputs in the Runtime library should be interruptable
 ---

 Key: TEZ-1752
 URL: https://issues.apache.org/jira/browse/TEZ-1752
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
 Attachments: TEZ-1752.1.patch, TEZ-1752.2.patch


 Without this, it is not possible to preempt tasks without killing containers.
 There's still the problem of Processors not supporting interrupts. We may 
 need API enhancements to query IPOs on whether they are interruptible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2342) TestFaultTolerance.testRandomFailingTasks fails due to timeout

2015-04-27 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513676#comment-14513676
 ] 

Jeff Zhang commented on TEZ-2342:
-

[~bikassaha] I saw no other issue after running the test many times. I checked 
the logs on the Windows Jenkins server, and the test failed due to a timeout.



 TestFaultTolerance.testRandomFailingTasks fails due to timeout
 --

 Key: TEZ-2342
 URL: https://issues.apache.org/jira/browse/TEZ-2342
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Minor
 Attachments: TEZ-2342-1.patch, syslog_dag_1429582868137_0001_1


 {code}
 Error Message
 test timed out after 12 milliseconds
 Stacktrace
 java.lang.Exception: test timed out after 12 milliseconds
   at java.lang.Thread.sleep(Native Method)
   at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:126)
   at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:114)
   at org.apache.tez.test.TestFaultTolerance.testRandomFailingTasks(TestFaultTolerance.java:723)
 Standard Output
 2015-04-17 07:46:10,952 INFO  [main] test.TestFaultTolerance 
 (TestFaultTolerance.java:setup(65)) - Starting mini clusters
 2015-04-17 07:46:11,508 INFO  [main] hdfs.MiniDFSCluster 
 (MiniDFSCluster.java:init(446)) - starting cluster: numNameNodes=1, 
 numDataNodes=1
 Formatting using clusterid: testClusterID
 2015-04-17 07:46:12,919 INFO  [main] namenode.FSNamesystem 
 (FSNamesystem.java:init(716)) - No KeyProvider found.
 2015-04-17 07:46:12,920 INFO  [main] namenode.FSNamesystem 
 (FSNamesystem.java:init(726)) - fsLock is fair:true
 2015-04-17 07:46:13,021 INFO  [main] Configuration.deprecation 
 (Configuration.java:warnOnceIfDeprecated(1173)) - 
 hadoop.configured.node.mapping is deprecated. Instead, use 
 net.topology.configured.node.mapping
 2015-04-17 07:46:13,021 INFO  [main] blockmanagement.DatanodeManager 
 (DatanodeManager.java:init(239)) - dfs.block.invalidate.limit=1000
 2015-04-17 07:46:13,022 INFO  [main] blockmanagement.DatanodeManager 
 (DatanodeManager.java:init(245)) - 
 dfs.namenode.datanode.registration.ip-hostname-check=true
 2015-04-17 07:46:13,022 INFO  [main] blockmanagement.BlockManager 
 (InvalidateBlocks.java:printBlockDeletionTime(71)) - 
 dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
 2015-04-17 07:46:13,025 INFO  [main] blockmanagement.BlockManager 
 (InvalidateBlocks.java:printBlockDeletionTime(76)) - The block deletion will 
 start around 2015 Apr 17 07:46:13
 2015-04-17 07:46:13,029 INFO  [main] util.GSet 
 (LightWeightGSet.java:computeCapacity(354)) - Computing capacity for map 
 BlocksMap
 2015-04-17 07:46:13,030 INFO  [main] util.GSet 
 (LightWeightGSet.java:computeCapacity(355)) - VM type   = 64-bit
 2015-04-17 07:46:13,032 INFO  [main] util.GSet 
 (LightWeightGSet.java:computeCapacity(356)) - 2.0% max memory 910.3 MB = 18.2 
 MB
 2015-04-17 07:46:13,033 INFO  [main] util.GSet 
 (LightWeightGSet.java:computeCapacity(361)) - capacity  = 2^21 = 2097152 
 entries
 2015-04-17 07:46:13,079 INFO  [main] blockmanagement.BlockManager 
 (BlockManager.java:createBlockTokenSecretManager(365)) - 
 dfs.block.access.token.enable=false
 2015-04-17 07:46:13,080 INFO  [main] blockmanagement.BlockManager 
 (BlockManager.java:init(350)) - defaultReplication = 1
 2015-04-17 07:46:13,080 INFO  [main] blockmanagement.BlockManager 
 (BlockManager.java:init(351)) - maxReplication = 512
 2015-04-17 07:46:13,083 INFO  [main] blockmanagement.BlockManager 
 (BlockManager.java:init(352)) - minReplication = 1
 2015-04-17 07:46:13,083 INFO  [main] blockmanagement.BlockManager 
 (BlockManager.java:init(353)) - maxReplicationStreams  = 2
 2015-04-17 07:46:13,083 INFO  [main] blockmanagement.BlockManager 
 (BlockManager.java:init(354)) - shouldCheckForEnoughRacks  = false
 2015-04-17 07:46:13,084 INFO  [main] blockmanagement.BlockManager 
 (BlockManager.java:init(355)) - replicationRecheckInterval = 3000
 2015-04-17 07:46:13,084 INFO  [main] blockmanagement.BlockManager 
 (BlockManager.java:init(356)) - encryptDataTransfer= false
 2015-04-17 07:46:13,084 INFO  [main] blockmanagement.BlockManager 
 (BlockManager.java:init(357)) - maxNumBlocksToLog  = 1000
 2015-04-17 07:46:13,115 INFO  [main] namenode.FSNamesystem 
 (FSNamesystem.java:init(746)) - fsOwner = jenkins (auth:SIMPLE)
 2015-04-17 07:46:13,116 INFO  [main] namenode.FSNamesystem 
 (FSNamesystem.java:init(747)) - supergroup  = supergroup
 2015-04-17 07:46:13,116 INFO  [main] namenode.FSNamesystem 
 (FSNamesystem.java:init(748)) - isPermissionEnabled = true
 2015-04-17 07:46:13,116 INFO  [main] namenode.FSNamesystem 
 

[jira] [Updated] (TEZ-1752) Inputs / Outputs in the Runtime library should be interruptable

2015-04-27 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1752:
--
Attachment: TEZ-1752.2.patch



In LogicalIOProcessorRuntimeTask - it would be useful to log the interrupt 
status between each close invocation, and potentially set it if the I/O/P being 
closed ends up unsetting it.
- Done

cleanup would behave differently if initialize hasn't been invoked. We may need 
to track which I/O/Ps have been initialized, and close just those. In the 
MergeManager, the InterruptedException thrown by MergeThread.close likely needs 
to be handled (otherwise it'll end up skipping cleanup?)
- Done

In the invocation of finalMerge - an IOException is caught, are there specific 
cases here where this IO exception is actually masking an interrupt ? (and as a 
result the interrupt status needs to be set)
- Done

The TezMerger change - should we just change the interface to throw 
InterruptedException, instead of setting the flag. That's a private method, and 
will force consumers within the IOs to handle it.
- Modified TezMerger to throw InterruptedException

UnorderedPartitionedKVWriter / others - in the close method, instead of 
returning an empty event list - should this just throw an InterruptedException 
back ?
- Done

Is the change in the TaskReporter required ? taskFailed shouldn't be invoked 
after the currentTask has been unregistered.
- No. I added that because spurious logs (NPE) were coming up, which made 
debugging difficult. Master already has the fix for it, so I removed the 
changes from the patch.

We likely need to ensure that cleanup / close methods aren't called twice - 
once during regular cleanup, second during an interrupt while the cleanup is in 
progress.
- Tracking the close() of IPO. This would take care of not making the call 
twice.

Not directly related to interrupts - but an invocation on Task.close() (regular 
flow) can cause exceptions during Processor close or Input / Output close - 
which would prevent subsequent Inputs / Outputs from being closed. Do we need 
to make sure that close() gets invoked on subsequent Inputs / Outputs despite a 
prior exception ?
- Yes, this is needed. Tracking the IPO close() and task.cleanup() in the patch 
takes care of this.
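
The last two points above (call close() at most once per Input / Output / Processor, and keep closing the remaining ones after a failure) together with the interrupt-status point can be sketched roughly as below. This is only an illustration of the idea, not the code in the attached patch: `CloseTracker` and `Closer` are hypothetical names, and the sketch assumes that a component whose close() failed is not retried.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Tez: closes each component at most once,
// keeps going after a failure, and preserves the thread's interrupt status.
final class CloseTracker {
  // Stand-in for the close() call of an Input, Output or Processor.
  interface Closer {
    void close() throws Exception;
  }

  // Identity-based set of components whose close() has already been attempted.
  private final Set<Closer> closed =
      Collections.newSetFromMap(new IdentityHashMap<Closer, Boolean>());

  // Returns the exceptions raised while closing; does not rethrow them.
  synchronized List<Exception> closeAll(List<Closer> components) {
    List<Exception> failures = new ArrayList<Exception>();
    for (Closer c : components) {
      if (!closed.add(c)) {
        continue; // close() was already attempted in an earlier pass
      }
      boolean wasInterrupted = Thread.currentThread().isInterrupted();
      try {
        c.close();
      } catch (InterruptedException e) {
        wasInterrupted = true;
        failures.add(e);
      } catch (Exception e) {
        failures.add(e);
      } finally {
        // Re-assert the flag in case close() consumed or cleared it.
        if (wasInterrupted) {
          Thread.currentThread().interrupt();
        }
      }
    }
    return failures;
  }
}
```

Marking a component as closed before invoking close() is a deliberate choice in this sketch: a second cleanup pass triggered by an interrupt then skips it, even if the first attempt is still in progress or has failed.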

 Inputs / Outputs in the Runtime library should be interruptable
 ---

 Key: TEZ-1752
 URL: https://issues.apache.org/jira/browse/TEZ-1752
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
 Attachments: TEZ-1752.1.patch, TEZ-1752.2.patch


 Without this, it is not possible to preempt tasks without killing containers.
 There's still the problem of Processors not supporting interrupts. We may 
 need API enhancements to query IPOs on whether they are interruptible.





[jira] [Commented] (TEZ-2226) Disable writing history to timeline if domain creation fails.

2015-04-27 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513602#comment-14513602
 ] 

Jeff Zhang commented on TEZ-2226:
-

I see that TEZ_DAG_HISTORY_LOGGING is set in the DAG's configuration, so it 
should be possible to restore this value when recovering. 
[~lichangleo] I think you need to update the following code in 
RecoveryParser.java, where it recovers from DAGSubmittedEvent. (Also update the 
skippedDAGs of ATSHistoryLoggingService in this place.)

{code}
case DAG_SUBMITTED:
{
  DAGSubmittedEvent submittedEvent = (DAGSubmittedEvent) event;
  LOG.info("Recovering from event"
      + ", eventType=" + eventType
      + ", event=" + event.toString());
  recoveredDAGData.recoveredDAG =
      dagAppMaster.createDAG(submittedEvent.getDAGPlan(),
          lastInProgressDAG);
  recoveredDAGData.cumulativeAdditionalResources = submittedEvent
      .getCumulativeAdditionalLocalResources();
  recoveredDAGData.recoveredDagID =
      recoveredDAGData.recoveredDAG.getID();
  dagAppMaster.setCurrentDAG(recoveredDAGData.recoveredDAG);
  if (recoveredDAGData.nonRecoverable) {
    skipAllOtherEvents = true;
  }
  break;
}
{code}


BTW, HistoryACLPolicyException.java is missing the Apache license header.

 Disable writing history to timeline if domain creation fails.
 -

 Key: TEZ-2226
 URL: https://issues.apache.org/jira/browse/TEZ-2226
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Chang Li
Priority: Blocker
 Attachments: TEZ-2226.10.patch, TEZ-2226.2.patch, TEZ-2226.3.patch, 
 TEZ-2226.4.patch, TEZ-2226.5.patch, TEZ-2226.6.patch, TEZ-2226.7.patch, 
 TEZ-2226.8.patch, TEZ-2226.9.patch, TEZ-2226.patch, TEZ-2226.wip.2.patch, 
 TEZ-2226.wip.patch








[jira] [Commented] (TEZ-2358) Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task

2015-04-27 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513650#comment-14513650
 ] 

Gopal V commented on TEZ-2358:
--

The patch was tested again at 10 TB scale over the weekend, and there seem to 
be no collisions in naming.

I looked at the logs and noticed that some earlier tasks did succeed despite 
the duplicate naming, because there were only a few spills, which ended up 
split between the disks and so did not collide in paths.

But for the sake of preventing future breakage, it would help to trigger an 
error when someone violates the no-duplicate rule for onDiskMapOutputs (i.e., 
no two file chunks for merging can start at the same offset of the same file). 

My original pre-conditions were wrong when auto-reduce parallelism kicks in, 
because we then want to merge off a DISK_DIRECT input across two reducers, 
which would be different index points into the same DISK_DIRECT file.
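
The no-duplicate rule described above could be enforced along the following lines. This is a minimal sketch under stated assumptions, not the check in the attached patch: `Chunk` is a hypothetical stand-in for Tez's FileChunk (path, offset, length), and `OnDiskChunks` is a hypothetical holder for the on-disk list. Only path and offset take part in the comparison, so two index points into the same DISK_DIRECT file are accepted while a second chunk at the same offset is rejected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Tez's FileChunk: a (path, offset, length) triple.
final class Chunk {
  final String path;
  final long offset;
  final long length;

  Chunk(String path, long offset, long length) {
    this.path = path;
    this.offset = offset;
    this.length = length;
  }
}

// Hypothetical holder for the on-disk chunks that are waiting to be merged.
final class OnDiskChunks {
  private final List<Chunk> chunks = new ArrayList<Chunk>();

  // Rejects a chunk that starts at the same offset of the same file as one
  // that is already registered. Length is deliberately ignored.
  synchronized void add(Chunk candidate) {
    for (Chunk existing : chunks) {
      boolean samePath = existing.path.equals(candidate.path);
      boolean sameOffset = existing.offset == candidate.offset;
      if (samePath && sameOffset) {
        throw new IllegalStateException("Duplicate on-disk chunk: "
            + candidate.path + " @ " + candidate.offset);
      }
    }
    chunks.add(candidate);
  }

  synchronized int size() {
    return chunks.size();
  }
}
```

The linear scan over all registered chunks is intentional here: the list is expected to stay small, and failing early with the offending path in the message makes a violation easier to debug.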

 Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task
 -

 Key: TEZ-2358
 URL: https://issues.apache.org/jira/browse/TEZ-2358
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Gopal V
Assignee: Rajesh Balamohan
 Attachments: TEZ-2358.1.patch, TEZ-2358.2.patch, 
 syslog_attempt_1429683757595_0141_1_01_000143_0.syslog.bz2


 The Tez MergeManager code assumes that the src-task-id is unique between 
 merge operations; this results in some confusion when two merge sequences 
 have to process output from the same src-task-id.
 {code}
 private TezRawKeyValueIterator finalMerge(Configuration job, FileSystem fs,
     List<MapOutput> inMemoryMapOutputs,
     List<FileChunk> onDiskMapOutputs
 ...
   if (inMemoryMapOutputs.size() > 0) {
     int srcTaskId =
         inMemoryMapOutputs.get(0).getAttemptIdentifier().getInputIdentifier().getInputIndex();
 ...
     // must spill to disk, but can't retain in-mem for intermediate merge
     final Path outputPath =
         mapOutputFile.getInputFileForWrite(srcTaskId,
             inMemToDiskBytes).suffix(Constants.MERGED_OUTPUT_PREFIX);
 ...
 {code}
 This, or some scenario related to it, results in the following FileChunks 
 list, which contains identically named paths with different lengths.
 {code}
 2015-04-23 03:28:50,983 INFO [MemtoDiskMerger [Map_1]] 
 orderedgrouped.MergeManager: Initiating in-memory merge with 6 segments...
 2015-04-23 03:28:50,987 INFO [MemtoDiskMerger [Map_1]] impl.TezMerger: 
 Merging 6 sorted segments
 2015-04-23 03:28:50,988 INFO [MemtoDiskMerger [Map_1]] impl.TezMerger: Down 
 to the last merge-pass, with 6 segments left of total size: 1165944755 bytes
 2015-04-23 03:28:58,495 INFO [MemtoDiskMerger [Map_1]] 
 orderedgrouped.MergeManager: attempt_1429683757595_0141_1_01_000143_0_10027 
 Merge of the 6 files in-memory complete. Local file is 
 /grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out.merged
  of size 785583965
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: finalMerge called with 0 in-memory map-outputs 
 and 5 on-disk map-outputs
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 365232290 += 
 365232290for/grid/4/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_1023.out
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 730529899 += 
 365297609for/grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 1095828683 += 
 365298784for/grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out
 {code}
 The multiple instances of 404.out indicate that we pulled two pipelined 
 chunks of the same shuffle src id, once into memory and twice onto disk.
 {code}
 2015-04-23 03:28:08,256 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent: 
 attempt_1429683757595_0141_1_00_000404_0_10009_0, runDuration: 0]
 

[jira] [Commented] (TEZ-2358) Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task

2015-04-27 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513742#comment-14513742
 ] 

Gopal V commented on TEZ-2358:
--

[~rajesh.balamohan]: minor nit on the checkArgument (not or not) pattern - it 
gets complex to add another condition to it later.
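
To illustrate the nit, here is a hedged sketch of the two ways such a check can be written. It is not taken from the patch: `ChunkCheck` and both method names are hypothetical, and the local `checkArgument` only stands in for Guava's `Preconditions.checkArgument(boolean, Object)` so that the example stays self-contained.

```java
// Hypothetical illustration of the two styles; not code from the patch.
final class ChunkCheck {
  // Local stand-in for Guava's Preconditions.checkArgument(boolean, Object).
  static void checkArgument(boolean ok, String message) {
    if (!ok) {
      throw new IllegalArgumentException(message);
    }
  }

  // "not or not" form: every clause is negated separately, so adding a
  // third field means adding another negated clause in the right place.
  static void checkNegatedClauses(String pathA, long offA,
                                  String pathB, long offB) {
    checkArgument(!pathA.equals(pathB) || offA != offB, "duplicate chunk");
  }

  // Named-condition form: describe the bad case once, then negate it once.
  static void checkNamedCondition(String pathA, long offA,
                                  String pathB, long offB) {
    boolean duplicate = pathA.equals(pathB) && offA == offB;
    checkArgument(!duplicate, "duplicate chunk");
  }
}
```

Both methods accept and reject exactly the same inputs (De Morgan's law); the second form just keeps the definition of "duplicate" in one readable expression.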

 Pipelined Shuffle: MergeManager assumptions about 1 merge per source-task
 -

 Key: TEZ-2358
 URL: https://issues.apache.org/jira/browse/TEZ-2358
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Gopal V
Assignee: Rajesh Balamohan
 Attachments: TEZ-2358.1.patch, TEZ-2358.2.patch, TEZ-2358.3.patch, 
 syslog_attempt_1429683757595_0141_1_01_000143_0.syslog.bz2


 The Tez MergeManager code assumes that the src-task-id is unique between 
 merge operations; this results in some confusion when two merge sequences 
 have to process output from the same src-task-id.
 {code}
 private TezRawKeyValueIterator finalMerge(Configuration job, FileSystem fs,
     List<MapOutput> inMemoryMapOutputs,
     List<FileChunk> onDiskMapOutputs
 ...
   if (inMemoryMapOutputs.size() > 0) {
     int srcTaskId =
         inMemoryMapOutputs.get(0).getAttemptIdentifier().getInputIdentifier().getInputIndex();
 ...
     // must spill to disk, but can't retain in-mem for intermediate merge
     final Path outputPath =
         mapOutputFile.getInputFileForWrite(srcTaskId,
             inMemToDiskBytes).suffix(Constants.MERGED_OUTPUT_PREFIX);
 ...
 {code}
 This, or some scenario related to it, results in the following FileChunks 
 list, which contains identically named paths with different lengths.
 {code}
 2015-04-23 03:28:50,983 INFO [MemtoDiskMerger [Map_1]] 
 orderedgrouped.MergeManager: Initiating in-memory merge with 6 segments...
 2015-04-23 03:28:50,987 INFO [MemtoDiskMerger [Map_1]] impl.TezMerger: 
 Merging 6 sorted segments
 2015-04-23 03:28:50,988 INFO [MemtoDiskMerger [Map_1]] impl.TezMerger: Down 
 to the last merge-pass, with 6 segments left of total size: 1165944755 bytes
 2015-04-23 03:28:58,495 INFO [MemtoDiskMerger [Map_1]] 
 orderedgrouped.MergeManager: attempt_1429683757595_0141_1_01_000143_0_10027 
 Merge of the 6 files in-memory complete. Local file is 
 /grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out.merged
  of size 785583965
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: finalMerge called with 0 in-memory map-outputs 
 and 5 on-disk map-outputs
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 365232290 += 
 365232290for/grid/4/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_1023.out
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 730529899 += 
 365297609for/grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out
 2015-04-23 03:28:58,496 INFO [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.MergeManager: GOPAL: onDiskBytes = 1095828683 += 
 365298784for/grid/5/cluster/yarn/local/usercache/gopal/appcache/application_1429683757595_0141/attempt_1429683757595_0141_1_01_000143_0_10027_spill_404.out
 {code}
 The multiple instances of 404.out indicate that we pulled two pipelined 
 chunks of the same shuffle src id, once into memory and twice onto disk.
 {code}
 2015-04-23 03:28:08,256 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent: 
 attempt_1429683757595_0141_1_00_000404_0_10009_0, runDuration: 0]
 2015-04-23 03:28:08,270 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent: 
 attempt_1429683757595_0141_1_00_000404_0_10009_1, runDuration: 0]
 2015-04-23 03:28:08,272 INFO 
 [TezTaskEventRouter[attempt_1429683757595_0141_1_01_000143_0]] 
 orderedgrouped.ShuffleInputEventHandlerOrderedGrouped: DME srcIdx: 143, 
 targetIdx: 404, attemptNum: 0, payload: [hasEmptyPartitions: true, host: 
 cn047-10.l42scl.hortonworks.com, port: 13562, pathComponent: 
 

[jira] [Updated] (TEZ-1752) Inputs / Outputs in the Runtime library should be interruptable

2015-04-27 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1752:
--
Attachment: TEZ-1752.3.patch

 Inputs / Outputs in the Runtime library should be interruptable
 ---

 Key: TEZ-1752
 URL: https://issues.apache.org/jira/browse/TEZ-1752
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
 Attachments: TEZ-1752.1.patch, TEZ-1752.2.patch, TEZ-1752.3.patch


 Without this, it is not possible to preempt tasks without killing containers.
 There's still the problem of Processors not supporting interrupts. We may 
 need API enhancements to query IPOs on whether they are interruptible.





Failed: TEZ-2360 PreCommit Build #544

2015-04-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2360
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/544/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2768 lines...]


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728386/TEZ-2360.1.patch
  against master revision 2935ef4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/544//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/544//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/544//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
fb6f12d03e3f1a432f3d77442a57fcf1482f7f7d logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #543
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2626408 bytes
Compression is 4.8%
Took 0.78 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2325) Route status update event directly to the attempt

2015-04-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514125#comment-14514125
 ] 

TezQA commented on TEZ-2325:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728394/TEZ-2325.4.patch
  against master revision 2935ef4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/545//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/545//console

This message is automatically generated.

 Route status update event directly to the attempt 
 --

 Key: TEZ-2325
 URL: https://issues.apache.org/jira/browse/TEZ-2325
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Prakash Ramachandran
 Attachments: TEZ-2325.1.patch, TEZ-2325.2.patch, TEZ-2325.3.patch, 
 TEZ-2325.4.patch


 Today, all events from the attempt heartbeat are routed to the vertex. The 
 vertex then routes status update events (if any) to the attempt. This is 
 unnecessary and potentially creates out-of-order scenarios. We could route 
 the status update events directly to attempts.





Success: TEZ-1752 PreCommit Build #543

2015-04-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-1752
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/543/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2785 lines...]
[INFO] Final Memory: 72M/888M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728382/TEZ-1752.3.patch
  against master revision 2935ef4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/543//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/543//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
f834cc9c90f68fafaf5c4cf27d0ae42da5c03d06 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #542
Archived 44 artifacts
Archive block size is 32768
Received 0 blocks and 2750825 bytes
Compression is 0.0%
Took 0.6 sec
Description set: TEZ-1752
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2360) per-io counters flag should generate both overall and per-edge counters

2015-04-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514104#comment-14514104
 ] 

TezQA commented on TEZ-2360:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728386/TEZ-2360.1.patch
  against master revision 2935ef4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/544//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/544//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/544//console

This message is automatically generated.

 per-io counters flag should generate both overall and per-edge counters 
 

 Key: TEZ-2360
 URL: https://issues.apache.org/jira/browse/TEZ-2360
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Prakash Ramachandran
 Attachments: TEZ-2360.1.patch


 Currently, the per-io flag disables overall per task counters and retains 
 only per edge counters. It would be useful to have both overall and per edge 
 counters. 





[jira] [Created] (TEZ-2371) Upgrade hive branch to latest Tez

2015-04-27 Thread Gopal V (JIRA)
Gopal V created TEZ-2371:


 Summary: Upgrade hive branch to latest Tez
 Key: TEZ-2371
 URL: https://issues.apache.org/jira/browse/TEZ-2371
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gopal V
Assignee: Gopal V


Upgrade hive to the upcoming tez-0.7 release 





[jira] [Updated] (TEZ-2325) Route status update event directly to the attempt

2015-04-27 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2325:
--
Attachment: TEZ-2325.4.patch

 Route status update event directly to the attempt 
 --

 Key: TEZ-2325
 URL: https://issues.apache.org/jira/browse/TEZ-2325
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Prakash Ramachandran
 Attachments: TEZ-2325.1.patch, TEZ-2325.2.patch, TEZ-2325.3.patch, 
 TEZ-2325.4.patch


 Today, all events from the attempt heartbeat are routed to the vertex. The 
 vertex then routes status update events (if any) to the attempt. This is 
 unnecessary and potentially creates out-of-order scenarios. We could route 
 the status update events directly to attempts.





Success: TEZ-2303 PreCommit Build #542

2015-04-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2303
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/542/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2776 lines...]
[INFO] Final Memory: 69M/924M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728374/TEZ-2303-4.patch
  against master revision 2935ef4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/542//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/542//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
6f0cdddef6804ccd72a9b7336bfc0ab1be9c0ab0 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #541
Archived 44 artifacts
Archive block size is 32768
Received 2 blocks and 2754872 bytes
Compression is 2.3%
Took 1.5 sec
Description set: TEZ-2303
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2303) ConcurrentModificationException while processing recovery

2015-04-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514082#comment-14514082
 ] 

TezQA commented on TEZ-2303:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728374/TEZ-2303-4.patch
  against master revision 2935ef4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/542//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/542//console

This message is automatically generated.

 ConcurrentModificationException while processing recovery
 -

 Key: TEZ-2303
 URL: https://issues.apache.org/jira/browse/TEZ-2303
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jason Lowe
Assignee: Jeff Zhang
 Attachments: TEZ-2303-1.patch, TEZ-2303-2.patch, TEZ-2303-4.patch


 Saw a Tez AM log a few ConcurrentModificationException messages while trying 
 to recover from a previous attempt that crashed.  Exception details to follow.
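
For readers unfamiliar with this failure mode, here is a minimal, generic sketch of how a ConcurrentModificationException typically arises and one common way to avoid it. It is not taken from the Tez recovery code or from the attached patches; CmeSketch and its methods are hypothetical names used only for illustration.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not Tez code.
class CmeSketch {

  // Structurally modifies the list while a for-each loop is iterating over it.
  // ArrayList's fail-fast iterator detects the change on its next call to next().
  static boolean unsafeRemoveThrows(List<String> events) {
    try {
      for (String e : events) {
        if (e.startsWith("stale")) {
          events.remove(e);
        }
      }
      return false;
    } catch (ConcurrentModificationException ex) {
      return true;
    }
  }

  // Removes through the iterator itself, which keeps the iterator consistent.
  static void safeRemove(List<String> events) {
    for (Iterator<String> it = events.iterator(); it.hasNext();) {
      if (it.next().startsWith("stale")) {
        it.remove();
      }
    }
  }
}
```

When the modification comes from a different thread, as may be the case in an AM, removing through the iterator is not enough; the collection would also need external synchronization or a concurrent implementation.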





[jira] [Commented] (TEZ-1752) Inputs / Outputs in the Runtime library should be interruptable

2015-04-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514089#comment-14514089
 ] 

TezQA commented on TEZ-1752:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728382/TEZ-1752.3.patch
  against master revision 2935ef4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/543//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/543//console

This message is automatically generated.

 Inputs / Outputs in the Runtime library should be interruptable
 ---

 Key: TEZ-1752
 URL: https://issues.apache.org/jira/browse/TEZ-1752
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
 Attachments: TEZ-1752.1.patch, TEZ-1752.2.patch, TEZ-1752.3.patch


 Without this, it is not possible to preempt tasks without killing their containers.
 There's still the problem of Processors not supporting interrupts. We may 
 need API enhancements to query IPOs on whether they are interruptible.
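
To make the request concrete, here is a minimal sketch of what an interruptible blocking read could look like. It is an assumption for illustration, not the approach taken in the attached patches; InterruptibleReaderSketch and its internal queue are hypothetical and are not part of the Tez Input/Output API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not the Tez Input/Output API.
class InterruptibleReaderSketch {
  private final BlockingQueue<String> fetched = new LinkedBlockingQueue<>();

  // Called by a producer (for example a fetcher thread) to hand over a record.
  void offer(String record) {
    fetched.add(record);
  }

  // Blocks until a record arrives. BlockingQueue.poll(timeout, unit) throws
  // InterruptedException when the calling thread is interrupted, so a
  // framework could stop the task thread without killing the container.
  String readNext() throws InterruptedException {
    while (true) {
      String record = fetched.poll(100, TimeUnit.MILLISECONDS);
      if (record != null) {
        return record;
      }
    }
  }
}
```

The key design point is that the wait propagates InterruptedException to the caller instead of swallowing it, which is what lets the surrounding framework regain control of the thread.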





[jira] [Updated] (TEZ-2360) per-io counters flag should generate both overall and per-edge counters

2015-04-27 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2360:
--
Attachment: TEZ-2360.1.patch

 per-io counters flag should generate both overall and per-edge counters 
 

 Key: TEZ-2360
 URL: https://issues.apache.org/jira/browse/TEZ-2360
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Prakash Ramachandran
 Attachments: TEZ-2360.1.patch


 Currently, the per-io flag disables overall per task counters and retains 
 only per edge counters. It would be useful to have both overall and per edge 
 counters. 
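
The requested behaviour can be sketched as follows. This is a hypothetical illustration of keeping both an overall total and per-edge totals in one place; DualCounterSketch is not the TezCounters API and does not reflect how the attached patch is implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not the TezCounters API.
class DualCounterSketch {
  private final AtomicLong overall = new AtomicLong();
  private final Map<String, AtomicLong> perEdge = new ConcurrentHashMap<>();

  // Every increment updates the task-wide total and the per-edge bucket.
  void increment(String edgeName, long delta) {
    overall.addAndGet(delta);
    perEdge.computeIfAbsent(edgeName, k -> new AtomicLong()).addAndGet(delta);
  }

  long overall() {
    return overall.get();
  }

  // Returns 0 for an edge that has never been incremented.
  long forEdge(String edgeName) {
    AtomicLong c = perEdge.get(edgeName);
    return c == null ? 0L : c.get();
  }
}
```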





[jira] [Updated] (TEZ-2360) per-io counters flag should generate both overall and per-edge counters

2015-04-27 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2360:
--
Attachment: TEZ-2360.2.patch

Fixed Findbugs warnings.

 per-io counters flag should generate both overall and per-edge counters 
 

 Key: TEZ-2360
 URL: https://issues.apache.org/jira/browse/TEZ-2360
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Prakash Ramachandran
 Attachments: TEZ-2360.1.patch, TEZ-2360.2.patch


 Currently, the per-io flag disables overall per task counters and retains 
 only per edge counters. It would be useful to have both overall and per edge 
 counters. 





[jira] [Commented] (TEZ-2365) Update tez-ui war's license/notice to reflect OFL license correctly

2015-04-27 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514226#comment-14514226
 ] 

Prakash Ramachandran commented on TEZ-2365:
---

+1 LGTM.

 Update tez-ui war's license/notice to reflect OFL license correctly 
 

 Key: TEZ-2365
 URL: https://issues.apache.org/jira/browse/TEZ-2365
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: TEZ-2365.1.patch








[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333

2015-04-27 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2366:
--
Attachment: TEZ-2366.wip.1.patch

[~sseth] Attaching a patch which checks the port along with the host. One quick 
question though: mapreduce.shuffle.port is not exposed by YARN. Is it fine 
to rely on that conf and its default value? If the patch looks OK, I can add 
the tests.
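
A minimal sketch of the kind of check described above, under stated assumptions: java.util.Properties stands in for the Hadoop Configuration, LocalFetchCheckSketch is a hypothetical name, and the default of 13562 is the value assumed here for the MapReduce shuffle handler, which should be verified against the Hadoop version in use. This is not the code in the attached patch.

```java
import java.util.Properties;

// Hypothetical illustration only; not the Tez Fetcher code or the attached patch.
class LocalFetchCheckSketch {
  // Config key named in the comment above; the default value is an assumption.
  static final String SHUFFLE_PORT_KEY = "mapreduce.shuffle.port";
  static final int DEFAULT_SHUFFLE_PORT = 13562;

  // Reads the local shuffle port, falling back to the assumed default.
  static int localShufflePort(Properties conf) {
    String v = conf.getProperty(SHUFFLE_PORT_KEY);
    return v == null ? DEFAULT_SHUFFLE_PORT : Integer.parseInt(v.trim());
  }

  // A source is treated as local only if both the host and the port match.
  static boolean isLocal(String srcHost, int srcPort, String localHost, Properties conf) {
    return srcHost.equalsIgnoreCase(localHost) && srcPort == localShufflePort(conf);
  }
}
```

Comparing the host alone would presumably be ambiguous wherever several shuffle services share one host name, which may be the situation in a mini cluster; including the port removes that ambiguity.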

 Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
 

 Key: TEZ-2366
 URL: https://issues.apache.org/jira/browse/TEZ-2366
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
Priority: Critical
 Attachments: TEZ-2366.test.txt, TEZ-2366.wip.1.patch


 Around 20 unit tests (out of around 2000) fail intermittently after 
 TEZ-2333. Here is a stack trace:
 {code}
 org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
 output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any 
 of the configured local directories
 at 
 org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
 at 
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
 at 
 org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
 at 
 org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
 at 
 org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
 at 
 org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
 at 
 org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
 at 
 org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
 at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 To reproduce this in the Pig tests, use the following commands:
 svn co http://svn.apache.org/repos/asf/pig/trunk
 ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism 
 test
 Note that in the Pig codebase, we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to 
 true 
 (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup).
 I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to false in Pig, and it does 
 not help. 





[jira] [Commented] (TEZ-2314) Tez task attempt failures due to bad event serialization

2015-04-27 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516076#comment-14516076
 ] 

Siddharth Seth commented on TEZ-2314:
-

bq. Updates to volatile longs are atomic according to the Java language 
specification.
That's good to know.

bq. This is unrelated to the actual contents of the stats etc. This is more 
around having the right number of objects in the heartbeat request. There should 
be N stats objects for N IOs. So that code upstream (serde or non-serde) can 
simply work on the correct number of objects. About consistency of the objects' 
internal state while updates are in progress, those will have to be looked at 
as needed.
Already said this was OK to go in for now. It does however have issues when 
stats are added dynamically - which we will hit at a later point when this is 
supported. There's no relation to the code upstream requiring N objects, since 
we handle the absence of stats correctly. One input initialized - reports some 
stats - which may or may not show up in the AM. Another one blocked on 
initialization, we don't report stats.

 Tez task attempt failures due to bad event serialization
 

 Key: TEZ-2314
 URL: https://issues.apache.org/jira/browse/TEZ-2314
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Rohini Palaniswamy
Assignee: Bikas Saha
Priority: Blocker
 Attachments: TEZ-2314.1.patch, TEZ-2314.log.patch


 {code}
 2015-04-13 19:21:48,516 WARN [Socket Reader #3 for port 53530] ipc.Server: 
 Unable to read call parameters for client 10.216.13.112on connection protocol 
 org.apache.tez.common.TezTaskUmbilicalProtocol for rpcKind RPC_WRITABLE
 java.lang.ArrayIndexOutOfBoundsException: 1935896432
 at 
 org.apache.tez.runtime.api.impl.EventMetaData.readFields(EventMetaData.java:120)
 at 
 org.apache.tez.runtime.api.impl.TezEvent.readFields(TezEvent.java:271)
 at 
 org.apache.tez.runtime.api.impl.TezHeartbeatRequest.readFields(TezHeartbeatRequest.java:110)
 at 
 org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
 at 
 org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:160)
 at 
 org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1884)
 at 
 org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1816)
 at 
 org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1574)
 at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:806)
 at 
 org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:673)
 at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:644)
 {code}
 cc/ [~hitesh] and [~bikassaha]





[jira] [Created] (TEZ-2373) Whitespace cleanup in tez codebase

2015-04-27 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-2373:


 Summary: Whitespace cleanup in tez codebase 
 Key: TEZ-2373
 URL: https://issues.apache.org/jira/browse/TEZ-2373
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Trivial


Found that only 480 out of 790 Java files need a cleanup.





[jira] [Commented] (TEZ-2314) Tez task attempt failures due to bad event serialization

2015-04-27 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515837#comment-14515837
 ] 

Siddharth Seth commented on TEZ-2314:
-

The plugin where the stats are eventually used - the VMPlugin I believe. Looks 
like that path is handled via null checks while accumulating statistics.
One thing I did notice, though, is that TaskAttempt.getStatistics is outside any 
lock - this can be fixed here or in a separate JIRA since it's not related directly to 
the issue.

On the patch itself - volatile long instead of synchronizing the updates to the 
values can be problematic - since operations on longs are not atomic.
The approach of sending the data only after initialization is fine for now. 
We'll have to keep this in mind when adding user specified statistics, or stats 
which are not setup during initialization. Synchronization is a simpler 
approach though, and won't run into these potential pitfalls later.
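
As background for the volatile-versus-synchronized trade-off discussed above: per the Java Language Specification (section 17.7), single reads and writes of a volatile long are atomic, but a compound update such as += or ++ on a volatile field is still a separate read, add, and write, so concurrent updates can be lost. The sketch below contrasts the options. StatsCounterSketch and its field names are hypothetical and are not the statistics classes touched by the patch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not the statistics classes touched by the patch.
class StatsCounterSketch {
  // A volatile long is read and written as a single unit (JLS 17.7), so a
  // reader never sees a torn value. That covers plain set/get only.
  private volatile long lastReported;

  // Atomic read-modify-write without a lock.
  private final AtomicLong itemsProcessed = new AtomicLong();

  // Guarded by "this".
  private long dataSize;

  void setLastReported(long v) { lastReported = v; }
  long getLastReported() { return lastReported; }

  void addItems(long n) { itemsProcessed.addAndGet(n); }
  long getItems() { return itemsProcessed.get(); }

  synchronized void addDataSize(long n) { dataSize += n; }
  synchronized long getDataSize() { return dataSize; }

  // Runs `threads` workers that each apply `perThread` unit increments.
  static StatsCounterSketch hammer(int threads, int perThread) throws InterruptedException {
    StatsCounterSketch stats = new StatsCounterSketch();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < perThread; j++) {
          stats.addItems(1);
          stats.addDataSize(1);
        }
      });
      workers[i].start();
    }
    for (Thread w : workers) {
      w.join();
    }
    return stats;
  }
}
```

Both the AtomicLong and the synchronized variants give exact totals under contention; synchronization additionally allows several fields to be updated together as one consistent unit.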

 Tez task attempt failures due to bad event serialization
 

 Key: TEZ-2314
 URL: https://issues.apache.org/jira/browse/TEZ-2314
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Rohini Palaniswamy
Assignee: Bikas Saha
Priority: Blocker
 Attachments: TEZ-2314.1.patch, TEZ-2314.log.patch







[jira] [Updated] (TEZ-2226) Disable writing history to timeline if domain creation fails.

2015-04-27 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2226:
-
Attachment: TEZ-2226.12.patch

Renamed combined patch to patch 12. 

 Disable writing history to timeline if domain creation fails.
 -

 Key: TEZ-2226
 URL: https://issues.apache.org/jira/browse/TEZ-2226
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Chang Li
Priority: Blocker
 Attachments: TEZ-2226.10.patch, TEZ-2226.11.patch, TEZ-2226.12.patch, 
 TEZ-2226.2.patch, TEZ-2226.3.patch, TEZ-2226.4.patch, TEZ-2226.5.patch, 
 TEZ-2226.6.patch, TEZ-2226.7.patch, TEZ-2226.8.patch, TEZ-2226.9.patch, 
 TEZ-2226.addon-for-patch10, TEZ-2226.addon-for-patch10-combined.full.patch, 
 TEZ-2226.patch, TEZ-2226.wip.2.patch, TEZ-2226.wip.patch








Success: TEZ-2226 PreCommit Build #555

2015-04-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2226
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/555/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2780 lines...]
[INFO] Final Memory: 72M/929M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  
http://issues.apache.org/jira/secure/attachment/12728523/TEZ-2226.addon-for-patch10-combined.full.patch
  against master revision aa87a14.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/555//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/555//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
1c99c398084a0d570542251b364b522bae05bb99 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #554
Archived 44 artifacts
Archive block size is 32768
Received 8 blocks and 2491263 bytes
Compression is 9.5%
Took 1.1 sec
Description set: TEZ-2226
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2226) Disable writing history to timeline if domain creation fails.

2015-04-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515843#comment-14515843
 ] 

TezQA commented on TEZ-2226:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  
http://issues.apache.org/jira/secure/attachment/12728523/TEZ-2226.addon-for-patch10-combined.full.patch
  against master revision aa87a14.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/555//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/555//console

This message is automatically generated.

 Disable writing history to timeline if domain creation fails.
 -

 Key: TEZ-2226
 URL: https://issues.apache.org/jira/browse/TEZ-2226
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Chang Li
Priority: Blocker
 Attachments: TEZ-2226.10.patch, TEZ-2226.11.patch, TEZ-2226.12.patch, 
 TEZ-2226.2.patch, TEZ-2226.3.patch, TEZ-2226.4.patch, TEZ-2226.5.patch, 
 TEZ-2226.6.patch, TEZ-2226.7.patch, TEZ-2226.8.patch, TEZ-2226.9.patch, 
 TEZ-2226.addon-for-patch10, TEZ-2226.addon-for-patch10-combined.full.patch, 
 TEZ-2226.patch, TEZ-2226.wip.2.patch, TEZ-2226.wip.patch








[jira] [Updated] (TEZ-2374) Fix build break against hadoop-2.2 due to TEZ-2325

2015-04-27 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2374:
-
Attachment: TEZ-2374.1.patch

 Fix build break against hadoop-2.2 due to TEZ-2325
 --

 Key: TEZ-2374
 URL: https://issues.apache.org/jira/browse/TEZ-2374
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: TEZ-2374.1.patch








[jira] [Commented] (TEZ-2226) Disable writing history to timeline if domain creation fails.

2015-04-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516003#comment-14516003
 ] 

TezQA commented on TEZ-2226:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728605/TEZ-2226.12.patch
  against master revision 9e9cf99.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/556//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/556//console

This message is automatically generated.

 Disable writing history to timeline if domain creation fails.
 -

 Key: TEZ-2226
 URL: https://issues.apache.org/jira/browse/TEZ-2226
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Chang Li
Priority: Blocker
 Attachments: TEZ-2226.10.patch, TEZ-2226.11.patch, TEZ-2226.12.patch, 
 TEZ-2226.2.patch, TEZ-2226.3.patch, TEZ-2226.4.patch, TEZ-2226.5.patch, 
 TEZ-2226.6.patch, TEZ-2226.7.patch, TEZ-2226.8.patch, TEZ-2226.9.patch, 
 TEZ-2226.addon-for-patch10, TEZ-2226.addon-for-patch10-combined.full.patch, 
 TEZ-2226.patch, TEZ-2226.wip.2.patch, TEZ-2226.wip.patch








[jira] [Updated] (TEZ-2359) Deadlock in DAGAppMaster

2015-04-27 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2359:

Priority: Blocker  (was: Critical)

 Deadlock in DAGAppMaster
 

 Key: TEZ-2359
 URL: https://issues.apache.org/jira/browse/TEZ-2359
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Priority: Blocker

 {code}
 Found one Java-level deadlock:
 =
 Timer-1:
   waiting for ownable synchronizer 0x0007cd0f8a30, (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
   which is held by Dispatcher thread: Central
 Dispatcher thread: Central:
   waiting to lock monitor 0x7fb829866d18 (object 0x0007cd5ab958, a 
 org.apache.tez.dag.app.rm.YarnTaskSchedulerService),
   which is held by DelayedContainerManager
 DelayedContainerManager:
   waiting for ownable synchronizer 0x0007cd0f8a30, (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
   which is held by Dispatcher thread: Central
 Java stack information for the threads listed above:
 ===
 Timer-1:
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for  0x0007cd0f8a30 (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
   at 
 java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
   at 
 org.apache.tez.dag.app.DAGAppMaster.checkAndHandleSessionTimeout(DAGAppMaster.java:2015)
   - locked 0x0007cd0f2ff0 (a org.apache.tez.dag.app.DAGAppMaster)
   at org.apache.tez.dag.app.DAGAppMaster$3.run(DAGAppMaster.java:1825)
   at java.util.TimerThread.mainLoop(Timer.java:555)
   at java.util.TimerThread.run(Timer.java:505)
 Dispatcher thread: Central:
   at 
 org.apache.tez.dag.app.rm.YarnTaskSchedulerService.dagComplete(YarnTaskSchedulerService.java:842)
   - waiting to lock 0x0007cd5ab958 (a 
 org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
   at 
 org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.dagCompleted(TaskSchedulerEventHandler.java:566)
   at 
 org.apache.tez.dag.app.DAGAppMaster.checkForCompletion(DAGAppMaster.java:832)
   at 
 org.apache.tez.dag.app.DAGAppMaster.access$4800(DAGAppMaster.java:201)
   at 
 org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2362)
   at 
 org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2356)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   - locked 0x0007cd1d0208 (a 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
   at org.apache.tez.dag.app.DAGAppMaster.handle(DAGAppMaster.java:510)
   at 
 org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:879)
   at 
 org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:868)
   at 
 org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
   at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
   at java.lang.Thread.run(Thread.java:745)
 DelayedContainerManager:
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for  0x0007cd0f8a30 (a 
 java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
   at 
 java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
   at org.apache.tez.dag.app.DAGAppMaster.getState(DAGAppMaster.java:531)
   at 
 
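
The (truncated) dump above shows a lock-order inversion: the dispatcher thread holds the ReentrantReadWriteLock and waits for the YarnTaskSchedulerService monitor, while DelayedContainerManager holds that monitor and waits for the same ReentrantReadWriteLock in getState. One generic way to break such a cycle is to call into the other component only after releasing one's own lock. The sketch below illustrates that technique under stated assumptions; OpenCallSketch and its nested Scheduler are hypothetical stand-ins, and this is not necessarily the fix adopted for this issue.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; not DAGAppMaster or YarnTaskSchedulerService.
class OpenCallSketch {

  static class Scheduler {
    private OpenCallSketch master;
    private int completedDags;

    // Must be called before any thread uses the scheduler.
    void setMaster(OpenCallSketch m) { master = m; }

    // Like DelayedContainerManager in the dump: holds the scheduler
    // monitor, then needs the master's read lock.
    synchronized String pollMasterState() { return master.getState(); }

    synchronized void dagComplete() { completedDags++; }

    synchronized int completedDags() { return completedDags; }
  }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Scheduler scheduler;
  private String state = "RUNNING";

  OpenCallSketch(Scheduler s) { scheduler = s; }

  String getState() {
    lock.readLock().lock();
    try {
      return state;
    } finally {
      lock.readLock().unlock();
    }
  }

  // Updates state under the write lock, but calls into the scheduler only
  // after releasing it, so this thread never waits for the scheduler
  // monitor while holding the master's lock.
  void onDagFinished() {
    lock.writeLock().lock();
    try {
      state = "IDLE";
    } finally {
      lock.writeLock().unlock();
    }
    scheduler.dagComplete();
  }
}
```

With this shape, the only nested acquisition is scheduler monitor followed by the master's read lock, so the two locks are always taken in one order and the cycle from the dump cannot form.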

[jira] [Updated] (TEZ-2359) Deadlock in DAGAppMaster

2015-04-27 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2359:

Target Version/s: 0.7.0

 Deadlock in DAGAppMaster
 

 Key: TEZ-2359
 URL: https://issues.apache.org/jira/browse/TEZ-2359
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Priority: Blocker


[jira] [Commented] (TEZ-2226) Disable writing history to timeline if domain creation fails.

2015-04-27 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515120#comment-14515120
 ] 

Chang Li commented on TEZ-2226:
---

Thanks a lot for the help, [~zjffdu] and [~hitesh]! I updated my latest patch to 
handle the AM crash-and-recover scenario and tested it on my single-node cluster. 
Could you please help review? Thanks!

 Disable writing history to timeline if domain creation fails.
 -

 Key: TEZ-2226
 URL: https://issues.apache.org/jira/browse/TEZ-2226
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Chang Li
Priority: Blocker
 Attachments: TEZ-2226.10.patch, TEZ-2226.11.patch, TEZ-2226.2.patch, 
 TEZ-2226.3.patch, TEZ-2226.4.patch, TEZ-2226.5.patch, TEZ-2226.6.patch, 
 TEZ-2226.7.patch, TEZ-2226.8.patch, TEZ-2226.9.patch, TEZ-2226.patch, 
 TEZ-2226.wip.2.patch, TEZ-2226.wip.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2368) Make the dag number available in Context classes

2015-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515150#comment-14515150
 ] 

Hitesh Shah commented on TEZ-2368:
--

Comments: typo in "Get a numeric identifier for the dto which the task belongs" 

+1 once the typo is fixed. 

 Make the dag number available in Context classes
 

 Key: TEZ-2368
 URL: https://issues.apache.org/jira/browse/TEZ-2368
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: TEZ-2368.1.txt, TEZ-2368.2.txt


 Provide the dag number, which is a unique number, for each dag running within 
 an application in the TezInputContext, TezOutputContext, TezProcessorContext.
 When containers are re-used, or for external services, this can be used to 
 generate intermediate data to a dag specific directory instead of an 
 application specific directory, where it becomes difficult to differentiate 
 between different dags.
 The DAG name does provide this - but is not suitable for use in a directory 
 name. Hashing the name is an option, but can lead to collisions.
 Generating data into a dag specific directory will eventually only be usable 
 when we move away from the default MR handler, or enhance it to support an 
 additional parameter.
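
The directory layout described above could be sketched roughly as follows. This is only an illustration under assumptions: the class name `DagScopedDirs`, the method `dagDir`, and the `dag_` prefix are hypothetical and are not taken from the Tez patch; the sketch only shows why a numeric identifier is convenient for building a per-DAG directory under an application directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tez: derives a per-DAG directory
// from an application-level directory and a numeric DAG identifier.
final class DagScopedDirs {
    private DagScopedDirs() {}

    static Path dagDir(String appLocalDir, int dagIdentifier) {
        if (dagIdentifier < 0) {
            throw new IllegalArgumentException("dagIdentifier must be non-negative");
        }
        // A number is always safe in a path component, unlike an arbitrary
        // DAG name, and unlike a hash of the name it cannot collide.
        return Paths.get(appLocalDir, "dag_" + dagIdentifier);
    }
}
```

Two DAGs running in the same application would then get sibling directories, for example `dagDir(appDir, 1)` and `dagDir(appDir, 2)`.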





[jira] [Commented] (TEZ-2325) Route status update event directly to the attempt

2015-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515160#comment-14515160
 ] 

Hitesh Shah commented on TEZ-2325:
--

Committing shortly. 

 Route status update event directly to the attempt 
 --

 Key: TEZ-2325
 URL: https://issues.apache.org/jira/browse/TEZ-2325
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Prakash Ramachandran
 Attachments: TEZ-2325.1.patch, TEZ-2325.2.patch, TEZ-2325.3.patch, 
 TEZ-2325.4.patch


 Today, all events from the attempt heartbeat are routed to the vertex. The 
 vertex then routes any status update events to the attempt. This is 
 unnecessary and can create out-of-order scenarios. We could route 
 the status update events directly to attempts.
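
The routing change proposed above can be pictured with a small sketch. Everything here is an assumption for illustration: `HeartbeatRouter`, `EventType`, and the two lists are hypothetical stand-ins and do not reflect the real Tez event classes or dispatcher. The point is only that status updates skip the vertex hop, while all other events still go to the vertex.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Tez code: splits heartbeat events by destination.
final class HeartbeatRouter {
    enum EventType { STATUS_UPDATE, OTHER }

    // Stand-ins for the attempt's and the vertex's event queues.
    final List<EventType> toAttempt = new ArrayList<>();
    final List<EventType> toVertex = new ArrayList<>();

    void route(List<EventType> heartbeatEvents) {
        for (EventType event : heartbeatEvents) {
            if (event == EventType.STATUS_UPDATE) {
                // Delivered directly, without passing through the vertex.
                toAttempt.add(event);
            } else {
                toVertex.add(event);
            }
        }
    }
}
```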





[jira] [Updated] (TEZ-2368) Make the dag number available in Context classes

2015-04-27 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2368:

Attachment: TEZ-2368.3.txt

Fixed the typo. Thanks for the review. Committing.

 Make the dag number available in Context classes
 

 Key: TEZ-2368
 URL: https://issues.apache.org/jira/browse/TEZ-2368
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: TEZ-2368.1.txt, TEZ-2368.2.txt, TEZ-2368.3.txt


 Provide the dag number, which is a unique number, for each dag running within 
 an application in the TezInputContext, TezOutputContext, TezProcessorContext.
 When containers are re-used, or for external services, this can be used to 
 generate intermediate data to a dag specific directory instead of an 
 application specific directory, where it becomes difficult to differentiate 
 between different dags.
 The DAG name does provide this - but is not suitable for use in a directory 
 name. Hashing the name is an option, but can lead to collisions.
 Generating data into a dag specific directory will eventually only be usable 
 when we move away from the default MR handler, or enhance it to support an 
 additional parameter.





[jira] [Updated] (TEZ-2368) Make a dag identifier available in Context classes

2015-04-27 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2368:

Summary: Make a dag identifier available in Context classes  (was: Make the 
dag number available in Context classes)

 Make a dag identifier available in Context classes
 --

 Key: TEZ-2368
 URL: https://issues.apache.org/jira/browse/TEZ-2368
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: TEZ-2368.1.txt, TEZ-2368.2.txt, TEZ-2368.3.txt


 Provide the dag number, which is a unique number, for each dag running within 
 an application in the TezInputContext, TezOutputContext, TezProcessorContext.
 When containers are re-used, or for external services, this can be used to 
 generate intermediate data to a dag specific directory instead of an 
 application specific directory, where it becomes difficult to differentiate 
 between different dags.
 The DAG name does provide this - but is not suitable for use in a directory 
 name. Hashing the name is an option, but can lead to collisions.
 Generating data into a dag specific directory will eventually only be usable 
 when we move away from the default MR handler, or enhance it to support an 
 additional parameter.





[jira] [Comment Edited] (TEZ-2226) Disable writing history to timeline if domain creation fails.

2015-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515125#comment-14515125
 ] 

Hitesh Shah edited comment on TEZ-2226 at 4/27/15 10:32 PM:


Thanks for the patch 11 [~lichangleo]. I started making some minor mods over 
patch 10 in addition to recovery support. Mostly cleanup ( some renames ) but 
also handling one case where history events are generated that are not related 
to a dag ( app launched etc ). 

Will upload an add-on patch for .10 shortly in addition to a combined patch. 




was (Author: hitesh):
Thanks for the patch 11 [~lichangleo]. I started making some minor mods over 
patch 10 in addition to recovery support. Mostly cleanup but also handling one 
case where history events are generated that are not related to a dag ( app 
launched etc ). 

Will upload an add-on patch for .10 shortly in addition to a combined patch. 



 Disable writing history to timeline if domain creation fails.
 -

 Key: TEZ-2226
 URL: https://issues.apache.org/jira/browse/TEZ-2226
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Chang Li
Priority: Blocker
 Attachments: TEZ-2226.10.patch, TEZ-2226.11.patch, TEZ-2226.2.patch, 
 TEZ-2226.3.patch, TEZ-2226.4.patch, TEZ-2226.5.patch, TEZ-2226.6.patch, 
 TEZ-2226.7.patch, TEZ-2226.8.patch, TEZ-2226.9.patch, 
 TEZ-2226.addon-for-patch10, TEZ-2226.addon-for-patch10-combined.full.patch, 
 TEZ-2226.patch, TEZ-2226.wip.2.patch, TEZ-2226.wip.patch








[jira] [Updated] (TEZ-2325) Route status update event directly to the attempt

2015-04-27 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2325:
-
Target Version/s: 0.7.0

 Route status update event directly to the attempt 
 --

 Key: TEZ-2325
 URL: https://issues.apache.org/jira/browse/TEZ-2325
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Prakash Ramachandran
 Attachments: TEZ-2325.1.patch, TEZ-2325.2.patch, TEZ-2325.3.patch, 
 TEZ-2325.4.patch


 Today, all events from the attempt heartbeat are routed to the vertex. The 
 vertex then routes any status update events to the attempt. This is 
 unnecessary and can create out-of-order scenarios. We could route 
 the status update events directly to attempts.





[jira] [Commented] (TEZ-2363) Counters: off by 1 error for REDUCE_INPUT_GROUPS counter

2015-04-27 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515996#comment-14515996
 ] 

Rajesh Balamohan commented on TEZ-2363:
---

lgtm. +1.

I believe the javac warning can be addressed by adding 
@SuppressWarnings("unchecked") near TestValuesIterator.createCountedIterator?

 Counters: off by 1 error for REDUCE_INPUT_GROUPS counter
 

 Key: TEZ-2363
 URL: https://issues.apache.org/jira/browse/TEZ-2363
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: TEZ-2363.1.patch


 The reduce input key groups counter is not incremented for the first key in 
 an operation; it first increments only at the second key, in 
 moveToNext() -> nextKey() -> inputKeyCounter.increment(1);
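
The off-by-one described above can be illustrated with a standalone sketch. This is not the Tez ValuesIterator code; `KeyGroupCounter` and both method names are hypothetical, and keys are assumed to be non-null and already sorted. The first method counts a group only when the key changes, so the first group is missed; the second also counts the first key.

```java
import java.util.List;

// Hypothetical illustration, not Tez code: counting groups of equal keys
// in sorted input.
final class KeyGroupCounter {
    private KeyGroupCounter() {}

    // Off-by-one variant: increments only when the key differs from the
    // previous one, so the very first group is never counted.
    static long countGroupsOffByOne(List<String> sortedKeys) {
        long groups = 0;
        String previous = null;
        for (String key : sortedKeys) {
            if (previous != null && !key.equals(previous)) {
                groups++;
            }
            previous = key;
        }
        return groups;
    }

    // Corrected variant: the first key also starts a group.
    static long countGroups(List<String> sortedKeys) {
        long groups = 0;
        String previous = null;
        for (String key : sortedKeys) {
            if (previous == null || !key.equals(previous)) {
                groups++;
            }
            previous = key;
        }
        return groups;
    }
}
```

For the keys a, a, b, c, c the first variant reports 2 groups and the second reports 3.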





[jira] [Updated] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

2015-04-27 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-391:
---
Target Version/s: 0.8.0

 SharedEdge - Support for passing same output from a vertex as input to two 
 different vertices
 -

 Key: TEZ-391
 URL: https://issues.apache.org/jira/browse/TEZ-391
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Rohini Palaniswamy
Assignee: Jeff Zhang
 Attachments: Shared Edge Design.pdf, TEZ-391-WIP-1.patch, 
 TEZ-391-WIP-2.patch, TEZ-391-WIP-3.patch, TEZ-391-WIP-4.patch, 
 TEZ-391-WIP-5.patch, TEZ-391-WIP-6.patch


   We need this for a lot of use cases, such as when multi-query is turned off 
 and for optimizing unions. Currently those are BROADCAST or ONE-ONE edges, and 
 we write the output multiple times.





[jira] [Updated] (TEZ-2226) Disable writing history to timeline if domain creation fails.

2015-04-27 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2226:
-
Attachment: TEZ-2226.addon-for-patch10-combined.full.patch
TEZ-2226.addon-for-patch10

[~lichangleo] Take a look. 

[~zjffdu] [~pramachandran] Mind reviewing.

 Disable writing history to timeline if domain creation fails.
 -

 Key: TEZ-2226
 URL: https://issues.apache.org/jira/browse/TEZ-2226
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Chang Li
Priority: Blocker
 Attachments: TEZ-2226.10.patch, TEZ-2226.11.patch, TEZ-2226.2.patch, 
 TEZ-2226.3.patch, TEZ-2226.4.patch, TEZ-2226.5.patch, TEZ-2226.6.patch, 
 TEZ-2226.7.patch, TEZ-2226.8.patch, TEZ-2226.9.patch, 
 TEZ-2226.addon-for-patch10, TEZ-2226.addon-for-patch10-combined.full.patch, 
 TEZ-2226.patch, TEZ-2226.wip.2.patch, TEZ-2226.wip.patch








Success: TEZ-2226 PreCommit Build #556

2015-04-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2226
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/556/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2782 lines...]
[INFO] Final Memory: 68M/852M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728605/TEZ-2226.12.patch
  against master revision 9e9cf99.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/556//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/556//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
584b970040c28bcc7375f80bb496af75e711f4af logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #555
Archived 44 artifacts
Archive block size is 32768
Received 23 blocks and 1995647 bytes
Compression is 27.4%
Took 1.5 sec
Description set: TEZ-2226
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2374) Fix build break against hadoop-2.2 due to TEZ-2325

2015-04-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516094#comment-14516094
 ] 

TezQA commented on TEZ-2374:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728610/TEZ-2374.1.patch
  against master revision 9e9cf99.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 160 javac 
compiler warnings (more than the master's current 159 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/557//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/557//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/557//console

This message is automatically generated.

 Fix build break against hadoop-2.2 due to TEZ-2325
 --

 Key: TEZ-2374
 URL: https://issues.apache.org/jira/browse/TEZ-2374
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: TEZ-2374.1.patch








Failed: TEZ-2374 PreCommit Build #557

2015-04-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2374
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/557/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2770 lines...]




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728610/TEZ-2374.1.patch
  against master revision 9e9cf99.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 160 javac 
compiler warnings (more than the master's current 159 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/557//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/557//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/557//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
0846e03c24003fad190fd562be96a766a151fb8b logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #556
Archived 45 artifacts
Archive block size is 32768
Received 26 blocks and 1902290 bytes
Compression is 30.9%
Took 0.84 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Updated] (TEZ-2226) Disable writing history to timeline if domain creation fails.

2015-04-27 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated TEZ-2226:
--
Attachment: TEZ-2226.11.patch

 Disable writing history to timeline if domain creation fails.
 -

 Key: TEZ-2226
 URL: https://issues.apache.org/jira/browse/TEZ-2226
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Chang Li
Priority: Blocker
 Attachments: TEZ-2226.10.patch, TEZ-2226.11.patch, TEZ-2226.2.patch, 
 TEZ-2226.3.patch, TEZ-2226.4.patch, TEZ-2226.5.patch, TEZ-2226.6.patch, 
 TEZ-2226.7.patch, TEZ-2226.8.patch, TEZ-2226.9.patch, TEZ-2226.patch, 
 TEZ-2226.wip.2.patch, TEZ-2226.wip.patch








[jira] [Commented] (TEZ-2303) ConcurrentModificationException while processing recovery

2015-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516216#comment-14516216
 ] 

Hitesh Shah commented on TEZ-2303:
--

In that case, +1 for patch 1. Please open a new jira for the long term fix. 

 ConcurrentModificationException while processing recovery
 -

 Key: TEZ-2303
 URL: https://issues.apache.org/jira/browse/TEZ-2303
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jason Lowe
Assignee: Jeff Zhang
 Attachments: TEZ-2303-1.patch, TEZ-2303-2.patch, TEZ-2303-4.patch


 Saw a Tez AM log a few ConcurrentModificationException messages while trying 
 to recover from a previous attempt that crashed.  Exception details to follow.





[jira] [Commented] (TEZ-2375) Don't return dag status to client when dag is still in recovering

2015-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516222#comment-14516222
 ] 

Hitesh Shah commented on TEZ-2375:
--

A different approach is to send a recovering status back to the client, and 
the client should be changed to cache the last seen valid progress. With this, 
the user will never see a regression in progress unless recovery fails.  
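
A minimal sketch of the client-side caching idea in this comment, under stated assumptions: `ProgressCache`, its `Status` enum, and `observe` are hypothetical names and are not part of the Tez client API. While the AM reports a recovering status, the client shows the last valid progress it saw instead of the value reported during recovery.

```java
// Hypothetical illustration, not the Tez client: remembers the last valid
// progress and reuses it while the DAG is reported as recovering.
final class ProgressCache {
    enum Status { RUNNING, RECOVERING }

    private float lastValidProgress = 0f;

    // Returns the progress value the client should display.
    float observe(Status status, float reportedProgress) {
        if (status == Status.RECOVERING) {
            // Ignore the value reported during recovery.
            return lastValidProgress;
        }
        lastValidProgress = reportedProgress;
        return reportedProgress;
    }
}
```

Once the status leaves RECOVERING, the reported value is shown again, so a regression is visible only if the recovered DAG really has less progress than before.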

 Don't return dag status to client when dag is still in recovering 
 --

 Key: TEZ-2375
 URL: https://issues.apache.org/jira/browse/TEZ-2375
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang

 Should only return dag status to client after the whole recovery process is 
 done (DAG/Vertex/Task/TaskAttempt are all recovered to their correct states)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TEZ-2375) Don't return dag status to client when dag is still in recovering

2015-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516222#comment-14516222
 ] 

Hitesh Shah edited comment on TEZ-2375 at 4/28/15 2:41 AM:
---

A different approach is to send a recovering status back to the client, and 
the client should be changed to cache the last seen valid progress. With this, 
the user will never see a regression in progress unless recovery fails or not 
all tasks are recovered from the previous attempt.  


was (Author: hitesh):
A different approach is to send a recovering status back to the client, and 
the client should be changed to cache the last seen valid progress. With this, 
the user will never see a regression in progress unless recovery fails.  

 Don't return dag status to client when dag is still in recovering 
 --

 Key: TEZ-2375
 URL: https://issues.apache.org/jira/browse/TEZ-2375
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang

 Should only return dag status to client after the whole recovery process is 
 done (DAG/Vertex/Task/TaskAttempt are all recovered to their correct states)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1577) Recover attempt information when recovering from task desired state

2015-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516262#comment-14516262
 ] 

Hitesh Shah commented on TEZ-1577:
--

cc [~zjffdu]

 Recover attempt information when recovering from task desired state
 ---

 Key: TEZ-1577
 URL: https://issues.apache.org/jira/browse/TEZ-1577
 Project: Apache Tez
  Issue Type: Bug
Reporter: Siddharth Seth
Priority: Critical

 TaskImpl has a TODO item for this: "// TODO recover attempts if desired 
 state is given?".
 InputInitializerEvent recovery will fail without this change, since the 
 successful attempt number is important.





[jira] [Updated] (TEZ-1522) Scheduling can result in out of order execution and slowdown of upstream work

2015-04-27 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1522:
-
Target Version/s:   (was: 0.6.0)

 Scheduling can result in out of order execution and slowdown of upstream work
 -

 Key: TEZ-1522
 URL: https://issues.apache.org/jira/browse/TEZ-1522
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Critical
  Labels: performance
 Attachments: TEZ-1522.1.wip.txt, TEZ-1522.2.wip.txt, 
 TEZ-1522.am.log.gz, task_runtime.svg


 M2                M7
   \              /
 (sg) \          /
       R3       / (b)
         \     /
      (b) \   /
           \ /
           M5
           |
           R6
 Please refer to the attachment (task runtime SVG). In this case, M5 was 
 scheduled much earlier than R3 (green in the diagram) and retained many 
 containers, so R3 had fewer containers to work with. 
 The output from the status monitor while the job ran is below; Map_5 took up 
 almost all of the cluster's resources, whereas Reducer_3 got only a fraction 
 of the capacity.
 Map_2: 1/1  Map_5: 0(+373)/1000  Map_7: 1/1  Reducer_3: 0/8000  Reducer_6: 0/1
 Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 0/8000  Reducer_6: 0/1
 Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 0(+1)/8000  Reducer_6: 0/1
 
 Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 14(+7)/8000  Reducer_6: 0/1
 Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 63(+14)/8000  Reducer_6: 0/1
 Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 159(+22)/8000  Reducer_6: 0/1
 Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 308(+29)/8000  Reducer_6: 0/1
 ...
 Creating this JIRA as a placeholder for scheduler enhancement. One 
 possibility could be to schedule fewer tasks in downstream vertices, based 
 on the information available for the upstream vertex.





[jira] [Comment Edited] (TEZ-1732) Temporary mitigation for out of order scheduling

2015-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516261#comment-14516261
 ] 

Hitesh Shah edited comment on TEZ-1732 at 4/28/15 3:16 AM:
---

[~bikassaha] [~sseth] Mind setting a target version as well as affects version


was (Author: hitesh):
[~bikassaha] [~sseth] Mind setting a target version

 Temporary mitigation for out of order scheduling
 

 Key: TEZ-1732
 URL: https://issues.apache.org/jira/browse/TEZ-1732
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
Priority: Critical





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2375) Don't return dag status to client when dag is still in recovering

2015-04-27 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516275#comment-14516275
 ] 

Jeff Zhang commented on TEZ-2375:
-

Agree. 

 Don't return dag status to client when dag is still in recovering 
 --

 Key: TEZ-2375
 URL: https://issues.apache.org/jira/browse/TEZ-2375
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang

 Should only return dag status to client after the whole recovery process is 
 done (DAG/Vertex/Task/TaskAttempt are all recovered to their correct states)





[jira] [Commented] (TEZ-2314) Tez task attempt failures due to bad event serialization

2015-04-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516151#comment-14516151
 ] 

Bikas Saha commented on TEZ-2314:
-

Thanks! Will wait for [~rohini] to confirm that this patch fixes the issue she 
reported. If not then I will open a separate jira for this and commit it.

 Tez task attempt failures due to bad event serialization
 

 Key: TEZ-2314
 URL: https://issues.apache.org/jira/browse/TEZ-2314
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Rohini Palaniswamy
Assignee: Bikas Saha
Priority: Blocker
 Attachments: TEZ-2314.1.patch, TEZ-2314.log.patch


 {code}
 2015-04-13 19:21:48,516 WARN [Socket Reader #3 for port 53530] ipc.Server: 
 Unable to read call parameters for client 10.216.13.112on connection protocol 
 org.apache.tez.common.TezTaskUmbilicalProtocol for rpcKind RPC_WRITABLE
 java.lang.ArrayIndexOutOfBoundsException: 1935896432
 at 
 org.apache.tez.runtime.api.impl.EventMetaData.readFields(EventMetaData.java:120)
 at 
 org.apache.tez.runtime.api.impl.TezEvent.readFields(TezEvent.java:271)
 at 
 org.apache.tez.runtime.api.impl.TezHeartbeatRequest.readFields(TezHeartbeatRequest.java:110)
 at 
 org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
 at 
 org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:160)
 at 
 org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1884)
 at 
 org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1816)
 at 
 org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1574)
 at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:806)
 at 
 org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:673)
 at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:644)
 {code}
 cc/ [~hitesh] and [~bikassaha]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1560) Invalid state machine transition in recovery

2015-04-27 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1560:
-
Target Version/s: 0.7.0

 Invalid state machine transition in recovery
 

 Key: TEZ-1560
 URL: https://issues.apache.org/jira/browse/TEZ-1560
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Critical
 Attachments: failed_tez_job.txt.gz


 {code}
 2014-09-04 16:08:25,504 INFO [main] org.apache.tez.dag.app.dag.impl.DAGImpl: 
 dag_1409818083015_0001_1 transitioned from NEW to RUNNING
 2014-09-04 16:08:25,504 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: Recovered Vertex State, 
 vertexId=vertex_1409818083015_0001_1_00 [v1], state=NEW, 
 numInitedSourceVertices=0, numStartedSourceVertices=0, 
 numRecoveredSourceVertices=0, recoveredEvents=0, tasksIsNull=false, numTasks=0
 2014-09-04 16:08:25,505 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: Root Inputs exist for Vertex: v1 
 : {Input={InputName=Input}, 
 {Descriptor=ClassName=org.apache.tez.test.dag.MultiAttemptDAG$NoOpInput, 
 hasPayload=false}, 
 {ControllerDescriptor=ClassName=org.apache.tez.test.dag.MultiAttemptDAG$TestRootInputInitializer,
  hasPayload=false}}
 2014-09-04 16:08:25,505 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: Starting root input initializer 
 for input: Input, with class: 
 [org.apache.tez.test.dag.MultiAttemptDAG$TestRootInputInitializer]
 2014-09-04 16:08:25,506 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: Setting user vertex manager 
 plugin: 
 org.apache.tez.test.dag.MultiAttemptDAG$FailOnAttemptVertexManagerPlugin on 
 vertex: v1
 2014-09-04 16:08:25,508 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: Creating 2 for vertex: 
 vertex_1409818083015_0001_1_00 [v1]
 2014-09-04 16:08:25,518 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: Starting root input initializers: 
 1
 2014-09-04 16:08:25,520 INFO [InputInitializer [v1] #0] 
 org.apache.tez.dag.app.dag.RootInputInitializerManager: Starting 
 InputInitializer for Input: Input on vertex vertex_1409818083015_0001_1_00 
 [v1]
 2014-09-04 16:08:25,522 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.RootInputInitializerManager: Succeeded 
 InputInitializer for Input: Input on vertex vertex_1409818083015_0001_1_00 
 [v1]
 2014-09-04 16:08:25,523 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: vertex_1409818083015_0001_1_00 
 [v1] transitioned from NEW to INITIALIZING due to event V_INIT
 2014-09-04 16:08:25,523 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: Recovered Vertex State, 
 vertexId=vertex_1409818083015_0001_1_01 [v2], state=NEW, 
 numInitedSourceVertices0, numStartedSourceVertices=0, 
 numRecoveredSourceVertices=1, tasksIsNull=false, numTasks=0
 2014-09-04 16:08:25,523 ERROR [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: Can't handle Invalid event 
 V_SOURCE_VERTEX_RECOVERED on vertex v2 with vertexId 
 vertex_1409818083015_0001_1_01 at current state NEW
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 V_SOURCE_VERTEX_RECOVERED at NEW
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1344)
   at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1)
   at 
 org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1641)
   at 
 org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:745)
 2014-09-04 16:08:25,524 FATAL [AsyncDispatcher event handler] 
 org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-711) Fix memory leak when not reading from inputs due to caching

2015-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516264#comment-14516264
 ] 

Hitesh Shah commented on TEZ-711:
-

[~rajesh.balamohan] [~sseth] Is this still valid? 

 Fix memory leak when not reading from inputs due to caching
 ---

 Key: TEZ-711
 URL: https://issues.apache.org/jira/browse/TEZ-711
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Rohini Palaniswamy
Assignee: Siddharth Seth
Priority: Critical
 Attachments: OOM-threaddump-711-5-patch.txt, 
 OOM-threaddump-till-TEZ-752.txt, TEZ-711.5.txt, TEZ-711.wip.1.txt, 
 TEZ-711.wip.2.txt, TEZ-711.wip.3.txt, TEZ-711.wip.4.txt


   When you read from inputs and cache objects with vertex scope, you 
 don't have to read the input again when the container is reused. But the input 
 still allocates memory, and that memory leaks, causing OOM. KeyValueReader does 
 not have an API to close the reader and release the allocated memory without 
 reading from it. Also, an option to pre-close inputs in the Processor, so that 
 the input is not fetched over the wire and no shuffle/sort is done, would be a 
 good optimization.
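
 A minimal sketch of the kind of API the description asks for, assuming a 
 hypothetical BufferedInputReader class (it is not part of Tez, and the names 
 are made up for illustration): close() releases the buffer whether or not any 
 record was read.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical reader, not a Tez class: it reserves a buffer at construction
// and lets the caller release it without consuming any records.
final class BufferedInputReader implements AutoCloseable {
    private byte[] buffer;                  // memory reserved up front
    private final Iterator<String> records;

    BufferedInputReader(int bufferBytes, List<String> source) {
        this.buffer = new byte[bufferBytes];
        this.records = source.iterator();
    }

    boolean hasNext() {
        ensureOpen();
        return records.hasNext();
    }

    String next() {
        ensureOpen();
        return records.next();
    }

    // Releases the buffer even if next() was never called; safe to call twice.
    @Override
    public void close() {
        buffer = null;
    }

    boolean isClosed() {
        return buffer == null;
    }

    private void ensureOpen() {
        if (buffer == null) {
            throw new IllegalStateException("reader is closed");
        }
    }
}
```

 A processor that already has its objects cached could then call close() on 
 the reader straight away instead of draining it.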





[jira] [Commented] (TEZ-2372) TestAMRecovery failing in latest build

2015-04-27 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516307#comment-14516307
 ] 

Jeff Zhang commented on TEZ-2372:
-

Very weird, no test info for this. 

https://builds.apache.org/job/Tez-Build/1018/testReport/org.apache.tez.test/



 TestAMRecovery failing in latest build 
 ---

 Key: TEZ-2372
 URL: https://issues.apache.org/jira/browse/TEZ-2372
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah

 https://builds.apache.org/job/Tez-Build/1018/console





[jira] [Commented] (TEZ-2303) ConcurrentModificationException while processing recovery

2015-04-27 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516200#comment-14516200
 ] 

Jeff Zhang commented on TEZ-2303:
-

[~hitesh] Yes, I think it makes sense as a short-term fix; at least it fixes the 
ConcurrentModificationException. 

Regarding the issue of not providing info to clients until the recovery phase 
is over, I think there are 2 main scenarios:

* The ClientHandler RPC is started but the recovery log has not been read yet. 
In this case, the AM throws a "No running dag at present" exception, with no 
effect on the client side, so I think it is OK.
{code}
2015-04-28 09:32:02,054 INFO [IPC Server handler 0 on 6000] ipc.Server: IPC 
Server handler 0 on 6000, call 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
from 127.0.0.1:63539 Call#9557 Retry#0
org.apache.tez.dag.api.TezException: No running dag at present
at 
org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:89)
at 
org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:156)
at 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:95)
at 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7465)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
{code}

* The second scenario is that even after the recovery log is read, the 
RecoveryTransition may not have completed. The client side may then still get 
the wrong dag status. As I mentioned, fixing this may need some big changes to 
the recovery. We can leave it for the future and take it into account when 
refactoring the recovery code. 


 ConcurrentModificationException while processing recovery
 -

 Key: TEZ-2303
 URL: https://issues.apache.org/jira/browse/TEZ-2303
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jason Lowe
Assignee: Jeff Zhang
 Attachments: TEZ-2303-1.patch, TEZ-2303-2.patch, TEZ-2303-4.patch


 Saw a Tez AM log a few ConcurrentModificationException messages while trying 
 to recover from a previous attempt that crashed.  Exception details to follow.
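
 For readers unfamiliar with the exception, here is a generic illustration. It 
 is not the Tez code path from this issue (the exception details are not 
 included in this message), and the class and method names are made up. 
 Structurally modifying an ArrayList while iterating over it triggers 
 ConcurrentModificationException; iterating over a snapshot copy avoids it.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

final class CmeDemo {
    // Adds to the list while iterating over it: the fail-fast iterator throws.
    static boolean modifyWhileIterating(List<String> events) {
        try {
            for (String e : events) {
                events.add(e + "-copy");
            }
            return false; // only reached when the list was empty
        } catch (ConcurrentModificationException ex) {
            return true;
        }
    }

    // Iterates over a snapshot, so adding to the original list is safe.
    static int modifyViaSnapshot(List<String> events) {
        for (String e : new ArrayList<>(events)) {
            events.add(e + "-copy");
        }
        return events.size();
    }
}
```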





[jira] [Commented] (TEZ-2305) MR compatibility sleep job fails with IOException: Undefined job output-path

2015-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516258#comment-14516258
 ] 

Hitesh Shah commented on TEZ-2305:
--

Sorry for the delay in getting back [~zjffdu]. If we are going with patch .2, 
would you mind adding your unit test to the patch? Would be good to have some 
coverage. 

 MR compatibility sleep job fails with IOException: Undefined job output-path
 

 Key: TEZ-2305
 URL: https://issues.apache.org/jira/browse/TEZ-2305
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Tassapol Athiapinya
Priority: Critical
 Attachments: TEZ-2305-3.patch, TEZ-2305-4.patch, TEZ-2305.1.patch, 
 TEZ-2305.2.patch


 Running the MR sleep job fails with an IOException.
 {code}
 15/04/09 20:52:25 INFO mapreduce.Job: Job job_1428612196442_0002 failed with 
 state FAILED due to: Vertex failed, vertexName=initialmap, 
 vertexId=vertex_1428612196442_0002_1_00, diagnostics=[Task failed, 
 taskId=task_1428612196442_0002_1_00_01, diagnostics=[TaskAttempt 0 
 failed, info=[Error: Failure while running task:java.io.IOException: 
 Undefined job output-path
   at 
 org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
   at 
 org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
   at 
 org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 ], TaskAttempt 1 failed, info=[Error: Failure while running 
 task:java.io.IOException: Undefined job output-path
   at 
 org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
   at 
 org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
   at 
 org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 ], TaskAttempt 2 failed, info=[Error: Failure while running 
 task:java.io.IOException: Undefined job output-path
   at 
 org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
   at 
 org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
   at 
 org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 ], TaskAttempt 3 failed, info=[Error: Failure while running 
 task:java.io.IOException: Undefined job output-path
   at 
 

[jira] [Commented] (TEZ-1675) Remove deprecated keys added in TEZ-1674

2015-04-27 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516290#comment-14516290
 ] 

Siddharth Seth commented on TEZ-1675:
-

Should we just remove these in 0.7? They have been in there since 0.5.

 Remove deprecated keys added in TEZ-1674
 

 Key: TEZ-1675
 URL: https://issues.apache.org/jira/browse/TEZ-1675
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Priority: Blocker







[jira] [Commented] (TEZ-711) Fix memory leak when not reading from inputs due to caching

2015-04-27 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516291#comment-14516291
 ] 

Siddharth Seth commented on TEZ-711:


Yes it is.

 Fix memory leak when not reading from inputs due to caching
 ---

 Key: TEZ-711
 URL: https://issues.apache.org/jira/browse/TEZ-711
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Rohini Palaniswamy
Assignee: Siddharth Seth
Priority: Critical
 Attachments: OOM-threaddump-711-5-patch.txt, 
 OOM-threaddump-till-TEZ-752.txt, TEZ-711.5.txt, TEZ-711.wip.1.txt, 
 TEZ-711.wip.2.txt, TEZ-711.wip.3.txt, TEZ-711.wip.4.txt


   When you read from inputs and cache objects with vertex scope, you 
 don't have to read the input again when the container is reused. But the input 
 still allocates memory, and that memory leaks, causing OOM. KeyValueReader does 
 not have an API to close the reader and release the allocated memory without 
 reading from it. Also, an option to pre-close inputs in the Processor, so that 
 the input is not fetched over the wire and no shuffle/sort is done, would be a 
 good optimization.





[jira] [Comment Edited] (TEZ-1675) Remove deprecated keys added in TEZ-1674

2015-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516317#comment-14516317
 ] 

Hitesh Shah edited comment on TEZ-1675 at 4/28/15 4:22 AM:
---

Given that we have had just one 0.6.0 release since then, it might be worth 
keeping them around for one more release. 


was (Author: hitesh):
Given that we have had just one 0.6.0 release, it might be worth keeping them 
around for one more release. 

 Remove deprecated keys added in TEZ-1674
 

 Key: TEZ-1675
 URL: https://issues.apache.org/jira/browse/TEZ-1675
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Priority: Blocker







[jira] [Commented] (TEZ-1732) Temporary mitigation for out of order scheduling

2015-04-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516324#comment-14516324
 ] 

Bikas Saha commented on TEZ-1732:
-

A temporary mitigation is not relevant anymore.

 Temporary mitigation for out of order scheduling
 

 Key: TEZ-1732
 URL: https://issues.apache.org/jira/browse/TEZ-1732
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
Priority: Critical







[jira] [Resolved] (TEZ-1732) Temporary mitigation for out of order scheduling

2015-04-27 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha resolved TEZ-1732.
-
Resolution: Won't Fix

 Temporary mitigation for out of order scheduling
 

 Key: TEZ-1732
 URL: https://issues.apache.org/jira/browse/TEZ-1732
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
Priority: Critical







[jira] [Comment Edited] (TEZ-2372) TestAMRecovery failing in latest build

2015-04-27 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516376#comment-14516376
 ] 

Jeff Zhang edited comment on TEZ-2372 at 4/28/15 5:08 AM:
--

[~hitesh], Yes, this is the only info I can find; there is not even a client-side 
log. It seems TestAMRecovery was killed before it started.

https://builds.apache.org/job/Tez-Build/1018/testReport/org.apache.tez.test/


was (Author: zjffdu):
[~hitesh], Yes, this is the only info I can find. It seems TestAMRecovery was 
killed before it started.

https://builds.apache.org/job/Tez-Build/1018/testReport/org.apache.tez.test/

 TestAMRecovery failing in latest build 
 ---

 Key: TEZ-2372
 URL: https://issues.apache.org/jira/browse/TEZ-2372
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah

 https://builds.apache.org/job/Tez-Build/1018/console





[jira] [Updated] (TEZ-604) Revert temporary changes made in TEZ-603

2015-04-27 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-604:

Target Version/s: 0.8.0  (was: 0.7.0)

 Revert temporary changes made in TEZ-603
 

 Key: TEZ-604
 URL: https://issues.apache.org/jira/browse/TEZ-604
 Project: Apache Tez
  Issue Type: Task
Reporter: Siddharth Seth
Priority: Blocker







[jira] [Updated] (TEZ-1675) Remove deprecated keys added in TEZ-1674

2015-04-27 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1675:
-
Target Version/s: 0.8.0  (was: 0.7.0)

 Remove deprecated keys added in TEZ-1674
 

 Key: TEZ-1675
 URL: https://issues.apache.org/jira/browse/TEZ-1675
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Priority: Blocker







[jira] [Updated] (TEZ-316) [Umbrella] Address findbugs warnings in tez codebase

2015-04-27 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-316:

Target Version/s: 0.8.0  (was: 0.7.0)

 [Umbrella] Address findbugs warnings in tez codebase
 

 Key: TEZ-316
 URL: https://issues.apache.org/jira/browse/TEZ-316
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Priority: Blocker

 findbugs output attached to TEZ-272.





[jira] [Updated] (TEZ-2164) Shade the guava version used by Tez

2015-04-27 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2164:
-
Target Version/s: 0.8.0  (was: 0.7.0)

 Shade the guava version used by Tez
 ---

 Key: TEZ-2164
 URL: https://issues.apache.org/jira/browse/TEZ-2164
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Priority: Critical
 Attachments: allow-guava-16.0.1.patch


 Should allow us to upgrade to a newer version without shipping a guava 
 dependency.
 Would be good to do this in 0.7 so that we stop shipping guava as early as 
 possible.





[jira] [Commented] (TEZ-2303) ConcurrentModificationException while processing recovery

2015-04-27 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516259#comment-14516259
 ] 

Jeff Zhang commented on TEZ-2303:
-

Thanks, [~hitesh]. Committed to 0.5, 0.6 and master. Created TEZ-2375 for the 
long-term fix.

 ConcurrentModificationException while processing recovery
 -

 Key: TEZ-2303
 URL: https://issues.apache.org/jira/browse/TEZ-2303
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jason Lowe
Assignee: Jeff Zhang
 Attachments: TEZ-2303-1.patch, TEZ-2303-2.patch, TEZ-2303-4.patch


 Saw a Tez AM log a few ConcurrentModificationException messages while trying 
 to recover from a previous attempt that crashed.  Exception details to follow.





[jira] [Updated] (TEZ-2164) Shade the guava version used by Tez

2015-04-27 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2164:
-
Target Version/s: 0.7.0  (was: 0.8.0)

 Shade the guava version used by Tez
 ---

 Key: TEZ-2164
 URL: https://issues.apache.org/jira/browse/TEZ-2164
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Priority: Critical
 Attachments: allow-guava-16.0.1.patch


 Should allow us to upgrade to a newer version without shipping a guava 
 dependency.
 Would be good to do this in 0.7 so that we stop shipping guava as early as 
 possible.





[jira] [Updated] (TEZ-1908) Analyse and fix javac warnings in tez codebase

2015-04-27 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1908:
-
Target Version/s: 0.8.0  (was: 0.7.0)

 Analyse and fix javac warnings in tez codebase 
 ---

 Key: TEZ-1908
 URL: https://issues.apache.org/jira/browse/TEZ-1908
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Priority: Critical

 https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/patchJavacWarnings.txt





[jira] [Updated] (TEZ-2266) Synchronization in VertexImpl etc. broken

2015-04-27 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2266:
-
Target Version/s: 0.7.0

 Synchronization in VertexImpl etc. broken
 -

 Key: TEZ-2266
 URL: https://issues.apache.org/jira/browse/TEZ-2266
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.3
Reporter: Bikas Saha
Priority: Critical

 There is mixed usage of synchronized blocks and a read-write lock which are 
 not mutually exclusive.
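
 A minimal sketch of why the mix is unsafe, using a made-up MixedLocking class 
 rather than VertexImpl itself: a synchronized block takes the object's 
 monitor, while a ReentrantReadWriteLock is a separate lock, so one thread can 
 hold the monitor while another thread holds the write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class MixedLocking {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

    // Returns true if another thread can take the write lock while the
    // calling thread holds this object's monitor.
    boolean writeLockAvailableInsideSynchronized() {
        final boolean[] acquired = {false};
        synchronized (this) {
            Thread other = new Thread(() -> {
                // tryLock() knows nothing about the monitor held by the caller.
                if (rwLock.writeLock().tryLock()) {
                    acquired[0] = true;
                    rwLock.writeLock().unlock();
                }
            });
            other.start();
            try {
                other.join(); // join makes the write to acquired[0] visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return acquired[0];
    }
}
```

 The method returns true, which means code guarded only by synchronized and 
 code guarded only by the write lock can run at the same time on the same 
 state.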





[jira] [Commented] (TEZ-1732) Temporary mitigation for out of order scheduling

2015-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516261#comment-14516261
 ] 

Hitesh Shah commented on TEZ-1732:
--

[~bikassaha] [~sseth] Mind setting a target version?

 Temporary mitigation for out of order scheduling
 

 Key: TEZ-1732
 URL: https://issues.apache.org/jira/browse/TEZ-1732
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
Priority: Critical







[jira] [Comment Edited] (TEZ-2303) ConcurrentModificationException while processing recovery

2015-04-27 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516200#comment-14516200
 ] 

Jeff Zhang edited comment on TEZ-2303 at 4/28/15 2:22 AM:
--

[~hitesh] Yes, I think it makes sense as a short-term fix; at least it fixes the 
ConcurrentModificationException, so the recovery process can keep going. 

Regarding the issue of not providing info to clients until the recovery phase 
is over, I think there are 2 main scenarios:

* The ClientHandler RPC is started but the recovery log has not been read yet. 
In this case, the AM throws a "No running dag at present" exception, with no 
effect on the client side, so I think it is OK.
{code}
2015-04-28 09:32:02,054 INFO [IPC Server handler 0 on 6000] ipc.Server: IPC 
Server handler 0 on 6000, call 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
from 127.0.0.1:63539 Call#9557 Retry#0
org.apache.tez.dag.api.TezException: No running dag at present
at 
org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:89)
at 
org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:156)
at 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:95)
at 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7465)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
{code}

* The second scenario is that even after the recovery log is read, the 
RecoveryTransition may not have completed. The client side may then still get 
the wrong dag status. As I mentioned, fixing this may need some big changes to 
the recovery. We can leave it for the future and take it into account when 
refactoring the recovery code. 



was (Author: zjffdu):
[~hitesh] Yes, I think it makes sense as a short-term fix; at least it fixes the 
ConcurrentModificationException. 

Regarding the issue of not providing info to clients until the recovery phase 
is over, I think there are 2 main scenarios:

* The ClientHandler RPC is started but the recovery log has not been read yet. 
In this case, the AM throws a "No running dag at present" exception, with no 
effect on the client side, so I think it is OK.
{code}
2015-04-28 09:32:02,054 INFO [IPC Server handler 0 on 6000] ipc.Server: IPC 
Server handler 0 on 6000, call 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
from 127.0.0.1:63539 Call#9557 Retry#0
org.apache.tez.dag.api.TezException: No running dag at present
at 
org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:89)
at 
org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:156)
at 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:95)
at 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7465)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
{code}

* The second scenario is that even after the recovery log is read, the 
RecoveryTransition may not have completed. The client side may then still get 
the wrong dag status. As I mentioned, fixing this may need some big changes to 
the recovery. We can leave it for the future and take it into account when 
refactoring the recovery code. 


 ConcurrentModificationException while processing recovery
 -

 Key: TEZ-2303
 URL: https://issues.apache.org/jira/browse/TEZ-2303
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jason Lowe
Assignee: Jeff Zhang
 Attachments: TEZ-2303-1.patch, TEZ-2303-2.patch, TEZ-2303-4.patch


 Saw a Tez AM log a 

[jira] [Created] (TEZ-2375) Don't return dag status to client when dag is still in recovering

2015-04-27 Thread Jeff Zhang (JIRA)
Jeff Zhang created TEZ-2375:
---

 Summary: Don't return dag status to client when dag is still in 
recovering 
 Key: TEZ-2375
 URL: https://issues.apache.org/jira/browse/TEZ-2375
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang


The dag status should only be returned to the client after the whole recovery 
process is done (DAG/Vertex/Task/TaskAttempt are all recovered to their correct 
states).
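
One way to picture the requested behaviour, as a sketch only: a hypothetical 
RecoveryGate (not an existing Tez class) that refuses status requests until 
recovery has been marked complete. How the AM would actually signal completion 
is left open in this issue, so markRecovered() is an assumption.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: the status is served only after recovery has finished.
final class RecoveryGate {
    private final AtomicBoolean recovered = new AtomicBoolean(false);
    private volatile String dagStatus = "UNKNOWN";

    // Assumed to be called once the DAG, vertices, tasks and task attempts
    // are all back in their correct states.
    void markRecovered(String status) {
        dagStatus = status;
        recovered.set(true);
    }

    // Throws instead of returning a status that may still be wrong.
    String getDagStatus() {
        if (!recovered.get()) {
            throw new IllegalStateException("DAG is still recovering");
        }
        return dagStatus;
    }
}
```

A client-facing handler could translate the exception into a "retry later" 
response rather than reporting a possibly wrong status.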




