[jira] [Commented] (TEZ-3479) DAG AM does not schedule any more containers in corner cases
[ https://issues.apache.org/jira/browse/TEZ-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587501#comment-15587501 ]

Rajesh Balamohan commented on TEZ-3479:
---------------------------------------

That is correct. Haven't observed this in other cases.

> DAG AM does not schedule any more containers in corner cases
> ------------------------------------------------------------
>
>                 Key: TEZ-3479
>                 URL: https://issues.apache.org/jira/browse/TEZ-3479
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.7.1
>            Reporter: Rajesh Balamohan
>         Attachments: application_1476667862449_0031_not_complete.1.log.tar.gz
>
> Env: 3-node AWS cluster with data residing in S3. Tez version is 0.7.
> Some workloads generate so much data that tasks start throwing "No space
> available" on local disks (e.g. Q29 in TPC-DS). The DAG should fail after
> enough retries, which happens most of the time. Once in a while (~ once in
> 20-30 runs), the DAG AM gets into a hung state and does not schedule any
> more containers for the failed task attempts. Will attach the logs shortly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TEZ-3479) DAG AM does not schedule any more containers in corner cases
[ https://issues.apache.org/jira/browse/TEZ-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587378#comment-15587378 ]

Hitesh Shah commented on TEZ-3479:
----------------------------------

bq. I haven't disabled recovery in my runs.

To clarify, my question was whether this reproduces only in the cases where the AM crashes and restarts?
[jira] [Commented] (TEZ-3465) Support broadcast edge into cartesian product vertex and forbid other edges
[ https://issues.apache.org/jira/browse/TEZ-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587305#comment-15587305 ]

TezQA commented on TEZ-3465:
----------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12834068/TEZ-3465.3.patch
against master revision 67243a0.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests.

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2045//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2045//console

This message is automatically generated.

> Support broadcast edge into cartesian product vertex and forbid other edges
> ---------------------------------------------------------------------------
>
>                 Key: TEZ-3465
>                 URL: https://issues.apache.org/jira/browse/TEZ-3465
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Zhiyuan Yang
>            Assignee: Zhiyuan Yang
>         Attachments: TEZ-3465.1.patch, TEZ-3465.2.patch, TEZ-3465.3.patch
>
> The cartesian product vertex manager should support other incoming edge
> types. Currently only the broadcast edge is necessary, although potentially
> more edge types could be supported as well. A custom edge needs its own
> vertex manager, which can't work with the cartesian product VM, so it has to
> be forbidden.
Success: TEZ-3465 PreCommit Build #2045
Jira: https://issues.apache.org/jira/browse/TEZ-3465
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2045/

###################################################################
########### LAST 60 LINES OF THE CONSOLE ###########################
[...truncated 4819 lines...]
[INFO] Tez ................................................ SUCCESS [  0.026 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 57:00 min
[INFO] Finished at: 2016-10-19T01:26:37+00:00
[INFO] Final Memory: 80M/1495M
[INFO] ------------------------------------------------------------------------

==
Adding comment to Jira.
==
Comment added.
2a03768c8bb4cc7e131e0a076013b2152eef909c logged out
==
Finished build.
==

Archiving artifacts
[description-setter] Description set: TEZ-3465
Recording test results
Email was triggered for: Success
Sending email for trigger: Success

###################################################################
############### FAILED TESTS (if any) ##############################
All tests passed
[jira] [Commented] (TEZ-3405) Support ability for AM to kill itself if there is no client heartbeating to it
[ https://issues.apache.org/jira/browse/TEZ-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587218#comment-15587218 ]

Siddharth Seth commented on TEZ-3405:
-------------------------------------

+1.

> Support ability for AM to kill itself if there is no client heartbeating to it
> ------------------------------------------------------------------------------
>
>                 Key: TEZ-3405
>                 URL: https://issues.apache.org/jira/browse/TEZ-3405
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>            Assignee: Hitesh Shah
>            Priority: Critical
>         Attachments: TEZ-3405.1.patch, TEZ-3405.2.patch, TEZ-3405.3.patch, TEZ-3405.4.patch, TEZ-3405.5.patch
>
> HiveServer2 optionally maintains a pool of AMs in either Tez or LLAP mode.
> This is done to amortize the cost of launching a Tez session.
> We also try, in a shutdown hook, to kill all of these AMs when HS2 goes
> down. However, there are cases where HS2 doesn't get the chance to kill
> these AMs before it goes away. As a result, these zombie AMs hang around
> until the timeout kicks in.
> The trouble with the timeout is that we have to set it fairly high;
> otherwise the benefit of having pre-launched AMs obviously goes away (in a
> lightly loaded cluster).
> So, if people kill/restart HS2, they often run into situations where the
> cluster/queue doesn't have any more capacity for AMs. They either have to
> manually kill the zombies or wait.
> The request is therefore for Tez to maintain a heartbeat to the client. If
> the client goes away, the AM should exit. That way we can keep the AMs
> alive for a long time regardless of activity, and at the same time don't
> have to worry about them if HS2 goes down.
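The heartbeat mechanism requested in TEZ-3405 can be sketched in a few lines. This is an illustrative Java sketch, not the actual patch: the names `ClientLivenessMonitor`, `heartbeat`, and `shouldShutdown` are assumptions. The AM records the timestamp of the last client heartbeat, and a periodic check decides to exit once the configured timeout has elapsed with no heartbeat.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the idea: the AM tracks the last client heartbeat
// and shuts itself down once no heartbeat arrives within the timeout.
// Time is passed in explicitly so the logic is easy to test.
class ClientLivenessMonitor {
    private final long timeoutMillis;
    private final AtomicLong lastHeartbeatMillis;

    ClientLivenessMonitor(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = new AtomicLong(nowMillis);
    }

    /** Called whenever a client heartbeat RPC arrives. */
    void heartbeat(long nowMillis) {
        lastHeartbeatMillis.set(nowMillis);
    }

    /** True once no heartbeat has been seen for longer than the timeout. */
    boolean shouldShutdown(long nowMillis) {
        return nowMillis - lastHeartbeatMillis.get() > timeoutMillis;
    }
}
```

In a real AM, a scheduled task would poll `shouldShutdown` with the current wall-clock time and trigger a clean session shutdown when it returns true.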
[jira] [Commented] (TEZ-3477) MRInputHelpers generateInputSplitsToMem public API modified
[ https://issues.apache.org/jira/browse/TEZ-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587215#comment-15587215 ]

Hitesh Shah commented on TEZ-3477:
----------------------------------

[~jeagles] The change seems straightforward. Do we want to change the APIs to limited private (hive/pig) so that future commits to them are reviewed a bit more carefully for compatibility?

> MRInputHelpers generateInputSplitsToMem public API modified
> -----------------------------------------------------------
>
>                 Key: TEZ-3477
>                 URL: https://issues.apache.org/jira/browse/TEZ-3477
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: TEZ-3477.1.patch
>
> Pig and Hive directly rely on specific APIs in MRInputHelpers. I would like
> to ensure these signatures are prevented from being modified:
> - MRInputHelpers.generateInputSplitsToMem
> - MRInputHelpers.parseMRInputPayload
> - MRInputHelpers.createSplitProto
> - MRInputHelpers.createOldFormatSplitFromUserPayload
> - MRInputHelpers.configureMRInputWithLegacySplitGeneration
> A recently fixed jira, TEZ-3430, modified generateInputSplitsToMem:
> {code}
> java.lang.NoSuchMethodError: org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(Lorg/apache/hadoop/conf/Configuration;ZI)Lorg/apache/tez/mapreduce/hadoop/InputSplitInfoMem;
> {code}
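The "limited private" idea suggested in this comment can be illustrated as follows. Hadoop's real annotation is `org.apache.hadoop.classification.InterfaceAudience.LimitedPrivate`; the sketch below defines a local stand-in so the example is self-contained, and the annotated method is only a placeholder for the `MRInputHelpers` signatures listed in the description, not the real API.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Illustrative stand-in for Hadoop's InterfaceAudience.LimitedPrivate.
// Tagging a method with the downstream projects that depend on it signals
// to reviewers that the signature must not change incompatibly.
class ApiCompat {
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @interface LimitedPrivate {
        String[] value();
    }

    // Placeholder method standing in for the MRInputHelpers APIs above;
    // the string merely describes the signature that must stay stable.
    @LimitedPrivate({"Hive", "Pig"})
    static String generateInputSplitsToMemSignature() {
        return "(Configuration, boolean, int) -> InputSplitInfoMem";
    }
}
```

With the real Hadoop annotation, the same marker on the five methods listed above would make any signature change stand out in review as a downstream-compatibility question.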
[jira] [Updated] (TEZ-3458) Auto grouping for cartesian product edge(unpartitioned case)
[ https://issues.apache.org/jira/browse/TEZ-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang updated TEZ-3458:
------------------------------
    Attachment: TEZ-3458.3.patch

> Auto grouping for cartesian product edge (unpartitioned case)
> -------------------------------------------------------------
>
>                 Key: TEZ-3458
>                 URL: https://issues.apache.org/jira/browse/TEZ-3458
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Zhiyuan Yang
>            Assignee: Zhiyuan Yang
>         Attachments: TEZ-3458.1.patch, TEZ-3458.2.patch, TEZ-3458.3.patch
>
> The original CartesianProductVertexManagerUnpartitioned sets parallelism to
> the product of all source vertices' parallelism, which may explode to an
> insane number. We should auto-reduce, as in ShuffleVertexManager, to avoid
> this.
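The explosion described in this ticket is simple arithmetic: the naive parallelism of a cartesian product vertex is the product of its source parallelisms, so two 1000-task sources already yield a million tasks. The grouping rule below (divide each source's tasks into groups of `groupSize`) is an illustrative sketch of the auto-reduce idea, not the actual CartesianProductVertexManager algorithm.

```java
// Sketch of why grouping is needed: grouping each source's tasks into
// chunks shrinks the product of parallelisms multiplicatively.
class CartesianGrouping {
    /** Naive parallelism: product of all source vertex parallelisms. */
    static long naiveParallelism(int[] sourceParallelism) {
        long product = 1;
        for (int p : sourceParallelism) {
            product *= p;
        }
        return product;
    }

    /** Parallelism after grouping each source into chunks of groupSize tasks. */
    static long groupedParallelism(int[] sourceParallelism, int groupSize) {
        long product = 1;
        for (int p : sourceParallelism) {
            product *= (p + groupSize - 1) / groupSize; // ceil(p / groupSize)
        }
        return product;
    }
}
```

For example, grouping two 1000-task sources into chunks of 100 reduces the product from 1,000,000 to 100, which is the kind of auto-reduction ShuffleVertexManager performs for shuffle edges.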
[jira] [Updated] (TEZ-3465) Support broadcast edge into cartesian product vertex and forbid other edges
[ https://issues.apache.org/jira/browse/TEZ-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang updated TEZ-3465:
------------------------------
    Attachment: TEZ-3465.3.patch
[jira] [Comment Edited] (TEZ-3479) DAG AM does not schedule any more containers in corner cases
[ https://issues.apache.org/jira/browse/TEZ-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587023#comment-15587023 ]

Hitesh Shah edited comment on TEZ-3479 at 10/18/16 11:31 PM:
-------------------------------------------------------------

At least for this scenario, I think we did not recover task_1476667862449_0031_1_07_04 properly to a failed state, which ends up leading to a hang as the vertex cannot complete.

{code}
2016-10-18 07:06:24,837 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|: Task Completion: vertex_1476667862449_0031_1_07 [Map 3], tasks=29, failed=1, killed=24, success=3, completed=28, commits=0, err=OWN_TASK_FAILURE
{code}

The task failure tracked is for task_1476667862449_0031_1_07_00 and not for 0004.

was (Author: hitesh):
At least for this scenario, I think we did not recover task_1476667862449_0031_1_07_04 properly to a failed state, which ends up leading to a hang as the vertex cannot complete.

{code}
2016-10-18 07:06:24,837 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|: Task Completion: vertex_1476667862449_0031_1_07 [Map 3], tasks=29, failed=1, killed=24, success=3, completed=28, commits=0, err=OWN_TASK_FAILURE
{code}
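The hang condition visible in the quoted log line can be captured by a small invariant check. The field names below are taken from the log line (tasks, failed, killed, success, completed), not from VertexImpl's actual code: a vertex can only finish once every task has reached a terminal state, and here `completed` is stuck at 28 of 29 because one attempt was never recovered to FAILED.

```java
// Sketch of the vertex-completion invariant implied by the log line:
// completed = failed + killed + success, and the vertex can finish
// only when completed equals the total task count.
class VertexCompletion {
    static int completed(int failed, int killed, int success) {
        return failed + killed + success;
    }

    static boolean canComplete(int tasks, int failed, int killed, int success) {
        return completed(failed, killed, success) == tasks;
    }
}
```

Plugging in the logged counts (tasks=29, failed=1, killed=24, success=3) gives completed=28, so the vertex waits forever on the one task whose failure was never recorded.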
[jira] [Commented] (TEZ-3479) DAG AM does not schedule any more containers in corner cases
[ https://issues.apache.org/jira/browse/TEZ-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587023#comment-15587023 ]

Hitesh Shah commented on TEZ-3479:
----------------------------------

At least for this scenario, I think we did not recover task_1476667862449_0031_1_07_04 properly to a failed state, which ends up leading to a hang as the vertex cannot complete.

{code}
2016-10-18 07:06:24,837 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|: Task Completion: vertex_1476667862449_0031_1_07 [Map 3], tasks=29, failed=1, killed=24, success=3, completed=28, commits=0, err=OWN_TASK_FAILURE
{code}
[jira] [Commented] (TEZ-3479) DAG AM does not schedule any more containers in corner cases
[ https://issues.apache.org/jira/browse/TEZ-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586998#comment-15586998 ]

Rajesh Balamohan commented on TEZ-3479:
---------------------------------------

[~hitesh] - I haven't disabled recovery in my runs. Will check that.
[jira] [Commented] (TEZ-3478) Cleanup fetcher data for failing task attempts (Unordered fetcher)
[ https://issues.apache.org/jira/browse/TEZ-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586993#comment-15586993 ]

Rajesh Balamohan commented on TEZ-3478:
---------------------------------------

Haven't checked the ordered case yet, but it should be present there as well. Created this ticket to handle cleanup of unordered data; will create a subsequent jira for the ordered case.

> Cleanup fetcher data for failing task attempts (Unordered fetcher)
> ------------------------------------------------------------------
>
>                 Key: TEZ-3478
>                 URL: https://issues.apache.org/jira/browse/TEZ-3478
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>
> Env: 3-node AWS cluster with the entire dataset in S3. Since the data is in
> S3, there is no additional storage for HDFS (it uses the existing space
> available in the VMs). Tez version is 0.7.
> With some workloads (e.g. Q29 in TPC-DS), unordered fetchers download data
> in parallel for different vertices and run out of disk space. However, the
> downloaded data for these failed task attempts is not cleared, so
> subsequent task attempts encounter a similar situation and fail with a "No
> space" exception.
> e.g. stack trace
> {noformat}
> , errorMessage=Fetch failed:org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:261)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:426)
>         at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:206)
>         at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:124)
>         at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:110)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>         at org.apache.tez.runtime.library.common.shuffle.ShuffleUtils.shuffleToDisk(ShuffleUtils.java:146)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:771)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:497)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:396)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:195)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:70)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method)
>         at java.io.FileOutputStream.write(FileOutputStream.java:345)
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSy
> {noformat}
> This would also affect any other job running in the cluster at the same
> time. It would be helpful to clean up the data downloaded for the failed
> task attempts.
> Creating this ticket mainly for the unordered fetcher case, though it could
> be a similar case for the ordered shuffle case as well.
> e.g. files
> {noformat}
> 17M /hadoopfs/fs1/yarn/nodemanager/usercache/cloudbreak/appcache/application_1476667862449_0043/attempt_1476667862449_0043_1_07_28_0_10023_src_62_spill_-1.out
> 18M /hadoopfs/fs1/yarn/nodemanager/usercache/cloudbreak/appcache/application_1476667862449_0043/attempt_1476667862449_0043_1_07_28_0_10023_src_63_spill_-1.out
> 16M /hadoopfs/fs1/yarn/nodemanager/usercache/cloudbreak/appcache/application_1476667862449_0043/attempt_1476667862449_0043_1_07_28_0_10023_src_64_spill_-1.out
> ..
> ..
> 18M /hadoopfs/fs1/yarn/nodemanager/usercache/cloudbreak/appcache/application_1476667862449_0043/attempt_1476667862449_0043_1_07_28_2_10003_src_0_spill_-1.out
> 17M /hadoopfs/fs1/yarn/nodemanager/usercache/cloudbreak/appcache/application_1476667862449_0043/attempt_1476667862449_0043_1_07_28_2_10003_src_13_spill_-1.out
> {noformat}
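The cleanup proposed in this ticket can be sketched as follows, assuming spill files for a failed attempt can be identified by the attempt-id prefix visible in the file listing above (`attempt_..._src_N_spill_-1.out`). This is an illustration of the idea under that naming assumption, not Tez's actual Fetcher or cleanup code.

```java
import java.io.File;

// Hedged sketch: on task-attempt failure, delete the spill files the
// unordered fetchers downloaded for that attempt, matched by the
// attempt-id prefix and the spill-file suffix seen in the listing.
class FetcherCleanup {
    /** Deletes spill files for the given attempt; returns how many were removed. */
    static int cleanupFailedAttempt(File localDir, String attemptId) {
        File[] spills = localDir.listFiles(
            (dir, name) -> name.startsWith(attemptId) && name.endsWith("_spill_-1.out"));
        int removed = 0;
        if (spills != null) {
            for (File f : spills) {
                if (f.delete()) {
                    removed++;
                }
            }
        }
        return removed;
    }
}
```

Hooking such a routine into the failure path would free the disk space before the next attempt starts fetching, instead of letting every retry hit the same "No space left on device" error.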
[jira] [Commented] (TEZ-3479) DAG AM does not schedule any more containers in corner cases
[ https://issues.apache.org/jira/browse/TEZ-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586992#comment-15586992 ]

Hitesh Shah commented on TEZ-3479:
----------------------------------

[~rajesh.balamohan] Is this happening only in the cases where the AM crashes and tries to recover?
[jira] [Commented] (TEZ-3478) Cleanup fetcher data for failing task attempts (Unordered fetcher)
[ https://issues.apache.org/jira/browse/TEZ-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586962#comment-15586962 ]

Hitesh Shah commented on TEZ-3478:
----------------------------------

Is this only an issue with unordered data?
[jira] [Updated] (TEZ-3479) DAG AM does not schedule any more containers in corner cases
[ https://issues.apache.org/jira/browse/TEZ-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated TEZ-3479:
----------------------------------
    Attachment: application_1476667862449_0031_not_complete.1.log.tar.gz
[jira] [Updated] (TEZ-3479) DAG AM does not schedule any more containers in corner cases
[ https://issues.apache.org/jira/browse/TEZ-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated TEZ-3479:
----------------------------------
    Affects Version/s: 0.7.1
[jira] [Created] (TEZ-3479) DAG AM does not schedule any more containers in corner cases
Rajesh Balamohan created TEZ-3479:
-------------------------------------

             Summary: DAG AM does not schedule any more containers in corner cases
                 Key: TEZ-3479
                 URL: https://issues.apache.org/jira/browse/TEZ-3479
             Project: Apache Tez
          Issue Type: Improvement
            Reporter: Rajesh Balamohan
[jira] [Created] (TEZ-3478) Cleanup fetcher data for failing task attempts (Unordered fetcher)
Rajesh Balamohan created TEZ-3478:
 Summary: Cleanup fetcher data for failing task attempts (Unordered fetcher)
 Key: TEZ-3478
 URL: https://issues.apache.org/jira/browse/TEZ-3478
 Project: Apache Tez
 Issue Type: Improvement
 Reporter: Rajesh Balamohan
 Assignee: Rajesh Balamohan
 Priority: Minor

Env: 3 node AWS cluster with the entire dataset in S3. Since the data is in S3, there is no additional storage for HDFS (it uses the existing space available in the VMs). Tez version is 0.7.

With some workloads (e.g. q29 in TPC-DS), unordered fetchers download data in parallel for different vertices and run out of disk space. However, the data downloaded for these failed task attempts is not cleared, so subsequent task attempts encounter a similar situation and fail with a "No space" exception. e.g. stack trace
{noformat}
, errorMessage=Fetch failed:org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:261)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
	at java.io.DataOutputStream.write(DataOutputStream.java:107)
	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:426)
	at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:206)
	at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:124)
	at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:110)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
	at java.io.DataOutputStream.write(DataOutputStream.java:107)
	at org.apache.tez.runtime.library.common.shuffle.ShuffleUtils.shuffleToDisk(ShuffleUtils.java:146)
	at org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:771)
	at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:497)
	at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:396)
	at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:195)
	at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:70)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: No space left on device
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:345)
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSy
{noformat}

This would also affect any other job running in the cluster at the same time. It would be helpful to clean up the data downloaded for the failed task attempts. Creating this ticket mainly for the unordered fetcher case, though the ordered shuffle case could be similar. e.g. files
{noformat}
17M /hadoopfs/fs1/yarn/nodemanager/usercache/cloudbreak/appcache/application_1476667862449_0043/attempt_1476667862449_0043_1_07_28_0_10023_src_62_spill_-1.out
18M /hadoopfs/fs1/yarn/nodemanager/usercache/cloudbreak/appcache/application_1476667862449_0043/attempt_1476667862449_0043_1_07_28_0_10023_src_63_spill_-1.out
16M /hadoopfs/fs1/yarn/nodemanager/usercache/cloudbreak/appcache/application_1476667862449_0043/attempt_1476667862449_0043_1_07_28_0_10023_src_64_spill_-1.out
..
..
18M /hadoopfs/fs1/yarn/nodemanager/usercache/cloudbreak/appcache/application_1476667862449_0043/attempt_1476667862449_0043_1_07_28_2_10003_src_0_spill_-1.out
17M /hadoopfs/fs1/yarn/nodemanager/usercache/cloudbreak/appcache/application_1476667862449_0043/attempt_1476667862449_0043_1_07_28_2_10003_src_13_spill_-1.out
16M /hadoopfs/fs1/yarn/nodemanager/usercache/cloudbreak/appcache/application_1476667862449_0043/attempt_1476667862449_0043_1_07_28_2_10003_src_15_spill_-1.out
16M /hadoopfs/fs1/yarn/nodemanager/usercache/cloudbreak/appcache/application_1476667862449_0043/attempt_1476667862449_0043_1_07_28_2_10003_src_17_spill_-1.ou
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
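The request in TEZ-3478 is to delete the local spill files left behind by a failed attempt so retries start with free disk. A minimal sketch of such a cleanup, assuming spill files carry the failed attempt's id as a filename prefix (as in the listing above); `FetcherCleanup` and its methods are hypothetical names, not the actual Tez fix:

```java
/** Sketch: find and delete local shuffle spill files left behind by a failed
 *  task attempt, so the next attempt does not start with a full disk.
 *  Cleanup-by-filename-prefix is an assumption based on the listing above. */
public class FetcherCleanup {

    /** Spill files in the listing are named "<attemptId>_src_<N>_spill_<M>.out",
     *  so files belonging to a failed attempt share its id as a prefix. */
    public static boolean belongsToAttempt(String fileName, String failedAttemptId) {
        return fileName.startsWith(failedAttemptId + "_src_");
    }

    /** Deletes matching files under localDir; returns how many were removed. */
    public static int cleanupFailedAttempt(java.io.File localDir, String failedAttemptId) {
        java.io.File[] files = localDir.listFiles();
        int deleted = 0;
        if (files != null) {
            for (java.io.File f : files) {
                if (f.isFile() && belongsToAttempt(f.getName(), failedAttemptId) && f.delete()) {
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

In a real fix this would run from the fetcher's failure path, before the attempt is reported failed.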
[jira] [Updated] (TEZ-3439) Tez joinvalidate fails when first input argument size is bigger than the second
[ https://issues.apache.org/jira/browse/TEZ-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-3439:
 Summary: Tez joinvalidate fails when first input argument size is bigger than the second (was: Tez joinvalidate example failed when first input argument size is bigger than the second)

> Tez joinvalidate fails when first input argument size is bigger than the
> second
> ---
>
> Key: TEZ-3439
> URL: https://issues.apache.org/jira/browse/TEZ-3439
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hui Cao
> Assignee: Hui Cao
> Attachments: TEZ-3439.1.patch, TEZ-3439.2.patch
>
>
> When using joinvalidate in the Tez examples jar, with the command
> {{"hadoop jar tez-examples-.jar joinvalidate "}}
> if the size of the first input is bigger than that of the second, an IOException is thrown.
> {noformat}
> 16/09/21 00:07:53 INFO examples.JoinValidate: DAG diagnostics: [Vertex
> failed, vertexName=joinvalidate, vertexId=vertex_1473073428528_0031_1_02,
> diagnostics=[Task failed, taskId=task_1473073428528_0031_1_02_00,
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task (
> failure ) : attempt_1473073428528_0031_1_02_00_0:java.io.IOException:
> Please check if you are invoking moveToNext() even after it returned false.
> at > org.apache.tez.runtime.library.common.ValuesIterator.hasCompletedProcessing(ValuesIterator.java:221) > at > org.apache.tez.runtime.library.common.ValuesIterator.moveToNext(ValuesIterator.java:103) > at > org.apache.tez.runtime.library.input.OrderedGroupedKVInput$OrderedGroupedKeyValuesReader.next(OrderedGroupedKVInput.java:321) > at > org.apache.tez.examples.JoinValidate$JoinValidateProcessor.run(JoinValidate.java:254) > at > org.apache.tez.runtime.library.processor.SimpleProcessor.run(SimpleProcessor.java:53) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
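The exception above reflects the reader contract: once `next()` (internally `moveToNext()`) has returned false, it must not be called again. A sketch of that contract with a stand-in reader; the `Reader` class here is illustrative, not Tez's actual `OrderedGroupedKeyValuesReader`:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;

/** Stand-in illustrating the reader contract behind the error above:
 *  calling next() after it has returned false raises an IOException. */
public class KeyValuesReaderContract {

    public static class Reader {
        private final Iterator<String> data;
        private boolean completed = false;

        public Reader(Iterable<String> source) { this.data = source.iterator(); }

        public boolean next() {
            if (completed) {
                // Mirrors the check in ValuesIterator.hasCompletedProcessing()
                throw new UncheckedIOException(new IOException(
                    "Please check if you are invoking moveToNext() even after it returned false."));
            }
            if (data.hasNext()) { data.next(); return true; }
            completed = true;
            return false;
        }
    }

    /** Correct usage: stop iterating the moment next() returns false. */
    public static int countKeys(Reader reader) {
        int n = 0;
        while (reader.next()) { n++; }
        return n;
    }
}
```

The TEZ-3439 bug amounted to the example violating this contract on one of its two inputs when their sizes differed.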
[jira] [Commented] (TEZ-3439) Tez joinvalidate example failed when first input argument size is bigger than the second
[ https://issues.apache.org/jira/browse/TEZ-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586823#comment-15586823 ] Hitesh Shah commented on TEZ-3439: -- +1. Committing shortly. > Tez joinvalidate example failed when first input argument size is bigger than > the second > > > Key: TEZ-3439 > URL: https://issues.apache.org/jira/browse/TEZ-3439 > Project: Apache Tez > Issue Type: Bug >Reporter: Hui Cao >Assignee: Hui Cao > Attachments: TEZ-3439.1.patch, TEZ-3439.2.patch > > > when using joinvalidate in Tez example jar. as command > {{"hadoop jar tez-examples-.jar joinvalidate "}} > if the size of is bigger than , an IOException is thrown. > {noformat} > 16/09/21 00:07:53 INFO examples.JoinValidate: DAG diagnostics: [Vertex > failed, vertexName=joinvalidate, vertexId=vertex_1473073428528_0031_1_02, > diagnostics=[Task failed, taskId=task_1473073428528_0031_1_02_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : attempt_1473073428528_0031_1_02_00_0:java.io.IOException: > Please check if you are invoking moveToNext() even after it returned false. 
> at > org.apache.tez.runtime.library.common.ValuesIterator.hasCompletedProcessing(ValuesIterator.java:221) > at > org.apache.tez.runtime.library.common.ValuesIterator.moveToNext(ValuesIterator.java:103) > at > org.apache.tez.runtime.library.input.OrderedGroupedKVInput$OrderedGroupedKeyValuesReader.next(OrderedGroupedKVInput.java:321) > at > org.apache.tez.examples.JoinValidate$JoinValidateProcessor.run(JoinValidate.java:254) > at > org.apache.tez.runtime.library.processor.SimpleProcessor.run(SimpleProcessor.java:53) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3462) Task attempt failure during container shutdown loses useful container diagnostics
[ https://issues.apache.org/jira/browse/TEZ-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586783#comment-15586783 ] Hitesh Shah commented on TEZ-3462: -- bq. complicated to handle ... since the ATS publish would already have happened This could be doable via a separate history event if needed and diagnostics could be updated into ATS. > Task attempt failure during container shutdown loses useful container > diagnostics > - > > Key: TEZ-3462 > URL: https://issues.apache.org/jira/browse/TEZ-3462 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe >Assignee: Eric Badger > Attachments: TEZ-3462.001.patch > > > When a nodemanager kills a task attempt due to excessive memory usage it will > send a SIGTERM followed by a SIGKILL. It also sends a useful diagnostic > message with the container completion event to the RM which will eventually > make it to the AM on a subsequent heartbeat. > However if the JVM shutdown processing causes an error in the task (e.g.: > filesystem being closed by shutdown hook) then the task attempt can report a > failure before the useful NM diagnostic makes it to the AM. The AM then > records some other error as the task failure reason, and by the time the > container completion status makes it to the AM it does not associate that > error with the task attempt and the useful information is lost. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3405) Support ability for AM to kill itself if there is no client heartbeating to it
[ https://issues.apache.org/jira/browse/TEZ-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586778#comment-15586778 ] Hitesh Shah commented on TEZ-3405: -- ping [~sseth] - please help with hopefully a final review whenever you get a chance. > Support ability for AM to kill itself if there is no client heartbeating to it > -- > > Key: TEZ-3405 > URL: https://issues.apache.org/jira/browse/TEZ-3405 > Project: Apache Tez > Issue Type: Bug >Reporter: Gunther Hagleitner >Assignee: Hitesh Shah >Priority: Critical > Attachments: TEZ-3405.1.patch, TEZ-3405.2.patch, TEZ-3405.3.patch, > TEZ-3405.4.patch, TEZ-3405.5.patch > > > HiveServer2 optionally maintains a pool of AMs in either Tez or LLAP mode. > This is done to amortize the cost of launching a Tez session. > We also try in a shutdown hook to kill all these AMs when HS2 goes down. > However, there are cases where HS2 doesn't get the chance to kill these AMs > before it goes away. As a result these zombie AMs hang around until the > timeout kicks in. > The trouble with the timeout is that we have to set it fairly high. Otherwise > the benefit of having pre-launched AMs obviously goes away (in a lightly > loaded cluster). > So, if people kill/restart HS2 they often times run into situations where the > cluster/queue doesn't have any more capacity for AMs. They either have to > manually kill the zombies or wait. > The request is therefore for Tez to maintain a heartbeat to the client. If > the client goes away the AM should exit. That way we can keep the AMs alive > for a long time regardless of activity and at the same time don't have to > worry about them if HS2 goes down. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
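The behavior requested in TEZ-3405 is a client-liveness watchdog: record the last client heartbeat and shut the AM down once a timeout elapses with no contact. A hypothetical sketch, not the actual TEZ-3405 patch (`ClientLivenessMonitor` and the check period are illustrative):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch: shut the AM down if no client has heartbeated within the timeout. */
public class ClientLivenessMonitor {
    private final long timeoutMillis;
    private final AtomicLong lastHeartbeat = new AtomicLong(System.currentTimeMillis());

    public ClientLivenessMonitor(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    /** Called from the client RPC handler on every client request. */
    public void onClientHeartbeat() { lastHeartbeat.set(System.currentTimeMillis()); }

    /** Pure liveness check, separated out so it is easy to test. */
    public boolean isExpired(long now) {
        return now - lastHeartbeat.get() > timeoutMillis;
    }

    /** Periodically checks liveness and runs the shutdown action on expiry. */
    public ScheduledExecutorService start(Runnable shutdownAction, long checkPeriodMillis) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(() -> {
            if (isExpired(System.currentTimeMillis())) {
                shutdownAction.run();   // e.g. transition the DAG AM to KILLED
                exec.shutdown();
            }
        }, checkPeriodMillis, checkPeriodMillis, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

With this shape, HS2 can keep sessions alive indefinitely via heartbeats, while an HS2 crash lets each AM exit on its own after one timeout.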
[jira] [Updated] (TEZ-3419) Tez UI: Applications page shows error, for users with only DAG level ACL permission.
[ https://issues.apache.org/jira/browse/TEZ-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-3419: - Target Version/s: 0.8.5 (was: 0.9.0) > Tez UI: Applications page shows error, for users with only DAG level ACL > permission. > > > Key: TEZ-3419 > URL: https://issues.apache.org/jira/browse/TEZ-3419 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.7.0 >Reporter: Sreenath Somarajapuram >Assignee: Sreenath Somarajapuram > Attachments: Screen Shot 2016-10-13 at 4.25.31 PM.png, Screen Shot > 2016-10-13 at 4.37.09 PM.png, Screen Shot 2016-10-17 at 4.11.29 PM.png, > Screen Shot 2016-10-17 at 4.11.59 PM.png, Screen Shot 2016-10-17 at 4.12.23 > PM.png, TEZ-3419.1.patch, TEZ-3419.2.patch, TEZ-3419.3.patch, > TEZ-3419.4.patch, TEZ-3419.5.patch, TEZ-3419.6.patch, TEZ-3419.wip.1.patch, > Tez data missing.png, YARN & Tez data missing.png, YARN data missing.png > > > Follow this logic and display better message: > On loading app details page, send a request to > /ws/v1/timeline/TEZ_APPLICATION/tez_ > - If it succeed, display the details page as we do now. > - If it fails, send a request to > /ws/v1/timeline/TEZ_DAG_ID?primaryFilter=applicationId%3A > -- If it succeed, then we know that DAGs under the app are available and > assume that the user doesn't have permission to access app level data. > --- If AHS is accessible, display application data from there in the details > page. > --- else if AHS is not accessible, display a message in app details tab, > something like "Data is not available. Check if you are authorized to access > application data!". > --- Also display the DAGs tab, for the user to see DAGs under that app. > -- If it fails, display error message as we do now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3419) Tez UI: Applications page shows error, for users with only DAG level ACL permission.
[ https://issues.apache.org/jira/browse/TEZ-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586674#comment-15586674 ] Hitesh Shah commented on TEZ-3419: -- +1 > Tez UI: Applications page shows error, for users with only DAG level ACL > permission. > > > Key: TEZ-3419 > URL: https://issues.apache.org/jira/browse/TEZ-3419 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.7.0 >Reporter: Sreenath Somarajapuram >Assignee: Sreenath Somarajapuram > Attachments: Screen Shot 2016-10-13 at 4.25.31 PM.png, Screen Shot > 2016-10-13 at 4.37.09 PM.png, Screen Shot 2016-10-17 at 4.11.29 PM.png, > Screen Shot 2016-10-17 at 4.11.59 PM.png, Screen Shot 2016-10-17 at 4.12.23 > PM.png, TEZ-3419.1.patch, TEZ-3419.2.patch, TEZ-3419.3.patch, > TEZ-3419.4.patch, TEZ-3419.5.patch, TEZ-3419.6.patch, TEZ-3419.wip.1.patch, > Tez data missing.png, YARN & Tez data missing.png, YARN data missing.png > > > Follow this logic and display better message: > On loading app details page, send a request to > /ws/v1/timeline/TEZ_APPLICATION/tez_ > - If it succeed, display the details page as we do now. > - If it fails, send a request to > /ws/v1/timeline/TEZ_DAG_ID?primaryFilter=applicationId%3A > -- If it succeed, then we know that DAGs under the app are available and > assume that the user doesn't have permission to access app level data. > --- If AHS is accessible, display application data from there in the details > page. > --- else if AHS is not accessible, display a message in app details tab, > something like "Data is not available. Check if you are authorized to access > application data!". > --- Also display the DAGs tab, for the user to see DAGs under that app. > -- If it fails, display error message as we do now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
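The logic quoted in TEZ-3419 is a two-probe fallback: try the app-level timeline entity, and on failure fall back to a DAG query to distinguish "no permission" from "no data". The real implementation is JavaScript in the Tez UI; the decision itself can be sketched as a pure function (enum and parameter names are illustrative):

```java
/** Sketch of the TEZ-3419 fallback logic as a pure decision function. */
public class AppPageDecision {

    public enum View { APP_DETAILS, DAGS_WITH_AHS_DATA, DAGS_WITH_MESSAGE, ERROR }

    /**
     * @param appLevelOk   did the /ws/v1/timeline/TEZ_APPLICATION/... request succeed?
     * @param dagQueryOk   did the TEZ_DAG_ID?primaryFilter=applicationId query succeed?
     * @param ahsReachable is the YARN Application History Server reachable?
     */
    public static View decide(boolean appLevelOk, boolean dagQueryOk, boolean ahsReachable) {
        if (appLevelOk) {
            return View.APP_DETAILS;                 // display details page as before
        }
        if (dagQueryOk) {
            // DAGs exist but app-level data is denied: assume a DAG-level-only ACL.
            return ahsReachable ? View.DAGS_WITH_AHS_DATA : View.DAGS_WITH_MESSAGE;
        }
        return View.ERROR;                           // display error message as before
    }
}
```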
[jira] [Updated] (TEZ-3477) MRInputHelpers generateInputSplitsToMem public API modified
[ https://issues.apache.org/jira/browse/TEZ-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated TEZ-3477: - Attachment: TEZ-3477.1.patch > MRInputHelpers generateInputSplitsToMem public API modified > --- > > Key: TEZ-3477 > URL: https://issues.apache.org/jira/browse/TEZ-3477 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-3477.1.patch > > > Pig and Hive directly rely on specific APIs in MRInputHelpers. I would like > to ensure these signature are prevented from being modified. > - MRInputHelpers.generateInputSplitsToMem > - MRInputHelpers.parseMRInputPayload > - MRInputHelpers.createSplitProto > - MRInputHelpers.createOldFormatSplitFromUserPayload > - MRInputHelpers.configureMRInputWithLegacySplitGeneration > A recent fixed jira TEZ-3430 modified generateInputSplitsToMem > {code} > java.lang.NoSuchMethodError: > org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(Lorg/apache/hadoop/conf/Configuration;ZI)Lorg/apache/tez/mapreduce/hadoop/InputSplitInfoMem; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3477) MRInputHelpers generateInputSplitsToMem public API modified
[ https://issues.apache.org/jira/browse/TEZ-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586551#comment-15586551 ] Hitesh Shah commented on TEZ-3477: -- Might be good to add backward compatible functions to account for the changes brought in by TEZ-3430 as part of this jira too. > MRInputHelpers generateInputSplitsToMem public API modified > --- > > Key: TEZ-3477 > URL: https://issues.apache.org/jira/browse/TEZ-3477 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > > Pig and Hive directly rely on specific APIs in MRInputHelpers. I would like > to ensure these signature are prevented from being modified. > - MRInputHelpers.generateInputSplitsToMem > - MRInputHelpers.parseMRInputPayload > - MRInputHelpers.createSplitProto > - MRInputHelpers.createOldFormatSplitFromUserPayload > - MRInputHelpers.configureMRInputWithLegacySplitGeneration > A recent fixed jira TEZ-3430 modified generateInputSplitsToMem > {code} > java.lang.NoSuchMethodError: > org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(Lorg/apache/hadoop/conf/Configuration;ZI)Lorg/apache/tez/mapreduce/hadoop/InputSplitInfoMem; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3477) MRInputHelpers generateInputSplitsToMem public API modified
Jonathan Eagles created TEZ-3477:
 Summary: MRInputHelpers generateInputSplitsToMem public API modified
 Key: TEZ-3477
 URL: https://issues.apache.org/jira/browse/TEZ-3477
 Project: Apache Tez
 Issue Type: Bug
 Reporter: Jonathan Eagles
 Assignee: Jonathan Eagles

Pig and Hive directly rely on specific APIs in MRInputHelpers. I would like to ensure these signatures are prevented from being modified.
- MRInputHelpers.generateInputSplitsToMem
- MRInputHelpers.parseMRInputPayload
- MRInputHelpers.createSplitProto
- MRInputHelpers.createOldFormatSplitFromUserPayload
- MRInputHelpers.configureMRInputWithLegacySplitGeneration

A recently fixed jira, TEZ-3430, modified generateInputSplitsToMem:
{code}
java.lang.NoSuchMethodError: org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(Lorg/apache/hadoop/conf/Configuration;ZI)Lorg/apache/tez/mapreduce/hadoop/InputSplitInfoMem;
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
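The `NoSuchMethodError` above is a binary-compatibility break: callers compiled against the old `(Configuration, boolean, int)` signature fail at link time once the signature changes. One standard remedy is to keep the old signature as an overload that delegates to the new one with a default. A hypothetical sketch, where the extra `sortSplits` parameter merely stands in for whatever TEZ-3430 actually added:

```java
/** Sketch: preserving an old public signature after a parameter is added. */
public class InputSplitHelper {

    /** New signature (the extra sortSplits parameter is a stand-in
     *  for the hypothetical TEZ-3430 change). */
    public static String generateInputSplitsToMem(
            String conf, boolean groupSplits, boolean sortSplits, int targetTasks) {
        return "splits(conf=" + conf + ", group=" + groupSplits
                + ", sort=" + sortSplits + ", tasks=" + targetTasks + ")";
    }

    /** Old signature, kept so callers compiled against the previous release
     *  (Pig, Hive) do not hit NoSuchMethodError; delegates with the old default. */
    public static String generateInputSplitsToMem(
            String conf, boolean groupSplits, int targetTasks) {
        return generateInputSplitsToMem(conf, groupSplits, true /* old behavior */, targetTasks);
    }
}
```

Source compatibility alone (recompiling Pig/Hive) would not be enough here; the retained overload is what keeps existing jars linking.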
[jira] [Created] (TEZ-3476) Need a way to account for container localization.
Eric Payne created TEZ-3476:
 Summary: Need a way to account for container localization.
 Key: TEZ-3476
 URL: https://issues.apache.org/jira/browse/TEZ-3476
 Project: Apache Tez
 Issue Type: Bug
 Reporter: Eric Payne

Tez task attempt start times don't reflect time spent in localization. In the MapReduce framework, the time spent in localization was included in the total runtime of each task attempt. But since Tez reuses containers, the time spent localizing for a container is not captured. The start time of the first attempt in that container will only be set after localization has completed. The result is that attempts can appear as if they are not being run even though there are resources available in the queue. An attempt can be assigned to a container, but if the container is on a slow node and it takes a long time to localize, the attempt state will remain pending until localization completes. The impact risk is that tasks will not speculate during localization since they haven't started.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3458) Auto grouping for cartesian product edge(unpartitioned case)
[ https://issues.apache.org/jira/browse/TEZ-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586045#comment-15586045 ] TezQA commented on TEZ-3458: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12833984/TEZ-3458.2.patch against master revision 04d609e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2044//console This message is automatically generated. > Auto grouping for cartesian product edge(unpartitioned case) > > > Key: TEZ-3458 > URL: https://issues.apache.org/jira/browse/TEZ-3458 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3458.1.patch, TEZ-3458.2.patch > > > Original CartesianProductVertexManagerUnpartitioned set parallelism as > product of all source vertices parallelism which may explode to insane > number. We should do auto reduce as in ShuffleVertexManager to avoid this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-3458 PreCommit Build #2044
Jira: https://issues.apache.org/jira/browse/TEZ-3458 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2044/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 102 lines...] patching file tez-runtime-library/src/test/java/org/apache/tez/runtime/library/cartesianproduct/TestCartesianProductVertexManagerConfig.java patching file tez-runtime-library/src/test/java/org/apache/tez/runtime/library/cartesianproduct/TestCartesianProductVertexManagerPartitioned.java patching file tez-runtime-library/src/test/java/org/apache/tez/runtime/library/cartesianproduct/TestCartesianProductVertexManagerUnpartitioned.java patching file tez-runtime-library/src/test/java/org/apache/tez/runtime/library/cartesianproduct/TestGrouper.java == == Determining number of patched javac warnings. == == /home/jenkins/tools/maven/latest/bin/mvn clean test -DskipTests -Ptest-patch > /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build/../patchprocess/patchJavacWarnings.txt 2>&1 {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12833984/TEZ-3458.2.patch against master revision 04d609e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2044//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 293210c5cd443cd3bb1bbff53167dec689e34bbb logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Compressed 786.92 KB of artifacts by 44.7% relative to #2043 [description-setter] Could not determine description. Recording test results ERROR: Step ‘Publish JUnit test result report’ failed: No test report files were found. Configuration error? 
Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Updated] (TEZ-3458) Auto grouping for cartesian product edge(unpartitioned case)
[ https://issues.apache.org/jira/browse/TEZ-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3458:
 Attachment: TEZ-3458.2.patch

> Auto grouping for cartesian product edge (unpartitioned case)
>
> Key: TEZ-3458
> URL: https://issues.apache.org/jira/browse/TEZ-3458
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Zhiyuan Yang
> Assignee: Zhiyuan Yang
> Attachments: TEZ-3458.1.patch, TEZ-3458.2.patch
>
>
> The original CartesianProductVertexManagerUnpartitioned sets parallelism to the
> product of all source vertices' parallelism, which may explode to an insane
> number. We should auto-reduce as in ShuffleVertexManager to avoid this.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
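ShuffleVertexManager-style auto-reduce, which TEZ-3458 wants for cartesian product edges, chooses parallelism from observed output size instead of the raw product of source parallelisms. A simplified sketch of that calculation (class name and parameters are illustrative, not the actual vertex manager code):

```java
/** Sketch of ShuffleVertexManager-style auto-reduce: choose parallelism from
 *  observed output size, capped by the original parallelism (which for a
 *  cartesian product edge is the product of the source vertex parallelisms). */
public class AutoReduce {

    /**
     * @param maxParallelism      original parallelism (e.g. product of source parallelisms)
     * @param totalOutputBytes    total source output observed via VM events
     * @param desiredBytesPerTask target data size per downstream task
     */
    public static int determineParallelism(
            int maxParallelism, long totalOutputBytes, long desiredBytesPerTask) {
        // Ceiling division: tasks needed so each gets ~desiredBytesPerTask.
        long needed = (totalOutputBytes + desiredBytesPerTask - 1) / desiredBytesPerTask;
        // Never exceed the original parallelism, never go below 1.
        return (int) Math.max(1, Math.min(maxParallelism, needed));
    }
}
```

With, say, 90 GB of source output and a 1 GB per-task target, this yields 90 tasks even if the source product would have been 45000.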
[jira] [Commented] (TEZ-3452) Auto-reduce parallelism calculation can overflow with large inputs
[ https://issues.apache.org/jira/browse/TEZ-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585906#comment-15585906 ] Ming Ma commented on TEZ-3452: -- +1. Thanks [~jeagles]. > Auto-reduce parallelism calculation can overflow with large inputs > -- > > Key: TEZ-3452 > URL: https://issues.apache.org/jira/browse/TEZ-3452 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-3452.1.patch, TEZ-3452.2.patch, TEZ-3452.3.patch > > > Overflow can occur when the numTasks is high (say 45000) and outputSize is > high (say 311TB) and slow start is set to 1.0. > {code:title=ShuffleVertexManager} > for (Map.EntryvInfo : getBipartiteInfo()) { > SourceVertexInfo srcInfo = vInfo.getValue(); > if (srcInfo.numTasks > 0 && srcInfo.numVMEventsReceived > 0) { > // this assumes that 1 vmEvent is received per completed task - > TEZ-2961 > expectedTotalSourceTasksOutputSize += > (srcInfo.numTasks * srcInfo.outputSize) / > srcInfo.numVMEventsReceived; > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
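With numTasks = 45000 and outputSize around 311 TB, the product `srcInfo.numTasks * srcInfo.outputSize` in the quoted snippet exceeds `Long.MAX_VALUE` and wraps negative. Dividing before multiplying keeps the intermediate value in range, at the cost of truncating the per-event remainder, which is acceptable for an estimate. A self-contained sketch (not the actual TEZ-3452 patch):

```java
/** Demonstrates the overflow in the quoted ShuffleVertexManager code
 *  and one way to avoid it. */
public class OutputSizeEstimate {

    /** Naive form from the snippet above: the multiplication overflows
     *  long for large numTasks * outputSizeBytes. */
    public static long naive(int numTasks, long outputSizeBytes, int numVMEventsReceived) {
        return (numTasks * outputSizeBytes) / numVMEventsReceived;
    }

    /** Divide first so the intermediate stays within long range
     *  (truncates the per-event remainder, fine for an estimate). */
    public static long safe(int numTasks, long outputSizeBytes, int numVMEventsReceived) {
        return (outputSizeBytes / numVMEventsReceived) * numTasks;
    }
}
```

With the issue's numbers (45000 tasks, ~311 TB, slow start 1.0 so one VM event per task), the naive form wraps to a negative estimate while the safe form stays close to the true total.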
[jira] [Commented] (TEZ-3475) Merge duplicated method into base class
[ https://issues.apache.org/jira/browse/TEZ-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585388#comment-15585388 ] ASF GitHub Bot commented on TEZ-3475: - GitHub user darionyaphet opened a pull request: https://github.com/apache/tez/pull/17 TEZ-3475 Merge duplicated method into base class Merge duplicated method (handleEvents and close) into MRTask.class [Merge duplicated method into base class](https://issues.apache.org/jira/browse/TEZ-3475) You can merge this pull request into a Git repository by running: $ git pull https://github.com/darionyaphet/tez TEZ-3475 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tez/pull/17.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17 commit a729ac669bfb7815ec43805025e5b9f0d7217608 Author: darionyaphet Date: 2016-10-18T12:52:31Z TEZ-3475 Merge duplicated method into base class > Merge duplicated method into base class > --- > > Key: TEZ-3475 > URL: https://issues.apache.org/jira/browse/TEZ-3475 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.8.4 >Reporter: darion yaphet >Assignee: darion yaphet > Fix For: 0.9.0, 0.8.5 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3475) Merge duplicated method into base class
darion yaphet created TEZ-3475: -- Summary: Merge duplicated method into base class Key: TEZ-3475 URL: https://issues.apache.org/jira/browse/TEZ-3475 Project: Apache Tez Issue Type: Bug Affects Versions: 0.8.4 Reporter: darion yaphet Assignee: darion yaphet Fix For: 0.9.0, 0.8.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
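TEZ-3475's change is the standard "pull up method" refactor: identical `handleEvents`/`close` bodies in the map and reduce processors are replaced by a single implementation in the shared base class. The classes below are illustrative stand-ins, not the actual MRTask code:

```java
import java.util.List;

/** Sketch of the "pull up method" refactor behind TEZ-3475. */
public abstract class BaseTask {
    private int eventsHandled = 0;
    private boolean closed = false;

    /** Formerly duplicated verbatim in each subclass; now defined once here. */
    public void handleEvents(List<String> events) {
        eventsHandled += events.size();
    }

    /** Also pulled up: shared close logic lives in the base class. */
    public void close() {
        closed = true;
    }

    public int getEventsHandled() { return eventsHandled; }
    public boolean isClosed() { return closed; }

    /** Task-specific behavior stays abstract in the base class. */
    public abstract String taskType();
}

class MapTaskSketch extends BaseTask {
    @Override public String taskType() { return "map"; }
}

class ReduceTaskSketch extends BaseTask {
    @Override public String taskType() { return "reduce"; }
}
```

The payoff is that a future fix to the shared logic lands in one place instead of needing identical edits in every subclass.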
[jira] [Commented] (TEZ-3419) Tez UI: Applications page shows error, for users with only DAG level ACL permission.
[ https://issues.apache.org/jira/browse/TEZ-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585304#comment-15585304 ] TezQA commented on TEZ-3419: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12833926/TEZ-3419.6.patch against master revision 48208dc. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2043//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2043//console This message is automatically generated. > Tez UI: Applications page shows error, for users with only DAG level ACL > permission. 
> > > Key: TEZ-3419 > URL: https://issues.apache.org/jira/browse/TEZ-3419 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.7.0 >Reporter: Sreenath Somarajapuram >Assignee: Sreenath Somarajapuram > Attachments: Screen Shot 2016-10-13 at 4.25.31 PM.png, Screen Shot > 2016-10-13 at 4.37.09 PM.png, Screen Shot 2016-10-17 at 4.11.29 PM.png, > Screen Shot 2016-10-17 at 4.11.59 PM.png, Screen Shot 2016-10-17 at 4.12.23 > PM.png, TEZ-3419.1.patch, TEZ-3419.2.patch, TEZ-3419.3.patch, > TEZ-3419.4.patch, TEZ-3419.5.patch, TEZ-3419.6.patch, TEZ-3419.wip.1.patch, > Tez data missing.png, YARN & Tez data missing.png, YARN data missing.png > > > Follow this logic and display better message: > On loading app details page, send a request to > /ws/v1/timeline/TEZ_APPLICATION/tez_ > - If it succeed, display the details page as we do now. > - If it fails, send a request to > /ws/v1/timeline/TEZ_DAG_ID?primaryFilter=applicationId%3A > -- If it succeed, then we know that DAGs under the app are available and > assume that the user doesn't have permission to access app level data. > --- If AHS is accessible, display application data from there in the details > page. > --- else if AHS is not accessible, display a message in app details tab, > something like "Data is not available. Check if you are authorized to access > application data!". > --- Also display the DAGs tab, for the user to see DAGs under that app. > -- If it fails, display error message as we do now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Success: TEZ-3419 PreCommit Build #2043
Jira: https://issues.apache.org/jira/browse/TEZ-3419 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2043/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 4824 lines...] [INFO] Tez SUCCESS [ 0.037 s] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 58:36 min [INFO] Finished at: 2016-10-18T12:11:05+00:00 [INFO] Final Memory: 82M/1431M [INFO] {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12833926/TEZ-3419.6.patch against master revision 48208dc. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2043//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2043//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. e2aadc29188148ed096682451b909c7b6a5188dd logged out == == Finished build. == == Archiving artifacts [description-setter] Description set: TEZ-3419 Recording test results Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Updated] (TEZ-3419) Tez UI: Applications page shows error, for users with only DAG level ACL permission.
[ https://issues.apache.org/jira/browse/TEZ-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sreenath Somarajapuram updated TEZ-3419: Attachment: TEZ-3419.6.patch Thanks [~hitesh] bq. 2 screenshots show spurious data being shown in the UI. Attaching a fresh patch with the correction. bq. 3rd screenshot is for the configs. Configs are not accessible due to permission issues but UI says no records found. I think this is a reasonable approach for now ( as compared to an error message indicating no data or permission issue ) but just wanted to make sure that this was the intention of the patch and not an accidental change. Thats true. The behavior is as expected. As of now, we are just bypassing a failure condition. > Tez UI: Applications page shows error, for users with only DAG level ACL > permission. > > > Key: TEZ-3419 > URL: https://issues.apache.org/jira/browse/TEZ-3419 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.7.0 >Reporter: Sreenath Somarajapuram >Assignee: Sreenath Somarajapuram > Attachments: Screen Shot 2016-10-13 at 4.25.31 PM.png, Screen Shot > 2016-10-13 at 4.37.09 PM.png, Screen Shot 2016-10-17 at 4.11.29 PM.png, > Screen Shot 2016-10-17 at 4.11.59 PM.png, Screen Shot 2016-10-17 at 4.12.23 > PM.png, TEZ-3419.1.patch, TEZ-3419.2.patch, TEZ-3419.3.patch, > TEZ-3419.4.patch, TEZ-3419.5.patch, TEZ-3419.6.patch, TEZ-3419.wip.1.patch, > Tez data missing.png, YARN & Tez data missing.png, YARN data missing.png > > > Follow this logic and display better message: > On loading app details page, send a request to > /ws/v1/timeline/TEZ_APPLICATION/tez_ > - If it succeed, display the details page as we do now. > - If it fails, send a request to > /ws/v1/timeline/TEZ_DAG_ID?primaryFilter=applicationId%3A > -- If it succeed, then we know that DAGs under the app are available and > assume that the user doesn't have permission to access app level data. 
> --- If AHS is accessible, display application data from there in the details page.
> --- Else, if AHS is not accessible, display a message in the app details tab, something like "Data is not available. Check if you are authorized to access application data!".
> --- Also display the DAGs tab, for the user to see DAGs under that app.
> -- If it fails, display the error message as we do now.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
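The fallback order described in the issue can be sketched as a small decision helper. This is an illustrative sketch only: the class, enum, and method names below are hypothetical and not part of the actual Tez UI code (which is an Ember app), but it captures the branching the issue asks for.

```java
// Hypothetical sketch of the TEZ-3419 fallback logic; names are illustrative,
// not actual Tez UI identifiers.
public class AppDetailsFallback {

  /** Possible renderings of the application details page. */
  public enum View { APP_DETAILS, AHS_DETAILS_WITH_DAGS, MESSAGE_WITH_DAGS, ERROR }

  /**
   * Decide what to render given which endpoints answered.
   *
   * @param appDataOk the TEZ_APPLICATION timeline request succeeded
   * @param dagsOk    the TEZ_DAG_ID?primaryFilter=applicationId request succeeded
   * @param ahsOk     the YARN Application History Server is accessible
   */
  public static View decide(boolean appDataOk, boolean dagsOk, boolean ahsOk) {
    if (appDataOk) {
      return View.APP_DETAILS;              // normal path: app-level data visible
    }
    if (dagsOk) {
      // DAG-level ACLs let the user see DAGs but not app-level data:
      // show AHS data if available, else a "check your authorization" message,
      // and in both cases still show the DAGs tab.
      return ahsOk ? View.AHS_DETAILS_WITH_DAGS : View.MESSAGE_WITH_DAGS;
    }
    return View.ERROR;                      // nothing accessible: error as before
  }
}
```

This keeps the "bypass the failure condition" behavior the patch intends: a permission failure on app-level data no longer aborts the page when DAG-level data is still reachable.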
[jira] [Commented] (TEZ-3458) Auto grouping for cartesian product edge(unpartitioned case)
[ https://issues.apache.org/jira/browse/TEZ-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584716#comment-15584716 ]

TezQA commented on TEZ-3458:
----------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12833895/TEZ-3458.1.patch
against master revision 48208dc.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files.

{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2042//console

This message is automatically generated.

> Auto grouping for cartesian product edge(unpartitioned case)
>
>          Key: TEZ-3458
>          URL: https://issues.apache.org/jira/browse/TEZ-3458
>      Project: Apache Tez
>   Issue Type: Sub-task
>     Reporter: Zhiyuan Yang
>     Assignee: Zhiyuan Yang
>  Attachments: TEZ-3458.1.patch
>
> The original CartesianProductVertexManagerUnpartitioned sets parallelism to the product of all source vertices' parallelism, which may explode to an insane number. We should do auto-reduce as in ShuffleVertexManager to avoid this.
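The parallelism explosion the issue describes can be illustrated with a minimal sketch: repeatedly merge pairs of tasks in the largest source into groups until the product of group counts fits under a cap, in the spirit of ShuffleVertexManager's auto-reduce. This is not the actual TEZ-3458 implementation; the class and method names are hypothetical.

```java
// Illustrative sketch (not actual Tez code) of capping cartesian-product
// parallelism by grouping source tasks.
public class CartesianGroupingSketch {

  /**
   * Given per-source-vertex parallelisms, return per-source group counts whose
   * product does not exceed maxParallelism (or the best achievable if every
   * source is already down to a single group).
   */
  public static int[] groupCounts(int[] parallelism, int maxParallelism) {
    int[] groups = parallelism.clone();
    long product = 1;
    for (int g : groups) {
      product *= g;
    }
    // Repeatedly halve the largest source until the product fits the cap;
    // (g + 1) / 2 merges pairs of tasks into groups, rounding up.
    while (product > maxParallelism) {
      int i = argMax(groups);
      if (groups[i] == 1) {
        break;                       // every source is a single group already
      }
      product /= groups[i];
      groups[i] = (groups[i] + 1) / 2;
      product *= groups[i];
    }
    return groups;
  }

  private static int argMax(int[] a) {
    int best = 0;
    for (int i = 1; i < a.length; i++) {
      if (a[i] > a[best]) {
        best = i;
      }
    }
    return best;
  }
}
```

For example, two sources of 100 tasks each would otherwise yield 100 × 100 = 10000 destination tasks; with a cap of 1000, this sketch groups each side down to 25, giving 625 tasks.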
Failed: TEZ-3458 PreCommit Build #2042
Jira: https://issues.apache.org/jira/browse/TEZ-3458
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2042/

### LAST 60 LINES OF THE CONSOLE ###
[...truncated 99 lines...]
patching file tez-runtime-library/src/test/java/org/apache/tez/runtime/library/cartesianproduct/TestCartesianProductVertexManagerConfig.java
patching file tez-runtime-library/src/test/java/org/apache/tez/runtime/library/cartesianproduct/TestCartesianProductVertexManagerPartitioned.java
patching file tez-runtime-library/src/test/java/org/apache/tez/runtime/library/cartesianproduct/TestCartesianProductVertexManagerUnpartitioned.java
patching file tez-runtime-library/src/test/java/org/apache/tez/runtime/library/cartesianproduct/TestGrouper.java

==
== Determining number of patched javac warnings.
==
/home/jenkins/tools/maven/latest/bin/mvn clean test -DskipTests -Ptest-patch > /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build/../patchprocess/patchJavacWarnings.txt 2>&1

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12833895/TEZ-3458.1.patch
against master revision 48208dc.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files.

{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2042//console

This message is automatically generated.

==
== Adding comment to Jira.
==
Comment added.
4fafd91b10829a1b3e96f213eff30793966f2287 logged out

==
== Finished build.
==

Build step 'Execute shell' marked build as failure
Archiving artifacts
Compressed 788.43 KB of artifacts by 40.6% relative to #2041
[description-setter] Could not determine description.
Recording test results
ERROR: Step 'Publish JUnit test result report' failed: No test report files were found. Configuration error?
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any

### FAILED TESTS (if any) ###
No tests ran.
[jira] [Updated] (TEZ-3458) Auto grouping for cartesian product edge(unpartitioned case)
[ https://issues.apache.org/jira/browse/TEZ-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang updated TEZ-3458:
------------------------------
    Attachment: TEZ-3458.1.patch
[jira] [Updated] (TEZ-3458) Auto grouping for cartesian product edge(unpartitioned case)
[ https://issues.apache.org/jira/browse/TEZ-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang updated TEZ-3458:
------------------------------
    Summary: Auto grouping for cartesian product edge(unpartitioned case)  (was: Auto reduce for cartesian product edge(unpartitioned case))