[jira] [Updated] (TEZ-3235) Modify Example TestOrderedWordCount job to test the IPC limit for large dag plans
[ https://issues.apache.org/jira/browse/TEZ-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushmitha Sreenivasan updated TEZ-3235: --- Attachment: Tez-3235.3.patch > Modify Example TestOrderedWordCount job to test the IPC limit for large dag > plans > - > > Key: TEZ-3235 > URL: https://issues.apache.org/jira/browse/TEZ-3235 > Project: Apache Tez > Issue Type: Task >Affects Versions: 0.8.3 >Reporter: Sushmitha Sreenivasan >Assignee: Sushmitha Sreenivasan > Attachments: TEZ-3235.1.patch, Tez-3235.2.patch, Tez-3235.3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3303) Have ShuffleVertexManager consume more precise partition stats
[ https://issues.apache.org/jira/browse/TEZ-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374379#comment-15374379 ] Tsuyoshi Ozawa commented on TEZ-3303: - Thanks for the review and for committing :-) > Have ShuffleVertexManager consume more precise partition stats > -- > > Key: TEZ-3303 > URL: https://issues.apache.org/jira/browse/TEZ-3303 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Tsuyoshi Ozawa > Fix For: 0.9.0 > > Attachments: TEZ-3303.001.patch, TEZ-3303.002.patch, > TEZ-3303.002.patch, TEZ-3303.003.02.patch, TEZ-3303.003.patch > > > TEZ-3216 adds the support for more precise partition stats. > ShuffleVertexManager should be updated to consume the more precise partition > stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3343) sqoop import can't success
[ https://issues.apache.org/jira/browse/TEZ-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374251#comment-15374251 ] Jeff Zhang commented on TEZ-3343: - Could you attach the YARN application log? You can ask this kind of question on the Tez user mailing list before confirming that it is a bug. > sqoop import can't success > -- > > Key: TEZ-3343 > URL: https://issues.apache.org/jira/browse/TEZ-3343 > Project: Apache Tez > Issue Type: Bug > Environment: hadoop-2.6.0,sqoop-1.4.6,tez-0.8.4 >Reporter: lishaoguang > > I deployed the Hadoop environment and tried importing data from MySQL to > HDFS without Tez. After I deployed Tez, I ran 'orderedwordcount' and > it succeeded, but when I use Sqoop to import data from MySQL to HDFS, it stops at > 0% map and eventually fails. What can I do? Can anyone help me? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3343) sqoop import can't success
lishaoguang created TEZ-3343: Summary: sqoop import can't success Key: TEZ-3343 URL: https://issues.apache.org/jira/browse/TEZ-3343 Project: Apache Tez Issue Type: Bug Environment: hadoop-2.6.0,sqoop-1.4.6,tez-0.8.4 Reporter: lishaoguang I deployed the Hadoop environment and tried importing data from MySQL to HDFS without Tez. After I deployed Tez, I ran 'orderedwordcount' and it succeeded, but when I use Sqoop to import data from MySQL to HDFS, it stops at 0% map and eventually fails. What can I do? Can anyone help me? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Success: TEZ-3303 PreCommit Build #1848
Jira: https://issues.apache.org/jira/browse/TEZ-3303 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1848/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 4123 lines...] [INFO] Tez ... SUCCESS [ 0.035 s] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 55:31 min [INFO] Finished at: 2016-07-13T01:17:13+00:00 [INFO] Final Memory: 86M/1053M [INFO] {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12817556/TEZ-3303.003.02.patch against master revision 8131896. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1848//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1848//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. f2b72f48dd84ca2ad2ede90b8b9dc9d19e49bf70 logged out == == Finished build. == == Archiving artifacts [description-setter] Description set: TEZ-3303 Recording test results Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-3303) Have ShuffleVertexManager consume more precise partition stats
[ https://issues.apache.org/jira/browse/TEZ-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374133#comment-15374133 ] TezQA commented on TEZ-3303: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12817556/TEZ-3303.003.02.patch against master revision 8131896. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1848//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1848//console This message is automatically generated. > Have ShuffleVertexManager consume more precise partition stats > -- > > Key: TEZ-3303 > URL: https://issues.apache.org/jira/browse/TEZ-3303 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Tsuyoshi Ozawa > Attachments: TEZ-3303.001.patch, TEZ-3303.002.patch, > TEZ-3303.002.patch, TEZ-3303.003.02.patch, TEZ-3303.003.patch > > > TEZ-3216 adds the support for more precise partition stats. > ShuffleVertexManager should be updated to consume the more precise partition > stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3334) Tez Custom Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374071#comment-15374071 ] Bikas Saha commented on TEZ-3334: - Also reporting errors properly in the response such that 1 error does not corrupt the entire data stream. YARN-1773. > Tez Custom Shuffle Handler > -- > > Key: TEZ-3334 > URL: https://issues.apache.org/jira/browse/TEZ-3334 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles > > For conditions where auto-parallelism is reduced (e.g. TEZ-3222), a custom > shuffle handler could help reduce the number of fetches and could more > efficiently fetch data. In particular if a reducer is fetching 100 pieces > serially from the same mapper it could do this in one fetch call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
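The idea in the TEZ-3334 description, replacing many serial fetches from one mapper with a single call, can be illustrated with a minimal sketch. It assumes the reducer knows the sorted, distinct partition IDs it needs from a given mapper; `FetchCoalescer`, `Range`, and `coalesce` are hypothetical names for illustration only and are not part of Tez or of any shuffle handler API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Tez class): groups partition IDs into ranges. */
final class FetchCoalescer {

    /** Inclusive range of partition IDs that one fetch call could cover. */
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return start + "-" + end;
        }
    }

    /**
     * Collapses a sorted list of distinct partition IDs into maximal runs of
     * consecutive IDs. Each run could then be requested in a single call.
     */
    static List<Range> coalesce(List<Integer> sortedPartitions) {
        List<Range> ranges = new ArrayList<>();
        if (sortedPartitions.isEmpty()) {
            return ranges;
        }
        int start = sortedPartitions.get(0);
        int prev = start;
        for (int i = 1; i < sortedPartitions.size(); i++) {
            int p = sortedPartitions.get(i);
            if (p != prev + 1) {
                // Gap found: close the current run and start a new one.
                ranges.add(new Range(start, prev));
                start = p;
            }
            prev = p;
        }
        ranges.add(new Range(start, prev));
        return ranges;
    }

    public static void main(String[] args) {
        List<Integer> partitions = new ArrayList<>();
        for (int p = 0; p < 100; p++) {
            partitions.add(p);
        }
        // 100 consecutive pieces collapse into one range, i.e. one fetch.
        System.out.println(coalesce(partitions)); // prints [0-99]
    }
}
```

How a range would be encoded in an actual fetch request is left open here; the issue does not specify a wire format.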
[jira] [Commented] (TEZ-3334) Tez Custom Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374065#comment-15374065 ] Bikas Saha commented on TEZ-3334: - See YARN-4577 for classpath isolation of aux services. Perhaps the first step could be a POC: take the existing MR shuffle and change its packaging to org.apache.tez, then add it as tez_shuffle in YARN alongside mapreduce_shuffle, and verify that Tez jobs use the Tez shuffle and MR jobs use the MR shuffle (both shuffle services effectively running the same code). After that we can create follow-up JIRAs for new features and improvements to the Tez shuffle. Sounds like a plan? > Tez Custom Shuffle Handler > -- > > Key: TEZ-3334 > URL: https://issues.apache.org/jira/browse/TEZ-3334 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles > > For conditions where auto-parallelism is reduced (e.g. TEZ-3222), a custom > shuffle handler could help reduce the number of fetches and could more > efficiently fetch data. In particular if a reducer is fetching 100 pieces > serially from the same mapper it could do this in one fetch call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Success: TEZ-3337 PreCommit Build #1847
Jira: https://issues.apache.org/jira/browse/TEZ-3337 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1847/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 4121 lines...] [INFO] Tez ... SUCCESS [ 0.030 s] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 54:01 min [INFO] Finished at: 2016-07-13T00:06:09+00:00 [INFO] Final Memory: 85M/1211M [INFO] {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12817540/TEZ-3337.1.patch against master revision 8131896. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1847//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1847//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. f05a669123d2097020adaf99dd531d88b36de504 logged out == == Finished build. == == Archiving artifacts [description-setter] Description set: TEZ-3337 Recording test results Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-3337) Not log empty fields of TaskAttemptFinishedEvent to avoid confusion
[ https://issues.apache.org/jira/browse/TEZ-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374021#comment-15374021 ] TezQA commented on TEZ-3337: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12817540/TEZ-3337.1.patch against master revision 8131896. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1847//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1847//console This message is automatically generated. > Not log empty fields of TaskAttemptFinishedEvent to avoid confusion > --- > > Key: TEZ-3337 > URL: https://issues.apache.org/jira/browse/TEZ-3337 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3337.1.patch > > > For a successful task attempt, we don't record the containerId, which causes > "containerId=," in the INFO logs. We should avoid logging this field if it's > empty. 
> {code} > 2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] > |history.HistoryEventHandler|: > [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, > creationTime=1467956979891, allocationTime=1467956980426, > startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, > status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, > nodeHttpAddress= > 2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] > |history.HistoryEventHandler|: > [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, > creationTime=1467956979894, allocationTime=1467956980427, > startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, > status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, > nodeHttpAddress= > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3331) Add operation specific HDFS counters for Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373981#comment-15373981 ] Hitesh Shah commented on TEZ-3331: -- Also, TEZ-3168 has a wip patch that shows how the shims could be enhanced to make use of an API not in the default version of hadoop that we compile against. > Add operation specific HDFS counters for Tez UI > --- > > Key: TEZ-3331 > URL: https://issues.apache.org/jira/browse/TEZ-3331 > Project: Apache Tez > Issue Type: Bug >Reporter: Jitendra Nath Pandey > Attachments: TEZ-3331.wip.2.patch, TEZ-3331.wip.3.patch, > TEZ-3331.wip.4.patch, TEZ-3331.wip.patch > > > Hadoop has added several operation specific counters in the FileSystem > statistics (HADOOP-13065). These counters are useful to track file system > operations more granularly. It would be great to track these counters for Tez > and expose them via UI as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3303) Have ShuffleVertexManager consume more precise partition stats
[ https://issues.apache.org/jira/browse/TEZ-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-3303: Attachment: TEZ-3303.003.02.patch Uploading the modified patch for precommit. > Have ShuffleVertexManager consume more precise partition stats > -- > > Key: TEZ-3303 > URL: https://issues.apache.org/jira/browse/TEZ-3303 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Tsuyoshi Ozawa > Attachments: TEZ-3303.001.patch, TEZ-3303.002.patch, > TEZ-3303.002.patch, TEZ-3303.003.02.patch, TEZ-3303.003.patch > > > TEZ-3216 adds the support for more precise partition stats. > ShuffleVertexManager should be updated to consume the more precise partition > stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3303) Have ShuffleVertexManager consume more precise partition stats
[ https://issues.apache.org/jira/browse/TEZ-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373969#comment-15373969 ] Siddharth Seth commented on TEZ-3303: - Very minor: can we make this an "else if (proto.hasDetailedPartitionStats)". One of the two stats is populated; however this should not double count if both were populated. Thanks [~ozawa] for the patch and [~mingma] for the review. Will commit after this change. > Have ShuffleVertexManager consume more precise partition stats > -- > > Key: TEZ-3303 > URL: https://issues.apache.org/jira/browse/TEZ-3303 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Tsuyoshi Ozawa > Attachments: TEZ-3303.001.patch, TEZ-3303.002.patch, > TEZ-3303.002.patch, TEZ-3303.003.patch > > > TEZ-3216 adds the support for more precise partition stats. > ShuffleVertexManager should be updated to consume the more precise partition > stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
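The double-counting concern raised in the review comment above can be sketched as follows. This is an illustration under stated assumptions, not the TEZ-3303 patch: `StatsPayload`, `hasCoarse`, `hasDetailed`, and `accumulate` are hypothetical stand-ins, and the real Tez protobuf message and ShuffleVertexManager code differ.

```java
/** Hypothetical sketch (not Tez code) of consuming exactly one kind of stats. */
final class PartitionStatsReader {

    /** Stand-in for an event payload that may carry either kind of stats. */
    static final class StatsPayload {
        final long[] coarseSizes;   // null when not populated
        final long[] detailedSizes; // null when not populated

        StatsPayload(long[] coarseSizes, long[] detailedSizes) {
            this.coarseSizes = coarseSizes;
            this.detailedSizes = detailedSizes;
        }

        boolean hasCoarse() {
            return coarseSizes != null;
        }

        boolean hasDetailed() {
            return detailedSizes != null;
        }
    }

    /** Adds per-partition sizes from the payload into the running totals. */
    static void accumulate(StatsPayload payload, long[] totals) {
        if (payload.hasCoarse()) {
            add(payload.coarseSizes, totals);
        } else if (payload.hasDetailed()) {
            // "else if" rather than a second "if": when both are populated,
            // the same output is not counted twice.
            add(payload.detailedSizes, totals);
        }
    }

    private static void add(long[] sizes, long[] totals) {
        int n = Math.min(sizes.length, totals.length);
        for (int i = 0; i < n; i++) {
            totals[i] += sizes[i];
        }
    }

    public static void main(String[] args) {
        long[] totals = new long[2];
        accumulate(new StatsPayload(new long[] {10, 20}, new long[] {10, 20}), totals);
        System.out.println(totals[0] + " " + totals[1]); // prints 10 20
    }
}
```

With two independent `if` statements the totals in `main` would be 20 and 40; the `else if` keeps them at 10 and 20.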
[jira] [Commented] (TEZ-3331) Add operation specific HDFS counters for Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373947#comment-15373947 ] Ming Ma commented on TEZ-3331: -- [~hitesh] thanks for the info about the Hadoop shim. bq. Mind adding more details on which features in particular? I have opened TEZ-3340, TEZ-3341, TEZ-3342 and followed up on [~sseth]'s email thread about the release. Do you know whether the Hadoop shim can support the addition of these features? > Add operation specific HDFS counters for Tez UI > --- > > Key: TEZ-3331 > URL: https://issues.apache.org/jira/browse/TEZ-3331 > Project: Apache Tez > Issue Type: Bug >Reporter: Jitendra Nath Pandey > Attachments: TEZ-3331.wip.2.patch, TEZ-3331.wip.3.patch, > TEZ-3331.wip.4.patch, TEZ-3331.wip.patch > > > Hadoop has added several operation specific counters in the FileSystem > statistics (HADOOP-13065). These counters are useful to track file system > operations more granularly. It would be great to track these counters for Tez > and expose them via UI as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3340) Add support for YARN Shared Cache
[ https://issues.apache.org/jira/browse/TEZ-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated TEZ-3340: - Description: YARN provides shared cache functionality in YARN-1492. According to [~ctrezzo] most of the YARN functionality is in hadoop 2.8 and frameworks can start to use it. MR adds the support via MAPREDUCE-5951. Can anyone confirm if Tez supports the upload of application DAG jar and dependent lib jars from client machine to HDFS as part of Tez app submission? From my test, that doesn't seem to happen. Instead Tez expects applications to upload the jars to HDFS beforehand and then set the tez.aux.uris to the HDFS locations. was: YARN provides shared cache functionality in YARN-1492. According to [~ctrezzo] most of the YARN functionality is in hadoop 2.8 and frameworks can start to use it. MR adds the support via MAPREDUCE-5951. Can anyone confirm if Tez supports the upload of application DAG jar and dependent lib jars from client machine to HDFS as part of Tez app submission? From my test, that doesn't seem to happen. Tez expects applications to upload the jars to HDFS beforehand and then set the tez.aux.uris to the HDFS locations. > Add support for YARN Shared Cache > - > > Key: TEZ-3340 > URL: https://issues.apache.org/jira/browse/TEZ-3340 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma > > YARN provides shared cache functionality in YARN-1492. According to > [~ctrezzo] most of the YARN functionality is in hadoop 2.8 and frameworks can > start to use it. MR adds the support via MAPREDUCE-5951. > Can anyone confirm if Tez supports the upload of application DAG jar and > dependent lib jars from client machine to HDFS as part of Tez app submission? > From my test, that doesn't seem to happen. Instead Tez expects applications > to upload the jars to HDFS beforehand and then set the tez.aux.uris to the > HDFS locations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3342) Have Tez AM generate thread dump on task attempts timeout before killing them
Ming Ma created TEZ-3342: Summary: Have Tez AM generate thread dump on task attempts timeout before killing them Key: TEZ-3342 URL: https://issues.apache.org/jira/browse/TEZ-3342 Project: Apache Tez Issue Type: Improvement Reporter: Ming Ma This is to provide something similar to MAPREDUCE-5044. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (TEZ-3340) Add support for YARN Shared Cache
[ https://issues.apache.org/jira/browse/TEZ-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma moved YARN-5365 to TEZ-3340: Key: TEZ-3340 (was: YARN-5365) Project: Apache Tez (was: Hadoop YARN) > Add support for YARN Shared Cache > - > > Key: TEZ-3340 > URL: https://issues.apache.org/jira/browse/TEZ-3340 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma > > YARN provides shared cache functionality in YARN-1492. According to > [~ctrezzo] most of the YARN functionality is in hadoop 2.8 and frameworks can > start to use it. MR adds the support via MAPREDUCE-5951. > Can anyone confirm if Tez supports the upload of application DAG jar and > dependent lib jars from client machine to HDFS as part of Tez app submission? > From my test, that doesn't seem to happen. Tez expects applications to upload > the jars to HDFS beforehand and then set the tez.aux.uris to the HDFS > locations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-3337) Not log empty fields of TaskAttemptFinishedEvent to avoid confusion
[ https://issues.apache.org/jira/browse/TEZ-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373841#comment-15373841 ] Zhiyuan Yang edited comment on TEZ-3337 at 7/12/16 10:43 PM: - [~hitesh], [~gopalv], [~jeagles], Please help review. was (Author: aplusplus): [~hitesh], [~gopalv], Please help review. > Not log empty fields of TaskAttemptFinishedEvent to avoid confusion > --- > > Key: TEZ-3337 > URL: https://issues.apache.org/jira/browse/TEZ-3337 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3337.1.patch > > > For a successful task attempt, we don't record the containerId, which causes > "containerId=," in the INFO logs. We should avoid logging this field if it's > empty. > {code} > 2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] > |history.HistoryEventHandler|: > [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, > creationTime=1467956979891, allocationTime=1467956980426, > startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, > status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, > nodeHttpAddress= > 2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] > |history.HistoryEventHandler|: > [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, > creationTime=1467956979894, allocationTime=1467956980427, > startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, > status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, > nodeHttpAddress= > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3337) Not log empty fields of TaskAttemptFinishedEvent to avoid confusion
[ https://issues.apache.org/jira/browse/TEZ-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3337: -- Attachment: TEZ-3337.1.patch > Not log empty fields of TaskAttemptFinishedEvent to avoid confusion > --- > > Key: TEZ-3337 > URL: https://issues.apache.org/jira/browse/TEZ-3337 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3337.1.patch > > > For a successful task attempt, we don't record the containerId, which causes > "containerId=," in the INFO logs. We should avoid logging this field if it's > empty. > {code} > 2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] > |history.HistoryEventHandler|: > [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, > creationTime=1467956979891, allocationTime=1467956980426, > startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, > status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, > nodeHttpAddress= > 2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] > |history.HistoryEventHandler|: > [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, > creationTime=1467956979894, allocationTime=1467956980427, > startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, > status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, > nodeHttpAddress= > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3339) Add Tez Counters for bytes-read-by-network-distance FileSystem metrics
Ming Ma created TEZ-3339: Summary: Add Tez Counters for bytes-read-by-network-distance FileSystem metrics Key: TEZ-3339 URL: https://issues.apache.org/jira/browse/TEZ-3339 Project: Apache Tez Issue Type: Improvement Reporter: Ming Ma This is the Tez part of the change which is to consume bytes-read-by-network-distance metrics generated by HDFS-9579, like what we want to have in MAPREDUCE-6660. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3337) Not log empty fields of TaskAttemptFinishedEvent to avoid confusion
[ https://issues.apache.org/jira/browse/TEZ-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373787#comment-15373787 ] Zhiyuan Yang commented on TEZ-3337: --- This issue doesn't apply to the ATS-related conversions. When we convert TaskAttemptFinishedEvent to either TimelineEntity or JSONObject, we skip the null fields. > Not log empty fields of TaskAttemptFinishedEvent to avoid confusion > --- > > Key: TEZ-3337 > URL: https://issues.apache.org/jira/browse/TEZ-3337 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > > For a successful task attempt, we don't record the containerId, which causes > "containerId=," in the INFO logs. We should avoid logging this field if it's > empty. > {code} > 2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] > |history.HistoryEventHandler|: > [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, > creationTime=1467956979891, allocationTime=1467956980426, > startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, > status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, > nodeHttpAddress= > 2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] > |history.HistoryEventHandler|: > [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, > creationTime=1467956979894, allocationTime=1467956980427, > startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, > status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, > nodeHttpAddress= > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3338) Support classloader isolation
Ming Ma created TEZ-3338: Summary: Support classloader isolation Key: TEZ-3338 URL: https://issues.apache.org/jira/browse/TEZ-3338 Project: Apache Tez Issue Type: Improvement Reporter: Ming Ma HADOOP-10893 and MAPREDUCE-1700 provide classloader isolation at both client side and container side for MR. We should add the same support for Tez. Given we use hadoop command to launch Tez, it appears the client side has been taken care of. Only the container side support is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
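For readers unfamiliar with the term, container-side classloader isolation is commonly built on a "child-first" class loader: application jars are searched before the framework's classpath, except for a list of system packages that always come from the parent. The sketch below shows that general technique only. It is not the implementation from HADOOP-10893 or MAPREDUCE-1700, and `IsolatingClassLoader` and its `systemPrefixes` parameter are hypothetical names.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical child-first class loader; a sketch, not Hadoop's or Tez's code. */
final class IsolatingClassLoader extends URLClassLoader {

    // Package prefixes that must always be loaded by the parent (e.g. "java.").
    private final String[] systemPrefixes;

    IsolatingClassLoader(URL[] urls, ClassLoader parent, String... systemPrefixes) {
        super(urls, parent);
        this.systemPrefixes = systemPrefixes;
    }

    private boolean isSystemClass(String name) {
        for (String prefix : systemPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isSystemClass(name)) {
                try {
                    // Child first: look in the application's own jars.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the application jars; fall back to the parent.
                }
            }
            if (c == null) {
                // System classes and misses use normal parent delegation.
                return super.loadClass(name, resolve);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        try (IsolatingClassLoader loader = new IsolatingClassLoader(
                new URL[0], ClassLoader.getSystemClassLoader(), "java.")) {
            // "java." is a system prefix, so this delegates to the parent.
            System.out.println(loader.loadClass("java.lang.String") == String.class);
        }
    }
}
```

In a real deployment the URL list would point at the application's jars and the system prefix list would cover the framework's own packages; both are left as inputs here.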
[jira] [Comment Edited] (TEZ-3337) Not log empty fields of TaskAttemptFinishedEvent to avoid confusion
[ https://issues.apache.org/jira/browse/TEZ-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373781#comment-15373781 ] Hitesh Shah edited comment on TEZ-3337 at 7/12/16 9:58 PM: --- If this is the case, -the conversion to ATS should also not set the value if it is empty or null- can you confirm that we don't reset the value to empty for ATS? was (Author: hitesh): If this is the case, the conversion to ATS should also not set the value if it is empty or null. > Not log empty fields of TaskAttemptFinishedEvent to avoid confusion > --- > > Key: TEZ-3337 > URL: https://issues.apache.org/jira/browse/TEZ-3337 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > > For a successful task attempt, we don't record the containerId, which causes > "containerId=," in the INFO logs. We should avoid logging this field if it's > empty. > {code} > 2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] > |history.HistoryEventHandler|: > [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, > creationTime=1467956979891, allocationTime=1467956980426, > startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, > status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, > nodeHttpAddress= > 2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] > |history.HistoryEventHandler|: > [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, > creationTime=1467956979894, allocationTime=1467956980427, > startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, > status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, > nodeHttpAddress= > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3337) Not log empty fields of TaskAttemptFinishedEvent to avoid confusion
[ https://issues.apache.org/jira/browse/TEZ-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373781#comment-15373781 ] Hitesh Shah commented on TEZ-3337: -- If this is the case, the conversion to ATS should also not set the value if it is empty or null. > Not log empty fields of TaskAttemptFinishedEvent to avoid confusion > --- > > Key: TEZ-3337 > URL: https://issues.apache.org/jira/browse/TEZ-3337 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > > For a successful task attempt, we don't record the containerId, which causes > "containerId=," in the INFO logs. We should avoid logging this field if it's > empty. > {code} > 2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] > |history.HistoryEventHandler|: > [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, > creationTime=1467956979891, allocationTime=1467956980426, > startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, > status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, > nodeHttpAddress= > 2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] > |history.HistoryEventHandler|: > [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, > creationTime=1467956979894, allocationTime=1467956980427, > startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, > status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, > nodeHttpAddress= > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3331) Add operation specific HDFS counters for Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-3331: - Attachment: TEZ-3331.wip.4.patch Some tests added. > Add operation specific HDFS counters for Tez UI > --- > > Key: TEZ-3331 > URL: https://issues.apache.org/jira/browse/TEZ-3331 > Project: Apache Tez > Issue Type: Bug >Reporter: Jitendra Nath Pandey > Attachments: TEZ-3331.wip.2.patch, TEZ-3331.wip.3.patch, > TEZ-3331.wip.4.patch, TEZ-3331.wip.patch > > > Hadoop has added several operation specific counters in the FileSystem > statistics (HADOOP-13065). These counters are useful to track file system > operations more granularly. It would be great to track these counters for Tez > and expose them via UI as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3337) Not log empty fields of TaskAttemptFinishedEvent to avoid confusion
Zhiyuan Yang created TEZ-3337: - Summary: Not log empty fields of TaskAttemptFinishedEvent to avoid confusion Key: TEZ-3337 URL: https://issues.apache.org/jira/browse/TEZ-3337 Project: Apache Tez Issue Type: Bug Reporter: Zhiyuan Yang Assignee: Zhiyuan Yang
For a successful task attempt, we don't record the containerId, which causes "containerId=," in the INFO logs. We should avoid logging this field if it's empty.
{code}
2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] |history.HistoryEventHandler|: [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, creationTime=1467956979891, allocationTime=1467956980426, startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, nodeHttpAddress=
2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] |history.HistoryEventHandler|: [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, creationTime=1467956979894, allocationTime=1467956980427, startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, nodeHttpAddress=
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
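As a rough illustration of the request in TEZ-3337 (not the actual patch), the sketch below builds a "key=value" line and skips any field whose value is null or empty. The class `HistoryLineFormatter` and its `format` method are hypothetical names invented for this example and are not part of Tez.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper, not part of Tez: renders only non-empty key=value pairs. */
class HistoryLineFormatter {

  /** Joins non-null, non-empty fields as "key=value", separated by ", ". */
  static String format(Map<String, String> fields) {
    StringJoiner joiner = new StringJoiner(", ");
    for (Map.Entry<String, String> entry : fields.entrySet()) {
      String value = entry.getValue();
      // Skip fields that were never recorded, so "containerId=," never appears.
      if (value != null && !value.isEmpty()) {
        joiner.add(entry.getKey() + "=" + value);
      }
    }
    return joiner.toString();
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps insertion order, so the fields print in a stable order.
    Map<String, String> fields = new LinkedHashMap<>();
    fields.put("vertexName", "Map 1");
    fields.put("status", "SUCCEEDED");
    fields.put("errorEnum", "");
    fields.put("diagnostics", "");
    fields.put("containerId", null);
    System.out.println(format(fields)); // prints "vertexName=Map 1, status=SUCCEEDED"
  }
}
```

The same null-or-empty check would presumably also apply to the ATS conversion mentioned in the comment on this issue, but that code path is not sketched here.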
[jira] [Commented] (TEZ-3334) Tez Custom Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373702#comment-15373702 ] Ming Ma commented on TEZ-3334: -- [~bikassaha], for the new YARN aux service isolation, do you mean YARN-1593? > Tez Custom Shuffle Handler > -- > > Key: TEZ-3334 > URL: https://issues.apache.org/jira/browse/TEZ-3334 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles > > For conditions where auto-parallelism is reduced (e.g. TEZ-3222), a custom > shuffle handler could help reduce the number of fetches and could more > efficiently fetch data. In particular if a reducer is fetching 100 pieces > serially from the same mapper it could do this in one fetch call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3336) Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE
[ https://issues.apache.org/jira/browse/TEZ-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373690#comment-15373690 ] Jason Lowe commented on TEZ-3336: - Seems like one fix would be to simply have the MR input initializers ignore events rather than explode. I'm guessing those initializers do not care at all about what anything else is doing -- they just want to compute splits based purely on the MR input. > Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE > --- > > Key: TEZ-3336 > URL: https://issues.apache.org/jira/browse/TEZ-3336 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe > > When Hive does a map-side join it can generate a DAG where a vertex has two > inputs, one from an upstream task and another using MRInputAMSplitGenerator. > If it takes a while for MRInputAMSplitGenerator to compute the splits and one > of the tasks for the other upstream vertex completes then the job can fail > with an error since MRInputAMSplitGenerator does not expect to receive any > events. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
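To make the suggestion above concrete, here is a minimal sketch contrasting an initializer that throws on any event with one that silently drops events. All types here (`InitEvent`, `EventAwareInitializer`, `StrictSplitGenerator`, `LenientSplitGenerator`) are hypothetical stand-ins written for this example; they are not the Tez classes, and the real fix may look different.

```java
import java.util.List;

/** Hypothetical stand-in for an initializer event; not the Tez class. */
class InitEvent {
  final String sourceVertex;

  InitEvent(String sourceVertex) {
    this.sourceVertex = sourceVertex;
  }
}

/** Hypothetical stand-in for the callback an initializer exposes for events. */
interface EventAwareInitializer {
  void handleEvents(List<InitEvent> events);
}

/** Mirrors the behaviour reported in the issue: any event is treated as fatal. */
class StrictSplitGenerator implements EventAwareInitializer {
  @Override
  public void handleEvents(List<InitEvent> events) {
    throw new UnsupportedOperationException("Not expecting to handle any events");
  }
}

/** Proposed behaviour: events are irrelevant to split computation, so drop them. */
class LenientSplitGenerator implements EventAwareInitializer {
  private int ignored = 0;

  @Override
  public void handleEvents(List<InitEvent> events) {
    // Only count the events (useful for a debug log); never fail the vertex.
    ignored += events.size();
  }

  int ignoredCount() {
    return ignored;
  }
}
```

With the lenient variant, an upstream task that finishes while splits are still being computed no longer turns into a vertex failure; the trade-off is that an event delivered by mistake would go unnoticed unless the counter is logged.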
[jira] [Commented] (TEZ-3336) Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE
[ https://issues.apache.org/jira/browse/TEZ-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373680#comment-15373680 ] Jason Lowe commented on TEZ-3336: - One example of the failure:
{noformat}
Vertex failed, vertexName=Map 1, vertexId=vertex_1467094199147_3081640_1_01, diagnostics=[Vertex vertex_1467094199147_3081640_1_01 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: input initializer failed, vertex=vertex_1467094199147_3081640_1_01 [Map 1], java.lang.UnsupportedOperationException: Not expecting to handle any events
	at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.handleInputInitializerEvent(MRInputAMSplitGenerator.java:170)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InitializerWrapper.sendEvents(RootInputInitializerManager.java:501)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InitializerWrapper.onTaskSucceeded(RootInputInitializerManager.java:451)
	at org.apache.tez.dag.app.dag.StateChangeNotifier.taskSucceeded(StateChangeNotifier.java:290)
	at org.apache.tez.dag.app.dag.impl.TaskImpl$TaskStateChangedCallback.onStateChanged(TaskImpl.java:1524)
	at org.apache.tez.dag.app.dag.impl.TaskImpl$TaskStateChangedCallback.onStateChanged(TaskImpl.java:1508)
	at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:61)
	at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:918)
	at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:112)
	at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:2068)
	at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:2054)
	at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
	at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
	at java.lang.Thread.run(Thread.java:745)
]
{noformat}
RootInputInitializerManager delegates the input initializers to a thread pool and listens for vertex/task events while those initializers are running. Once they complete it unregisters from those events. If the initializer completes before an upstream task succeeds we're OK, but if a task succeeds first it ends up sending events to the initializer which doesn't expect any events. Looks like MRInputSplitDistributor could have the same issue, and a fix for TEZ-3274 would aggravate the issue further.
> Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE > --- > > Key: TEZ-3336 > URL: https://issues.apache.org/jira/browse/TEZ-3336 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe > > When Hive does a map-side join it can generate a DAG where a vertex has two > inputs, one from an upstream task and another using MRInputAMSplitGenerator. > If it takes a while for MRInputAMSplitGenerator to compute the splits and one > of the tasks for the other upstream vertex completes then the job can fail > with an error since MRInputAMSplitGenerator does not expect to receive any > events. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3336) Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE
[ https://issues.apache.org/jira/browse/TEZ-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373670#comment-15373670 ] Hitesh Shah commented on TEZ-3336: -- \cc [~hagleitn] > Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE > --- > > Key: TEZ-3336 > URL: https://issues.apache.org/jira/browse/TEZ-3336 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe > > When Hive does a map-side join it can generate a DAG where a vertex has two > inputs, one from an upstream task and another using MRInputAMSplitGenerator. > If it takes a while for MRInputAMSplitGenerator to compute the splits and one > of the tasks for the other upstream vertex completes then the job can fail > with an error since MRInputAMSplitGenerator does not expect to receive any > events. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3336) Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE
Jason Lowe created TEZ-3336: --- Summary: Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE Key: TEZ-3336 URL: https://issues.apache.org/jira/browse/TEZ-3336 Project: Apache Tez Issue Type: Bug Affects Versions: 0.7.1 Reporter: Jason Lowe When Hive does a map-side join it can generate a DAG where a vertex has two inputs, one from an upstream task and another using MRInputAMSplitGenerator. If it takes a while for MRInputAMSplitGenerator to compute the splits and one of the tasks for the other upstream vertex completes then the job can fail with an error since MRInputAMSplitGenerator does not expect to receive any events. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3334) Tez Custom Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373610#comment-15373610 ] Hitesh Shah commented on TEZ-3334: -- Other feature asks:
- control channel to query the shuffle service about various bits of info
- potential stats on cache hits, failures, aborted fetches, etc.
- support for deleting data: in the case of Tez, intermediate data across a long-running session will need cleaning up.
- can we do something better from a disk usage/quota perspective? What happens if one app takes over too much disk space? Guess that falls under yarn local dirs and not really shuffle, but worth thinking about?
> Tez Custom Shuffle Handler > -- > > Key: TEZ-3334 > URL: https://issues.apache.org/jira/browse/TEZ-3334 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles > > For conditions where auto-parallelism is reduced (e.g. TEZ-3222), a custom > shuffle handler could help reduce the number of fetches and could more > efficiently fetch data. In particular if a reducer is fetching 100 pieces > serially from the same mapper it could do this in one fetch call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3335) DAG client thinks app is still running when app status is null
[ https://issues.apache.org/jira/browse/TEZ-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373411#comment-15373411 ] Jason Lowe commented on TEZ-3335: -
I thought about fixing this on the YARN side. The YarnClient currently auto-redirects to the AHS when the RM doesn't know about an app. It could detect that the AHS report doesn't contain a status, so therefore the app is essentially lost at that point. The RM doesn't know about it, and the AHS never got a completion event for it. However I didn't want the AHS client to throw an exception for that case since the app report does contain _some_ useful information about the lost app, such as user, queue, start time, app name, etc. Throwing an exception means the user gets no details about the app, so returning what we do know seemed more prudent.
The problem with the AHS or client trying to fix this on the YARN side is that we don't know what the final status of the application was. It could be any of FAILED, KILLED, or SUCCEEDED if the completion event tried to get posted to the AHS but was dropped for some reason. Therefore it seems a bit dangerous to assume one of those three. We could always add a new status like LOST or UNKNOWN, etc., but of course that requires app frameworks to update themselves to detect and react properly to the new state.
> DAG client thinks app is still running when app status is null > -- > > Key: TEZ-3335 > URL: https://issues.apache.org/jira/browse/TEZ-3335 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe > > When an RM restarts without recovering apps (i.e.: either work-preserving is > not enabled or state store was removed) and the YARN application history is > enabled then YarnClient can return an application report with the app status > as null. The RM doesn't know about the application, so the client redirects > to the AHS. The AHS knows the app started at some point but will never > receive a finished event, hence the null app status. > The DAG client fails to detect this scenario and believes the app is still > running, so for example Hive clients will continue to hammer for status on an > app that doesn't exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3335) DAG client thinks app is still running when app status is null
[ https://issues.apache.org/jira/browse/TEZ-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373352#comment-15373352 ] Hitesh Shah commented on TEZ-3335: -- \cc [~gtCarrera9] [~vinodkv] > DAG client thinks app is still running when app status is null > -- > > Key: TEZ-3335 > URL: https://issues.apache.org/jira/browse/TEZ-3335 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe > > When an RM restarts without recovering apps (i.e.: either work-preserving is > not enabled or state store was removed) and the YARN application history is > enabled then YarnClient can return an application report with the app status > as null. The RM doesn't know about the application, so the client redirects > to the AHS. The AHS knows the app started at some point but will never > received a finished event, hence the null app status. > The DAG client fails to detect this scenario and believes the app is still > running, so for example Hive clients will continue to hammer for status on an > app that doesn't exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-3335) DAG client thinks app is still running when app status is null
[ https://issues.apache.org/jira/browse/TEZ-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373330#comment-15373330 ] Hitesh Shah edited comment on TEZ-3335 at 7/12/16 5:58 PM: --- Seems like a bug in YARN that should be fixed too? Where if the RM does not know about it, it means app has completed with final state/status unknown and therefore either the RM or AHS should inject some state denoting completion? was (Author: hitesh): Seems like a bug in YARN that should be fixed too? Where if the RM does not know about it, it means app has completed with final state/status unknown? > DAG client thinks app is still running when app status is null > -- > > Key: TEZ-3335 > URL: https://issues.apache.org/jira/browse/TEZ-3335 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe > > When an RM restarts without recovering apps (i.e.: either work-preserving is > not enabled or state store was removed) and the YARN application history is > enabled then YarnClient can return an application report with the app status > as null. The RM doesn't know about the application, so the client redirects > to the AHS. The AHS knows the app started at some point but will never > received a finished event, hence the null app status. > The DAG client fails to detect this scenario and believes the app is still > running, so for example Hive clients will continue to hammer for status on an > app that doesn't exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3335) DAG client thinks app is still running when app status is null
[ https://issues.apache.org/jira/browse/TEZ-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373330#comment-15373330 ] Hitesh Shah commented on TEZ-3335: -- Seems like a bug in YARN that should be fixed too? Where if the RM does not know about it, it means app has completed with final state/status unknown? > DAG client thinks app is still running when app status is null > -- > > Key: TEZ-3335 > URL: https://issues.apache.org/jira/browse/TEZ-3335 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe > > When an RM restarts without recovering apps (i.e.: either work-preserving is > not enabled or state store was removed) and the YARN application history is > enabled then YarnClient can return an application report with the app status > as null. The RM doesn't know about the application, so the client redirects > to the AHS. The AHS knows the app started at some point but will never > received a finished event, hence the null app status. > The DAG client fails to detect this scenario and believes the app is still > running, so for example Hive clients will continue to hammer for status on an > app that doesn't exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3335) DAG client thinks app is still running when app status is null
Jason Lowe created TEZ-3335: --- Summary: DAG client thinks app is still running when app status is null Key: TEZ-3335 URL: https://issues.apache.org/jira/browse/TEZ-3335 Project: Apache Tez Issue Type: Bug Affects Versions: 0.7.1 Reporter: Jason Lowe When an RM restarts without recovering apps (i.e., either work-preserving restart is not enabled or the state store was removed) and the YARN application history is enabled, then YarnClient can return an application report with the app status as null. The RM doesn't know about the application, so the client redirects to the AHS. The AHS knows the app started at some point but will never receive a finished event, hence the null app status. The DAG client fails to detect this scenario and believes the app is still running, so for example Hive clients will continue to hammer for status on an app that doesn't exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
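One way a client could guard against the scenario described above is to treat a null application state as terminal rather than as "still running". The sketch below assumes that approach; `AppState` and `AppStatusCheck` are hypothetical stand-ins written for this example (they are not the YARN or Tez types), and whether the real client should report the app as failed, lost, or something else is left open, as the discussion on this issue notes.

```java
/** Hypothetical stand-in for the YARN application state enum. */
enum AppState { NEW, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

/** Hypothetical client-side check; not the actual Tez DAG client logic. */
class AppStatusCheck {

  /**
   * Returns true when the caller should stop polling for status.
   * A null state is assumed to mean the history server saw the app start
   * but never recorded a completion, so the app is effectively lost.
   */
  static boolean shouldStopPolling(AppState state) {
    if (state == null) {
      return true; // unknown final status: stop rather than poll forever
    }
    switch (state) {
      case FINISHED:
      case FAILED:
      case KILLED:
        return true;
      default:
        return false;
    }
  }
}
```

A polling loop would call `shouldStopPolling` on each report and, for the null case, surface whatever details the report still carries (user, queue, start time) instead of retrying indefinitely.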
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372798#comment-15372798 ] Manuel Godbert commented on TEZ-3330: - I already tried that actually, with no success: the configuration property becomes available during shuffle but its value is the constant value of the tez-site.xml, not the value dynamically built at job setup.
> Error on avro M/R job with Tez: missing configuration property > -- > > Key: TEZ-3330 > URL: https://issues.apache.org/jira/browse/TEZ-3330 > Project: Apache Tez > Issue Type: Bug >Reporter: Manuel Godbert > >
> I tried running the simple avro M/R job MapredColorCount, that I found in the examples of avro release 1.7.7.
> It failed with the following trace:
> {code}
> errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
> 	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:378)
> 	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
> 	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
> 	at java.io.StringReader.<init>(StringReader.java:50)
> 	at org.apache.avro.Schema$Parser.parse(Schema.java:917)
> 	at org.apache.avro.Schema.parse(Schema.java:966)
> 	at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
> 	at org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39)
> 	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
> 	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
> 	at org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateInputKeyComparator(ConfigUtils.java:133)
> 	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:915)
> 	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:540)
> 	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:376)
> 	... 6 more
> {code}
> Digging a bit I saw that during shuffle Tez can't access some of the configuration properties of the job. In our example it is the avro.output.schema that is missing.
> With some more complicated code I could get one step further and a similar issue happened when the valuesIterator for the reducer was being built:
> {code}
> java.lang.NullPointerException
> 	at java.io.StringReader.<init>(StringReader.java:50)
> 	at org.apache.avro.Schema$Parser.parse(Schema.java:917)
> 	at org.apache.avro.Schema.parse(Schema.java:966)
> 	at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
> 	at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
> 	at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
> 	at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:80)
> 	at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:287)
> {code}
> I am using HDP2.4, Tez 0.7.0, avro 1.7.4
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3334) Tez Custom Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated TEZ-3334: Issue Type: New Feature (was: Bug) > Tez Custom Shuffle Handler > -- > > Key: TEZ-3334 > URL: https://issues.apache.org/jira/browse/TEZ-3334 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles > > For conditions where auto-parallelism is reduced (e.g. TEZ-3222), a custom > shuffle handler could help reduce the number of fetches and could more > efficiently fetch data. In particular if a reducer is fetching 100 pieces > serially from the same mapper it could do this in one fetch call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3334) Tez Custom Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372216#comment-15372216 ] Rajesh Balamohan commented on TEZ-3334: --- +1 for custom shuffle handler. >> "fetching 100 pieces serially from the same mapper" If keep-alive connections are enabled in Tez and in the NM, would this mainly be to reduce the number of round trips? > Tez Custom Shuffle Handler > -- > > Key: TEZ-3334 > URL: https://issues.apache.org/jira/browse/TEZ-3334 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles > > For conditions where auto-parallelism is reduced (e.g. TEZ-3222), a custom > shuffle handler could help reduce the number of fetches and could more > efficiently fetch data. In particular if a reducer is fetching 100 pieces > serially from the same mapper it could do this in one fetch call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
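The "one fetch call instead of 100" idea from the issue description can be pictured as grouping the pieces a reducer needs by the mapper that holds them, so each mapper is contacted once. The sketch below only illustrates that grouping step; `Piece` and `FetchBatcher` are hypothetical names invented for this example, and nothing here reflects how the actual shuffle handler or fetcher is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical description of one piece of map output a reducer must fetch. */
class Piece {
  final String mapper;
  final int partition;

  Piece(String mapper, int partition) {
    this.mapper = mapper;
    this.partition = partition;
  }
}

/** Hypothetical helper: turns many per-piece fetches into one request per mapper. */
class FetchBatcher {

  /** Groups partitions by mapper; each map entry stands for a single fetch call. */
  static Map<String, List<Integer>> batchByMapper(List<Piece> pieces) {
    Map<String, List<Integer>> batches = new TreeMap<>();
    for (Piece piece : pieces) {
      batches.computeIfAbsent(piece.mapper, key -> new ArrayList<>()).add(piece.partition);
    }
    return batches;
  }
}
```

For 100 pieces that all live on the same mapper, `batchByMapper` yields a single entry listing 100 partitions, which is the case where a batched request would save the most round trips, with or without keep-alive connections.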