[jira] [Commented] (TEZ-3336) Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE
[ https://issues.apache.org/jira/browse/TEZ-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376355#comment-15376355 ] Mithun Radhakrishnan commented on TEZ-3336: --- Ok, here's what's happening: {{HiveSplitGenerator}} is only in play if Hive uses the {{HiveInputFormat}} when generating splits on the AM. It's not built to handle {{CombineHiveInputFormat}} at all. I suppose regrouping grouped splits is silly. If the user chooses {{CombineHiveInputFormat}}, then Hive's [{{DagUtils.createVertex()}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L612-L618] does the following:
{code:java|title=DagUtils.java#L612-L618|borderStyle=solid}
// Not HiveInputFormat, or a custom VertexManager will take care of grouping splits
if (vertexHasCustomInput) {
  dataSource = MultiMRInput.createConfigBuilder(conf, inputFormatClass).groupSplits(false).build();
} else {
  dataSource = MRInputLegacy.createConfigBuilder(conf, inputFormatClass).groupSplits(false).build();
}
{code}
So Hive delegates to Tez's {{MRInputLegacy.createConfigBuilder()}}, which eventually puts {{MRInput}} and {{MRInputAMSplitGenerator}} in play. I'm still curious about the nature of the events sent to {{MRInputAMSplitGenerator}}, and who's sending them. That'll help convince me that this is indeed a Hive bug. :]
> Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE > --- > > Key: TEZ-3336 > URL: https://issues.apache.org/jira/browse/TEZ-3336 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe > > When Hive does a map-side join it can generate a DAG where a vertex has two > inputs, one from an upstream task and another using MRInputAMSplitGenerator.
> If it takes a while for MRInputAMSplitGenerator to compute the splits and one > of the tasks for the other upstream vertex completes then the job can fail > with an error since MRInputAMSplitGenerator does not expect to receive any > events. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
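To make the timing dependence in the description concrete, here is a toy Java model. None of these classes are Tez APIs; the names are hypothetical. It assumes, as the description says, that the split generator treats any incoming event as an error, and it reduces the concurrency to a single flag: whether the other upstream vertex's task completes before split generation does.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model (not Tez code) of the split-generation vs. upstream-event race. */
final class InitializerRaceModel {

    /** Hypothetical initializer that, like the one in the report, expects no events. */
    static final class StrictInitializer {
        List<String> computeSplits() {
            return Arrays.asList("split-0", "split-1");
        }

        void handleEvent(String event) {
            // Assumption: any incoming event is treated as a fatal error.
            throw new IllegalStateException("Not expecting any events, got: " + event);
        }
    }

    /**
     * Simulates one DAG run. If the other upstream vertex's task completes while
     * splits are still being computed, its event is routed to the initializer;
     * otherwise initialization has already finished and no event reaches it.
     */
    static String runVertex(boolean upstreamFinishesFirst) {
        StrictInitializer init = new StrictInitializer();
        try {
            if (upstreamFinishesFirst) {
                init.handleEvent("upstream task completed");
            }
            int splits = init.computeSplits().size();
            return "RUNNING with " + splits + " splits";
        } catch (IllegalStateException e) {
            return "FAILED: ROOT_INPUT_INIT_FAILURE";
        }
    }

    public static void main(String[] args) {
        System.out.println(runVertex(false)); // RUNNING with 2 splits
        System.out.println(runVertex(true));  // FAILED: ROOT_INPUT_INIT_FAILURE
    }
}
```

The same DAG succeeds or fails depending only on ordering, which matches the "sometimes fails" in the summary.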
[jira] [Commented] (TEZ-3343) sqoop import can't success
[ https://issues.apache.org/jira/browse/TEZ-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376235#comment-15376235 ] lishaoguang commented on TEZ-3343: -- Sorry, this is my first time submitting an issue to JIRA. Today I created a Hive table with 'hive.execution.engine=tez', but it doesn't work. The logs are as follows: 16/07/14 02:59:15 [main]: INFO SessionState: Map 1: -/- Status: Failed 16/07/14 02:59:15 [main]: ERROR SessionState: Status: Failed Vertex failed, vertexName=Map 1, vertexId=vertex_1468464343019_0002_1_00, diagnostics=[Vertex vertex_1468464343019_0002_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: values__tmp__table__1 initializer failed, vertex=vertex_1468464343019_0002_1_00 [Map 1], java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/MRVersion at org.apache.hadoop.hive.shims.Hadoop23Shims.isMR2(Hadoop23Shims.java:852) at org.apache.hadoop.hive.shims.Hadoop23Shims.getHadoopConfNames(Hadoop23Shims.java:923) at org.apache.hadoop.hive.conf.HiveConf$ConfVars.(HiveConf.java:358) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:371) at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:296) at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:106) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269) at
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.MRVersion at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 17 more ] Can you help me? > sqoop import can't success > -- > > Key: TEZ-3343 > URL: https://issues.apache.org/jira/browse/TEZ-3343 > Project: Apache Tez > Issue Type: Bug > Environment: hadoop-2.6.0,sqoop-1.4.6,tez-0.8.4 >Reporter: lishaoguang > > I deployed the Hadoop environment and tried to import data from MySQL to HDFS without Tez. After deploying Tez, I ran 'orderedwordcount' and it succeeded, but when I use Sqoop to import data from MySQL to HDFS, it stops at 0% map and eventually fails. What can I do? Can anyone help me?
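The trace shows HiveSplitGenerator failing inside the Tez AM (via RootInputInitializerManager) with NoClassDefFoundError for org.apache.hadoop.mapred.MRVersion, i.e. that class was not visible to the AM's class loader. One way to narrow such a problem down is to check whether the class is loadable on a given classpath. Below is a minimal JDK-only sketch; the `ClasspathCheck` class is hypothetical and not part of Hive or Tez, and which jar should provide MRVersion is not assumed here.

```java
/** Hypothetical helper: reports whether classes can be loaded on the current classpath. */
final class ClasspathCheck {

    static boolean isLoadable(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers
            Class.forName(className, false, Thread.currentThread().getContextClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.hadoop.mapred.MRVersion"};
        for (String name : names) {
            System.out.println(name + (isLoadable(name) ? " is" : " is NOT") + " loadable");
        }
    }
}
```

Run it with the same classpath as the failing process; passing class names as arguments checks those instead of the default.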
[jira] [Commented] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat
[ https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376139#comment-15376139 ] Piyush Narang commented on TEZ-3348: Thanks for getting back to me, Hitesh. I've put out a PR: https://github.com/apache/tez/pull/11 > NullPointerException in Tez MROutput while trying to write using Parquet's > DeprecatedParquetOutputFormat > > > Key: TEZ-3348 > URL: https://issues.apache.org/jira/browse/TEZ-3348 > Project: Apache Tez > Issue Type: Bug >Reporter: Piyush Narang > > Trying to run some Tez MR jobs that write out some data using Parquet to > HDFS. When I try to do so, I end up seeing an NPE in the Parquet code: > {code} > java.lang.NullPointerException > at org.apache.hadoop.fs.Path.<init>(Path.java:105) > at org.apache.hadoop.fs.Path.<init>(Path.java:94) > at > org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.getDefaultWorkFile(DeprecatedParquetOutputFormat.java:69) > at > org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.access$100(DeprecatedParquetOutputFormat.java:36) > at > org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat$RecordWriterWrapper.<init>(DeprecatedParquetOutputFormat.java:89) > at > org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.getRecordWriter(DeprecatedParquetOutputFormat.java:77) > at > org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:416) > {code} > The flow seems to be: > 1) The Parquet deprecated output format class tries to read the > workOutputPath - > https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/mapred/DeprecatedParquetOutputFormat.java#L69 > 2) This calls FileOutputFormat.getWorkOutputPath(...)
- > https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileOutputFormat.java#L229 > 3) That in turn tries to read the JobContext.TASK_OUTPUT_DIR > ("mapreduce.task.output.dir") property. > 4) This value is null, so the Parquet code ends up with an NPE in > the Path class. > Looking at the Tez code, we set the workOutputPath in the > MROutput.initCommitter method - > https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/output/MROutput.java#L445. > > However, this call is made after the workOutputPath is accessed as > part of outputFormat.getRecordWriter(). > I tried out a run where I moved this initCommitter call up: > {code} > else { > oldApiTaskAttemptContext = > new org.apache.tez.mapreduce.hadoop.mapred.TaskAttemptContextImpl( > jobConf, taskAttemptId, > new MRTaskReporter(getContext())); > initCommitter(jobConf, useNewApi); // before the getRecordWriter call > oldOutputFormat = jobConf.getOutputFormat(); > outputFormatClassName = oldOutputFormat.getClass().getName(); > FileSystem fs = FileSystem.get(jobConf); > String finalName = getOutputName(); > oldRecordWriter = > oldOutputFormat.getRecordWriter( > fs, jobConf, finalName, new > MRReporter(getContext().getCounters())); > } > {code} > With this change the run seems to succeed. If this sounds > reasonable, I can cut a PR.
[jira] [Commented] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat
[ https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376138#comment-15376138 ] ASF GitHub Bot commented on TEZ-3348: - GitHub user piyushnarang opened a pull request: https://github.com/apache/tez/pull/11 TEZ-3348: NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat Proposed fix for the reported JIRA. I added a couple of unit tests as well. It seems that with the new APIs this isn't an issue, as they read `FileOutputFormat.getDefaultWorkFile`, which doesn't check the workOutputPath. With the old APIs, however, the unit test fails without this fix. I added a unit test for the new API for completeness. You can merge this pull request into a Git repository by running: $ git pull https://github.com/piyushnarang/tez master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tez/pull/11.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11 commit 6f3e0f4f5718c01f247915f1b84e28c75b2dc83b Author: Piyush Narang Date: 2016-07-14T01:13:45Z Move initCommitter call up in MROutput
[jira] [Commented] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat
[ https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376096#comment-15376096 ] Hitesh Shah commented on TEZ-3348: -- Thanks for reporting the issue [~pnarang] and yes, a PR or a patch attached to this JIRA sounds good.
[jira] [Created] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat
Piyush Narang created TEZ-3348:
Summary: NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat
Key: TEZ-3348
URL: https://issues.apache.org/jira/browse/TEZ-3348
Project: Apache Tez
Issue Type: Bug
Reporter: Piyush Narang

I am trying to run some Tez MR jobs that write data to HDFS using Parquet. When I do so, I end up seeing an NPE in the Parquet code:
{code}
java.lang.NullPointerException
    at org.apache.hadoop.fs.Path.<init>(Path.java:105)
    at org.apache.hadoop.fs.Path.<init>(Path.java:94)
    at org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.getDefaultWorkFile(DeprecatedParquetOutputFormat.java:69)
    at org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.access$100(DeprecatedParquetOutputFormat.java:36)
    at org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat$RecordWriterWrapper.<init>(DeprecatedParquetOutputFormat.java:89)
    at org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.getRecordWriter(DeprecatedParquetOutputFormat.java:77)
    at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:416)
{code}
The flow seems to be:
1) The Parquet deprecated output format class tries to read the workOutputPath - https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/mapred/DeprecatedParquetOutputFormat.java#L69
2) This calls FileOutputFormat.getWorkOutputPath(...) - https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileOutputFormat.java#L229
3) That in turn tries to read the JobContext.TASK_OUTPUT_DIR ("mapreduce.task.output.dir") property.
4) This value is null, so the Parquet code ends up with an NPE in the Path class.
Looking at the Tez code, we set the workOutputPath in the MROutput.initCommitter method - https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/output/MROutput.java#L445. However, this call is made after the workOutputPath is accessed as part of outputFormat.getRecordWriter(). I tried out a run where I moved the initCommitter call up:
{code}
else {
  oldApiTaskAttemptContext =
      new org.apache.tez.mapreduce.hadoop.mapred.TaskAttemptContextImpl(
          jobConf, taskAttemptId,
          new MRTaskReporter(getContext()));
  // initCommitter sets the work output path ("mapreduce.task.output.dir")
  // that the old-API output format reads inside getRecordWriter()
  initCommitter(jobConf, useNewApi); // before the getRecordWriter call
  oldOutputFormat = jobConf.getOutputFormat();
  outputFormatClassName = oldOutputFormat.getClass().getName();

  FileSystem fs = FileSystem.get(jobConf);
  String finalName = getOutputName();

  oldRecordWriter =
      oldOutputFormat.getRecordWriter(
          fs, jobConf, finalName, new MRReporter(getContext().getCounters()));
}
{code}
With this change the run seems to succeed. If this sounds reasonable, I can cut a PR.
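The ordering problem described in this issue can be reduced to a toy model: one step reads `mapreduce.task.output.dir` from the configuration and another step sets it. Only the property name comes from the report; the class, the method names, and the example directory are hypothetical stand-ins for `MROutput.initCommitter()` and `getRecordWriter()`, and a plain `Map` stands in for `JobConf`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Toy model (hypothetical names) of the initCommitter / getRecordWriter ordering bug. */
final class WorkPathOrdering {
    // Property name taken from the report (JobContext.TASK_OUTPUT_DIR).
    static final String TASK_OUTPUT_DIR = "mapreduce.task.output.dir";

    /** Stand-in for initCommitter(): publishes the task's work output directory. */
    static void initCommitter(Map<String, String> conf) {
        conf.put(TASK_OUTPUT_DIR, "/out/_temporary/attempt_0"); // hypothetical path
    }

    /** Stand-in for the old-API getRecordWriter(): resolves the work file. */
    static String workFile(Map<String, String> conf, String name) {
        String workDir = conf.get(TASK_OUTPUT_DIR);
        // In the report the NPE surfaces inside Hadoop's Path constructor;
        // here the null check is made explicit.
        Objects.requireNonNull(workDir, TASK_OUTPUT_DIR + " is not set");
        return workDir + "/" + name;
    }

    /** Original order: the writer reads the path before the committer sets it. */
    static String writerFirst() {
        Map<String, String> conf = new HashMap<>();
        try {
            String file = workFile(conf, "part-0");
            initCommitter(conf);
            return file;
        } catch (NullPointerException e) {
            return "NPE: " + e.getMessage();
        }
    }

    /** Proposed order: initCommitter() runs before the writer is created. */
    static String committerFirst() {
        Map<String, String> conf = new HashMap<>();
        initCommitter(conf);
        return workFile(conf, "part-0");
    }

    public static void main(String[] args) {
        System.out.println(writerFirst());    // NPE: mapreduce.task.output.dir is not set
        System.out.println(committerFirst()); // /out/_temporary/attempt_0/part-0
    }
}
```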
Failed: TEZ-3330 PreCommit Build #1850
Jira: https://issues.apache.org/jira/browse/TEZ-3330
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1850/

### LAST 60 LINES OF THE CONSOLE ###
[...truncated 4132 lines...]
[INFO]
[INFO] Total time: 59:57 min
[INFO] Finished at: 2016-07-13T22:56:54+00:00
[INFO] Final Memory: 68M/1005M
[INFO]
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12817802/TEZ-3330.temp.patch against master revision 55f5186.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1850//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1850//console
This message is automatically generated.
Adding comment to Jira.
Comment added.
d504a0d76ae2e692354da59fe0700a0a38096c6b logged out
Finished build.
Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any

### FAILED TESTS (if any) ###
All tests passed
[jira] [Commented] (TEZ-3340) Add support for YARN Shared Cache
[ https://issues.apache.org/jira/browse/TEZ-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375802#comment-15375802 ] Siddharth Seth commented on TEZ-3340: - Tez does not upload the jars on its own. However, this is something that we want to change (not sure if a JIRA exists yet). Having to manually upload the jar to HDFS is an avoidable step; it does get rid of multiple copies of the same jar all over the dist cache, but there are other approaches to avoiding that. > Add support for YARN Shared Cache > - > > Key: TEZ-3340 > URL: https://issues.apache.org/jira/browse/TEZ-3340 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma > > YARN provides shared cache functionality in YARN-1492. According to > [~ctrezzo], most of the YARN functionality is in Hadoop 2.8 and frameworks can > start to use it. MR adds support via MAPREDUCE-5951. > Can anyone confirm whether Tez supports uploading the application DAG jar and > dependent lib jars from the client machine to HDFS as part of Tez app submission? > From my test, that doesn't seem to happen. Instead, Tez expects applications > to upload the jars to HDFS beforehand and then set tez.aux.uris to the > HDFS locations.
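As a rough illustration of how content addressing avoids multiple copies of the same jar, the sketch below keys jars by a SHA-256 digest of their bytes, so byte-identical jars shipped by different applications collapse to a single entry. This is not YARN's shared cache code and not a Tez API; it only assumes the general idea of checksum-keyed resources, and the jar names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of checksum-keyed deduplication of jars (not YARN's shared cache code). */
final class JarDedup {

    /** Hex-encoded SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is unavailable", e);
        }
    }

    /** Maps each distinct content checksum to the first jar name that had it. */
    static Map<String, String> dedupe(Map<String, byte[]> jars) {
        Map<String, String> byChecksum = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> jar : jars.entrySet()) {
            byChecksum.putIfAbsent(sha256Hex(jar.getValue()), jar.getKey());
        }
        return byChecksum;
    }

    public static void main(String[] args) {
        Map<String, byte[]> jars = new LinkedHashMap<>();
        // Hypothetical jars: two applications ship byte-identical copies of one library.
        jars.put("app1/lib/common.jar", "same bytes".getBytes(StandardCharsets.UTF_8));
        jars.put("app2/lib/common.jar", "same bytes".getBytes(StandardCharsets.UTF_8));
        jars.put("app2/lib/other.jar", "other bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println(dedupe(jars).values()); // [app1/lib/common.jar, app2/lib/other.jar]
    }
}
```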
[jira] [Comment Edited] (TEZ-3235) Modify Example TestOrderedWordCount job to test the IPC limit for large dag plans
[ https://issues.apache.org/jira/browse/TEZ-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375780#comment-15375780 ] Hitesh Shah edited comment on TEZ-3235 at 7/13/16 9:30 PM: --- Thanks for the contribution [~ssreenivasan] and for the reviews [~aplusplus]. Committed to master and branch 0.8. was (Author: hitesh): Thanks for the contribution [~ssreenivasan] and for the reviews [~aplusplus]. Committed to master. > Modify Example TestOrderedWordCount job to test the IPC limit for large dag > plans > - > > Key: TEZ-3235 > URL: https://issues.apache.org/jira/browse/TEZ-3235 > Project: Apache Tez > Issue Type: Task >Reporter: Sushmitha Sreenivasan >Assignee: Sushmitha Sreenivasan > Fix For: 0.9.0, 0.8.5 > > Attachments: TEZ-3235.1.patch, Tez-3235.2.patch, Tez-3235.3.patch
[jira] [Updated] (TEZ-3235) Modify Example TestOrderedWordCount job to test the IPC limit for large dag plans
[ https://issues.apache.org/jira/browse/TEZ-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-3235: - Fix Version/s: 0.8.5 0.9.0 > Modify Example TestOrderedWordCount job to test the IPC limit for large dag > plans > - > > Key: TEZ-3235 > URL: https://issues.apache.org/jira/browse/TEZ-3235 > Project: Apache Tez > Issue Type: Task >Reporter: Sushmitha Sreenivasan >Assignee: Sushmitha Sreenivasan > Fix For: 0.9.0, 0.8.5 > > Attachments: TEZ-3235.1.patch, Tez-3235.2.patch, Tez-3235.3.patch
[jira] [Updated] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-3330: Attachment: TEZ-3330.temp.patch Patch is not tested at all. > Error on avro M/R job with Tez: missing configuration property > -- > > Key: TEZ-3330 > URL: https://issues.apache.org/jira/browse/TEZ-3330 > Project: Apache Tez > Issue Type: Bug >Reporter: Manuel Godbert > Attachments: TEZ-3330.temp.patch > > > I tried running the simple Avro M/R job MapredColorCount, which I found in the > examples of Avro release 1.7.7. > It failed with the following trace: > {code} > errorMessage=Shuffle Runner > Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: > Error while doing final merge > at > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:378) > at > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.lang.NullPointerException > at java.io.StringReader.<init>(StringReader.java:50) > at org.apache.avro.Schema$Parser.parse(Schema.java:917) > at org.apache.avro.Schema.parse(Schema.java:966) > at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78) > at > org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39) > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) > at >
org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateInputKeyComparator(ConfigUtils.java:133) > at > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:915) > at > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:540) > at > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:376) > ... 6 more > {code} > Digging a bit, I saw that during the shuffle Tez can't access some of the > configuration properties of the job. In our example, avro.output.schema is the property that is missing. > With some more complicated code I got one step further, and a similar > issue happened when the valuesIterator for the reducer was being built: > {code} > java.lang.NullPointerException > at java.io.StringReader.<init>(StringReader.java:50) > at org.apache.avro.Schema$Parser.parse(Schema.java:917) > at org.apache.avro.Schema.parse(Schema.java:966) > at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78) > at > org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53) > at > org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90) > at > org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:80) > at > org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:287) > {code} > I am using HDP 2.4, Tez 0.7.0, Avro 1.7.4.
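Both traces bottom out in `new StringReader(null)`: the schema string read from the configuration is null, and the `StringReader` constructor dereferences it. Below is a toy model of that failure, assuming (as the reporter suggests) that the configuration visible during the shuffle lacks `avro.output.schema` because only some job properties are carried over. The allow-list `filter` is a hypothetical stand-in for whatever Tez actually does; only the property name and the `StringReader` behaviour come from the report.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model: a job property that is dropped before shuffle-side code reads it. */
final class MissingSchemaModel {
    static final String SCHEMA_KEY = "avro.output.schema"; // property named in the report

    /** Hypothetical filter: keeps only allow-listed keys from the job configuration. */
    static Map<String, String> filter(Map<String, String> jobConf, Set<String> allowed) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : jobConf.entrySet()) {
            if (allowed.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    /** Mirrors the failing chain: a null schema string reaches new StringReader(null). */
    static String readSchema(Map<String, String> conf) throws IOException {
        StringReader reader = new StringReader(conf.get(SCHEMA_KEY)); // NPE if key is absent
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(SCHEMA_KEY, "\"string\""); // hypothetical Avro schema JSON
        jobConf.put("mapreduce.job.reduces", "1");

        Set<String> keepSchema = new HashSet<>(Arrays.asList(SCHEMA_KEY));
        System.out.println(readSchema(filter(jobConf, keepSchema))); // "string"

        Set<String> dropSchema = new HashSet<>(Arrays.asList("mapreduce.job.reduces"));
        try {
            readSchema(filter(jobConf, dropSchema));
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: " + SCHEMA_KEY + " was not carried over");
        }
    }
}
```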
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375782#comment-15375782 ] Siddharth Seth commented on TEZ-3330: - I don't think there's any way to do this at the moment. Attaching a temporary patch for this. I don't think fixing this properly is trivial; that said, we could just skip the ConfigBuilders altogether.
[jira] [Commented] (TEZ-3235) Modify Example TestOrderedWordCount job to test the IPC limit for large dag plans
[ https://issues.apache.org/jira/browse/TEZ-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375778#comment-15375778 ] Hitesh Shah commented on TEZ-3235: -- +1 for now. I think TestOrderedWordCount needs a large refactor to clean up the current codebase. Committing shortly. > Modify Example TestOrderedWordCount job to test the IPC limit for large dag > plans > - > > Key: TEZ-3235 > URL: https://issues.apache.org/jira/browse/TEZ-3235 > Project: Apache Tez > Issue Type: Task >Reporter: Sushmitha Sreenivasan >Assignee: Sushmitha Sreenivasan > Attachments: TEZ-3235.1.patch, Tez-3235.2.patch, Tez-3235.3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3235) Modify Example TestOrderedWordCount job to test the IPC limit for large dag plans
[ https://issues.apache.org/jira/browse/TEZ-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-3235: - Affects Version/s: (was: 0.8.3) > Modify Example TestOrderedWordCount job to test the IPC limit for large dag > plans > - > > Key: TEZ-3235 > URL: https://issues.apache.org/jira/browse/TEZ-3235 > Project: Apache Tez > Issue Type: Task >Reporter: Sushmitha Sreenivasan >Assignee: Sushmitha Sreenivasan > Attachments: TEZ-3235.1.patch, Tez-3235.2.patch, Tez-3235.3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3331) Add operation specific HDFS counters for Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-3331: Assignee: Hitesh Shah > Add operation specific HDFS counters for Tez UI > --- > > Key: TEZ-3331 > URL: https://issues.apache.org/jira/browse/TEZ-3331 > Project: Apache Tez > Issue Type: Bug >Reporter: Jitendra Nath Pandey >Assignee: Hitesh Shah > Attachments: TEZ-3331.wip.2.patch, TEZ-3331.wip.3.patch, > TEZ-3331.wip.4.patch, TEZ-3331.wip.5.patch, TEZ-3331.wip.patch > > > Hadoop has added several operation specific counters in the FileSystem > statistics (HADOOP-13065). These counters are useful to track file system > operations more granularly. It would be great to track these counters for Tez > and expose them via UI as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
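Operation-specific counters keep one count per file system operation instead of a single aggregate. A stdlib-only sketch of that bookkeeping, assuming a fixed list of operations; `OpCounters` and the `Op` names are hypothetical and are not the statistics classes added in HADOOP-13065:

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-operation counters; operation names are illustrative only.
final class OpCounters {
    enum Op { OPEN, CREATE, MKDIRS, DELETE, LIST_STATUS, GET_FILE_STATUS }

    private final Map<Op, LongAdder> counts = new EnumMap<>(Op.class);

    OpCounters() {
        // Pre-populate every operation so the map is never modified afterwards
        // and concurrent record() calls only touch the LongAdders.
        for (Op op : Op.values()) {
            counts.put(op, new LongAdder());
        }
    }

    void record(Op op) {
        counts.get(op).increment();
    }

    long get(Op op) {
        return counts.get(op).sum();
    }

    // Flattens the counters into name -> value pairs in enum order.
    Map<String, Long> snapshot() {
        Map<String, Long> out = new LinkedHashMap<>();
        for (Map.Entry<Op, LongAdder> e : counts.entrySet()) {
            out.put(e.getKey().name(), e.getValue().sum());
        }
        return out;
    }
}
```

A flattened `snapshot()` of this kind is one plausible shape for a UI table with one row per operation.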
[jira] [Commented] (TEZ-3336) Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE
[ https://issues.apache.org/jira/browse/TEZ-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375691#comment-15375691 ] Siddharth Seth commented on TEZ-3336: - [~jlowe] - InputInitializer events and VMEvents short-circuit the DAG definition to some extent. Pretty much any task can send these events to any Vertex, since there's no Control Plane definition to restrict this. The Hive processor is supposed to target these events at specific Vertices which know how to handle them. In fact, I'm not sure Hive uses MRInputAMSplitGenerator at all anymore. It has its own SplitGenerator, which is based on MRInputAMSplitGenerator and knows how to handle these events for partition pruning. This sounds like a Hive bug to me. > Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE > --- > > Key: TEZ-3336 > URL: https://issues.apache.org/jira/browse/TEZ-3336 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe > > When Hive does a map-side join, it can generate a DAG where a vertex has two > inputs, one from an upstream task and another using MRInputAMSplitGenerator. > If it takes a while for MRInputAMSplitGenerator to compute the splits and one > of the tasks for the other upstream vertex completes, then the job can fail > with an error, since MRInputAMSplitGenerator does not expect to receive any > events. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
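To illustrate the point about event targeting: because a task can address an InputInitializer event to any vertex, an initializer that was not written to receive events fails on the first one it gets, while one written for pruning simply consumes it. A minimal sketch with hypothetical types (`InitializerEvent`, `SplitInitializer`, `NoEventsInitializer`, `PruningInitializer`); it does not reproduce the real Tez classes or the exception they throw:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event addressed to a named input on a target vertex.
final class InitializerEvent {
    final String targetVertex;
    final String targetInput;

    InitializerEvent(String targetVertex, String targetInput) {
        this.targetVertex = targetVertex;
        this.targetInput = targetInput;
    }
}

// Hypothetical stand-in for an input initializer running in the AM.
interface SplitInitializer {
    void handleEvent(InitializerEvent event);
}

// Never meant to receive events: the first one is fatal. In this issue the
// analogous failure surfaces as ROOT_INPUT_INIT_FAILURE.
final class NoEventsInitializer implements SplitInitializer {
    @Override
    public void handleEvent(InitializerEvent event) {
        throw new UnsupportedOperationException(
            "Unexpected event for input " + event.targetInput + " on " + event.targetVertex);
    }
}

// Written to consume events, e.g. for partition pruning.
final class PruningInitializer implements SplitInitializer {
    final List<InitializerEvent> received = new ArrayList<>();

    @Override
    public void handleEvent(InitializerEvent event) {
        received.add(event);
    }
}
```

Since nothing in the DAG definition restricts who may send such events, the sketch places the responsibility where the comment does: on the sender choosing a target that can handle them.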
[jira] [Commented] (TEZ-3274) Vertex with MRInput and shuffle input does not respect slow start
[ https://issues.apache.org/jira/browse/TEZ-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375682#comment-15375682 ] Siddharth Seth commented on TEZ-3274: - I haven't looked at this in some time. Is this being used with MRInputSplitDistributor, with the initial parallelism set on the specific vertex? I don't think using a Root Input along with a ShuffleInput on the same vertex will work with MRInputAMSplitGenerator, since parallelism is set up at runtime. Shuffle tasks will see a value of -1 if the initialization takes time. I believe we never really focused on this case, and if it came up, it would need to be handled via a custom VertexManager. If such a manager were to exist, how would the data distribution be handled? There are different splits for the MRInput and partitions on the Shuffle side; how are they mapped? > Vertex with MRInput and shuffle input does not respect slow start > - > > Key: TEZ-3274 > URL: https://issues.apache.org/jira/browse/TEZ-3274 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles > > Vertices with shuffle input and MRInput choose RootInputVertexManager (and > not ShuffleVertexManager) and start containers and tasks immediately. In this > scenario, resources can be wasted since they do not respect > tez.shuffle-vertex-manager.min-src-fraction and > tez.shuffle-vertex-manager.max-src-fraction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
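The two properties named in the description control slow start for the shuffle vertex manager. A simplified sketch of the idea, assuming consumer tasks are released linearly as the completed fraction of source tasks moves from the min fraction to the max fraction; `SlowStartSketch.tasksToSchedule` is hypothetical and leaves out the real ShuffleVertexManager's other inputs:

```java
// Hypothetical sketch of slow-start scheduling; not the ShuffleVertexManager code.
final class SlowStartSketch {
    private SlowStartSketch() {}

    /**
     * @param totalTasks   tasks in the consumer vertex
     * @param completedSrc source tasks completed so far
     * @param totalSrc     total source tasks
     * @param minFraction  analogous to tez.shuffle-vertex-manager.min-src-fraction
     * @param maxFraction  analogous to tez.shuffle-vertex-manager.max-src-fraction
     * @return how many consumer tasks may be scheduled now
     */
    static int tasksToSchedule(int totalTasks, int completedSrc, int totalSrc,
                               double minFraction, double maxFraction) {
        if (totalSrc == 0) {
            return totalTasks; // nothing to wait for
        }
        double done = (double) completedSrc / totalSrc;
        if (done < minFraction) {
            return 0; // too early: hold containers back
        }
        if (done >= maxFraction || maxFraction <= minFraction) {
            return totalTasks; // past the upper threshold: release everything
        }
        // Linear ramp between the two thresholds.
        double ramp = (done - minFraction) / (maxFraction - minFraction);
        return (int) Math.floor(ramp * totalTasks);
    }
}
```

For example, with fractions 0.25 and 0.75, 50 of 100 source tasks done, and 10 consumer tasks, the sketch releases 5 tasks. Per the description, RootInputVertexManager applies no such gate and starts all tasks immediately.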
[jira] [Updated] (TEZ-3337) Do not log empty fields of TaskAttemptFinishedEvent to avoid confusion
[ https://issues.apache.org/jira/browse/TEZ-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-3337: - Summary: Do not log empty fields of TaskAttemptFinishedEvent to avoid confusion (was: Not log empty fields of TaskAttemptFinishedEvent to avoid confusion) > Do not log empty fields of TaskAttemptFinishedEvent to avoid confusion > -- > > Key: TEZ-3337 > URL: https://issues.apache.org/jira/browse/TEZ-3337 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3337.1.patch > > > For a successful task attempt, we don't record the containerId, which causes > "containerId=," to appear in the INFO logs. We should avoid logging this field if it's > empty. > {code} > 2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] > |history.HistoryEventHandler|: > [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, > creationTime=1467956979891, allocationTime=1467956980426, > startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, > status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, > nodeHttpAddress= > 2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] > |history.HistoryEventHandler|: > [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, > creationTime=1467956979894, allocationTime=1467956980427, > startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, > status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, > nodeHttpAddress= > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
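The requested change amounts to dropping `key=value` pairs with an empty value when a history event is rendered. A minimal sketch, assuming the fields are kept in insertion order; `HistoryLineSketch.format` is hypothetical and is not the actual `TaskAttemptFinishedEvent` logging code:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical formatter for a history log line; not Tez code.
final class HistoryLineSketch {
    private HistoryLineSketch() {}

    // Renders "k1=v1, k2=v2", skipping null or empty values so that
    // fragments like "containerId=," no longer appear.
    static String format(Map<String, String> fields) {
        StringJoiner out = new StringJoiner(", ");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String v = e.getValue();
            if (v == null || v.isEmpty()) {
                continue;
            }
            out.add(e.getKey() + "=" + v);
        }
        return out.toString();
    }
}
```

Applied to the fields in the first log line above, the sketch would omit `errorEnum`, `diagnostics`, `containerId`, `nodeId`, and `nodeHttpAddress`.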
[jira] [Commented] (TEZ-3337) Not log empty fields of TaskAttemptFinishedEvent to avoid confusion
[ https://issues.apache.org/jira/browse/TEZ-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375429#comment-15375429 ] Hitesh Shah commented on TEZ-3337: -- +1. Committing shortly. > Not log empty fields of TaskAttemptFinishedEvent to avoid confusion > --- > > Key: TEZ-3337 > URL: https://issues.apache.org/jira/browse/TEZ-3337 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3337.1.patch > > > For a successful task attempt, we don't record the containerId, which causes > "containerId=," to appear in the INFO logs. We should avoid logging this field if it's > empty. > {code} > 2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] > |history.HistoryEventHandler|: > [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, > creationTime=1467956979891, allocationTime=1467956980426, > startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, > status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, > nodeHttpAddress= > 2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] > |history.HistoryEventHandler|: > [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, > creationTime=1467956979894, allocationTime=1467956980427, > startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, > status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, > nodeHttpAddress= > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3331) Add operation specific HDFS counters for Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-3331: - Attachment: TEZ-3331.wip.5.patch Fix typo in test file. > Add operation specific HDFS counters for Tez UI > --- > > Key: TEZ-3331 > URL: https://issues.apache.org/jira/browse/TEZ-3331 > Project: Apache Tez > Issue Type: Bug >Reporter: Jitendra Nath Pandey > Attachments: TEZ-3331.wip.2.patch, TEZ-3331.wip.3.patch, > TEZ-3331.wip.4.patch, TEZ-3331.wip.5.patch, TEZ-3331.wip.patch > > > Hadoop has added several operation specific counters in the FileSystem > statistics (HADOOP-13065). These counters are useful to track file system > operations more granularly. It would be great to track these counters for Tez > and expose them via UI as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3344) ATS does not contain certain vertex details
[ https://issues.apache.org/jira/browse/TEZ-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375316#comment-15375316 ] Hitesh Shah commented on TEZ-3344: -- What version of Hadoop are you using? Also, how has the timeline service been configured: v1 or v1.5? Do you have the YARN app logs for this? > ATS does not contain certain vertex details > --- > > Key: TEZ-3344 > URL: https://issues.apache.org/jira/browse/TEZ-3344 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Attachments: all_vertices_page_showing_only_3_vertices.png, dag.json, > dag_details.png, dag_succeeded_screenshot.png, vertex.json > > > For example, this Hive query: > {noformat} > select count(1) from (select d_date, t_time_id from (select d_date from > date_dim sort by d_date) d, time_dim) x > {noformat} > This has 4 vertices (will attach the DAG shortly). However, when ATS data is > downloaded via tez-ui or via ATSImportTool, the data for > "vertex_1466700718395_0351_4_03" is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3347) Vertex UI throws an error while getting vertexProgress for a killed Vertex
[ https://issues.apache.org/jira/browse/TEZ-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated TEZ-3347: - Attachment: ErrorCodeFailedVertex.png > Vertex UI throws an error while getting vertexProgress for a killed Vertex > -- > > Key: TEZ-3347 > URL: https://issues.apache.org/jira/browse/TEZ-3347 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Kuhu Shukla > Attachments: ErrorCodeFailedVertex.png > > > Given an AM that fails all its attempts, the application fails and the very > first click on the killed/failed vertex throws the following error: > {code} > error code: Unknown, message: expected expression, got '<' > {code} > It self-corrects if tried again immediately after the failure. > This is because the RM proxy redirects the call to the AHS server and the > REST call is malformed for that server. Upon inspection of the responses, it > was seen that the URL looked something like this: > {code} > http://:/applicationhistory/app/application_123_456/ws/v1/tez/vertexProgress?dagID=1=01&_=123 > {code} > which is not a proper REST call on the AHS. > I think the following code can cause this issue: > {code} > // Load progress in parallel for v1 version of the api > _loadProgress: function (vertices) { > var that = this, > runningVerticesIdx = vertices > .filterBy('status', 'RUNNING') > .map(function(item) { > return item.get('id').split('_').splice(-1).pop(); > }); > if (runningVerticesIdx.length > 0) { > this.store.unloadAll('vertexProgress'); > this.store.findQuery('vertexProgress', { > metadata: { > appId: that.get('applicationId'), > dagIdx: that.get('idx'), > vertexIds: runningVerticesIdx.join(',') > } > }).then(function(vertexProgressInfo) { > App.Helpers.emData.mergeRecords( > that.get('rowsDisplayed'), > vertexProgressInfo, > ['progress'] > ); > }).catch(function(error) { > error.message = "Failed to fetch vertexProgress. Application Master > (AM) is out of reach. 
Either it's down, or CORS is not enabled for YARN > ResourceManager."; > Em.Logger.error(error); > var err = App.Helpers.misc.formatError(error); > var msg = 'Error code: %@, message: %@'.fmt(err.errCode, err.msg); > App.Helpers.ErrorBar.getInstance().show(msg, err.details); > }); > {code} > which uses AMInfo, which gets its response based on what the loadApp method finds: > {code} > loadApp: function (store, appId, useCache) { > if(!useCache) { > App.Helpers.misc.removeRecord(store, 'appDetail', appId); > App.Helpers.misc.removeRecord(store, 'clusterApp', appId); > } > return store.find('clusterApp', appId).catch(function () { > return store.find('appDetail', appId); > }).catch(function (error) { > error.message = "Couldn't get details of application %@. RM is not > reachable, and history service is not enabled.".fmt(appId); > throw error; > }); > } > {code} > In the catch block here, we could check whether the response type is JSON, or > not try to get vertexProgress at all, since it knows that the application/AM has failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3347) Vertex UI throws an error while getting vertexProgress for a killed Vertex
Kuhu Shukla created TEZ-3347: Summary: Vertex UI throws an error while getting vertexProgress for a killed Vertex Key: TEZ-3347 URL: https://issues.apache.org/jira/browse/TEZ-3347 Project: Apache Tez Issue Type: Bug Components: UI Reporter: Kuhu Shukla Attachments: ErrorCodeFailedVertex.png Given an AM that fails all its attempts, the application fails and the very first click on the killed/failed vertex throws the following error: {code} error code: Unknown, message: expected expression, got '<' {code} It self-corrects if tried again immediately after the failure. This is because the RM proxy redirects the call to the AHS server and the REST call is malformed for that server. Upon inspection of the responses, it was seen that the URL looked something like this: {code} http://:/applicationhistory/app/application_123_456/ws/v1/tez/vertexProgress?dagID=1=01&_=123 {code} which is not a proper REST call on the AHS. I think the following code can cause this issue: {code} // Load progress in parallel for v1 version of the api _loadProgress: function (vertices) { var that = this, runningVerticesIdx = vertices .filterBy('status', 'RUNNING') .map(function(item) { return item.get('id').split('_').splice(-1).pop(); }); if (runningVerticesIdx.length > 0) { this.store.unloadAll('vertexProgress'); this.store.findQuery('vertexProgress', { metadata: { appId: that.get('applicationId'), dagIdx: that.get('idx'), vertexIds: runningVerticesIdx.join(',') } }).then(function(vertexProgressInfo) { App.Helpers.emData.mergeRecords( that.get('rowsDisplayed'), vertexProgressInfo, ['progress'] ); }).catch(function(error) { error.message = "Failed to fetch vertexProgress. Application Master (AM) is out of reach. Either it's down, or CORS is not enabled for YARN ResourceManager."; Em.Logger.error(error); var err = App.Helpers.misc.formatError(error); var msg = 'Error code: %@, message: %@'.fmt(err.errCode, err.msg); App.Helpers.ErrorBar.getInstance().show(msg, err.details); }); {code} which uses AMInfo, which gets its response based on what the loadApp method finds: {code} loadApp: function (store, appId, useCache) { if(!useCache) { App.Helpers.misc.removeRecord(store, 'appDetail', appId); App.Helpers.misc.removeRecord(store, 'clusterApp', appId); } return store.find('clusterApp', appId).catch(function () { return store.find('appDetail', appId); }).catch(function (error) { error.message = "Couldn't get details of application %@. RM is not reachable, and history service is not enabled.".fmt(appId); throw error; }); } {code} In the catch block here, we could check whether the response type is JSON, or not try to get vertexProgress at all, since it knows that the application/AM has failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
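The message `expected expression, got '<'` suggests that an HTML page (the AHS response) came back where the UI expected a JSON payload. A minimal sketch of the guard proposed at the end of the description: check the content type, or failing that the first non-blank character, before treating a body as JSON. `ResponseGuard.looksLikeJson` is hypothetical, and it is written in Java for illustration rather than as a patch to the Ember code above:

```java
import java.util.Locale;

// Hypothetical guard; not part of the Tez UI.
final class ResponseGuard {
    private ResponseGuard() {}

    // Returns true only when the response plausibly carries JSON.
    static boolean looksLikeJson(String contentType, String body) {
        if (contentType != null) {
            String ct = contentType.toLowerCase(Locale.ROOT);
            if (ct.contains("json")) {
                return true;
            }
            if (ct.contains("html")) {
                return false; // e.g. an AHS/proxy page instead of the AM REST API
            }
        }
        if (body == null) {
            return false;
        }
        // Fall back to sniffing the first non-blank character.
        String trimmed = body.trim();
        return trimmed.startsWith("{") || trimmed.startsWith("[");
    }
}
```

When the guard fails, the caller can skip the vertexProgress request and show the "application has finished" state instead of an error bar.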
[jira] [Updated] (TEZ-3346) Vertex status on DAG UI and Vertex UI can be inconsistent
[ https://issues.apache.org/jira/browse/TEZ-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated TEZ-3346: - Attachment: VertexPage.png DagPage_AMDiagnostics.png > Vertex status on DAG UI and Vertex UI can be inconsistent > - > > Key: TEZ-3346 > URL: https://issues.apache.org/jira/browse/TEZ-3346 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Kuhu Shukla > Attachments: DagPage_AMDiagnostics.png, VertexPage.png > > > If the AM fails on all its attempts, the Vertex status on the DAG page is > "KILLED". On clicking the vertexName, the vertex UI page shows the status as > "FAILED" (which it pulls from the proxy). > I think this code in {{vertex_tasks_controller.js}} makes the decision to > mark the vertex as KILLED, because the Timeline Server still has "RUNNING" as the > last updated vertex status after the AM crashed. > {code} > if (taskStatus == 'RUNNING' && isUnsuccessfulVertex) { > taskStatus = 'KILLED'; > } > if (taskStatus != task.get('status')) { > task.set('status', taskStatus); > } > }); > {code} > Not sure which of the two values is semantically correct. The FAILED state > seems to come from this logic, which uses appState and finalState of the > application from Timeline's and AHS's perspective to decide the vertex state. > {code} > vertex.set('status', App.Helpers.misc.getRealStatus(vertex.get('status'), > appDetail.get('status'), > appDetail.get('finalStatus'))); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3346) Vertex status on DAG UI and Vertex UI can be inconsistent
Kuhu Shukla created TEZ-3346: Summary: Vertex status on DAG UI and Vertex UI can be inconsistent Key: TEZ-3346 URL: https://issues.apache.org/jira/browse/TEZ-3346 Project: Apache Tez Issue Type: Bug Components: UI Reporter: Kuhu Shukla If the AM fails on all its attempts, the Vertex status on the DAG page is "KILLED". On clicking the vertexName, the vertex UI page shows the status as "FAILED" (which it pulls from the proxy). I think this code in {{vertex_tasks_controller.js}} makes the decision to mark the vertex as KILLED, because the Timeline Server still has "RUNNING" as the last updated vertex status after the AM crashed. {code} if (taskStatus == 'RUNNING' && isUnsuccessfulVertex) { taskStatus = 'KILLED'; } if (taskStatus != task.get('status')) { task.set('status', taskStatus); } }); {code} Not sure which of the two values is semantically correct. The FAILED state seems to come from this logic, which uses appState and finalState of the application from Timeline's and AHS's perspective to decide the vertex state. {code} vertex.set('status', App.Helpers.misc.getRealStatus(vertex.get('status'), appDetail.get('status'), appDetail.get('finalStatus'))); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
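One way to make the two pages agree is to compute a single display status from the last status recorded in the Timeline Server and the application's final status, and to use it on both pages. A sketch under that assumption; `StatusSketch.displayStatus` and its mapping are hypothetical and do not reproduce `App.Helpers.misc.getRealStatus`:

```java
import java.util.Set;

// Hypothetical status reconciliation; not Tez UI code.
final class StatusSketch {
    private static final Set<String> TERMINAL = Set.of("SUCCEEDED", "FAILED", "KILLED");

    private StatusSketch() {}

    /**
     * @param recorded       last vertex status stored in the Timeline Server, e.g. "RUNNING"
     * @param appFinished    whether YARN reports the application as finished
     * @param appFinalStatus YARN final status of the application, e.g. "FAILED"
     */
    static String displayStatus(String recorded, boolean appFinished, String appFinalStatus) {
        // Trust a terminal status, or any status while the app is still alive.
        if ((recorded != null && TERMINAL.contains(recorded)) || !appFinished) {
            return recorded;
        }
        // The AM died before writing a terminal state: inherit the app's outcome.
        if ("FAILED".equals(appFinalStatus) || "KILLED".equals(appFinalStatus)) {
            return appFinalStatus;
        }
        return recorded;
    }
}
```

Whether KILLED or FAILED is the semantically correct answer is the open question in this issue; the sketch simply inherits the application's final status so that the DAG page and the vertex page report the same value.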
[jira] [Updated] (TEZ-3345) Diagnostics for a failed AM may not show up on the DAG UI page.
[ https://issues.apache.org/jira/browse/TEZ-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated TEZ-3345: - Attachment: DagPage_AMDiagnostics.png > Diagnostics for a failed AM may not show up on the DAG UI page. > --- > > Key: TEZ-3345 > URL: https://issues.apache.org/jira/browse/TEZ-3345 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Kuhu Shukla > Attachments: DagPage_AMDiagnostics.png > > > In a scenario where the AM fails on all its attempts, the DAG page does not show > the Diagnostics message in the Tez UI "Diagnostics" section. The message is > available on the AHS and could be pulled into the DAG page just like we do > for failed vertices. > I ran a simple Tez example job that was given far too little memory, causing the > AM to fail: > {code} > hadoop jar /home/gs/tez/current/tez-examples-0.7.1.x.x.jar orderedwordcount > -Dtez.am.resource.memory.mb=16 -Dtez.am.launch.cmd-opts="-Xmx13m > -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails > -XX:+PrintGCDateStamps -Xloggc:/gc.log" /books /output > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3345) Diagnostics for a failed AM may not show up on the DAG UI page.
Kuhu Shukla created TEZ-3345: Summary: Diagnostics for a failed AM may not show up on the DAG UI page. Key: TEZ-3345 URL: https://issues.apache.org/jira/browse/TEZ-3345 Project: Apache Tez Issue Type: Bug Components: UI Reporter: Kuhu Shukla In a scenario where the AM fails on all its attempts, the DAG page does not show the Diagnostics message in the Tez UI "Diagnostics" section. The message is available on the AHS and could be pulled into the DAG page just like we do for failed vertices. I ran a simple Tez example job that was given far too little memory, causing the AM to fail: {code} hadoop jar /home/gs/tez/current/tez-examples-0.7.1.x.x.jar orderedwordcount -Dtez.am.resource.memory.mb=16 -Dtez.am.launch.cmd-opts="-Xmx13m -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/gc.log" /books /output {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3344) ATS does not contain certain vertex details
[ https://issues.apache.org/jira/browse/TEZ-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-3344: -- Attachment: dag_succeeded_screenshot.png > ATS does not contain certain vertex details > --- > > Key: TEZ-3344 > URL: https://issues.apache.org/jira/browse/TEZ-3344 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Attachments: all_vertices_page_showing_only_3_vertices.png, dag.json, > dag_details.png, dag_succeeded_screenshot.png, vertex.json > > > For example, this Hive query: > {noformat} > select count(1) from (select d_date, t_time_id from (select d_date from > date_dim sort by d_date) d, time_dim) x > {noformat} > This has 4 vertices (will attach the DAG shortly). However, when ATS data is > downloaded via tez-ui or via ATSImportTool, the data for > "vertex_1466700718395_0351_4_03" is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3344) ATS does not contain certain vertex details
[ https://issues.apache.org/jira/browse/TEZ-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374836#comment-15374836 ] Rajesh Balamohan commented on TEZ-3344: --- For the missing vertex, the data is not in ATS: http://ats_machine/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1466700718395_0351_4_03 {noformat} { exception: "NotFoundException", message: "java.lang.Exception: Timeline entity { id: vertex_1466700718395_0351_4_03, type: TEZ_VERTEX_ID } is not found", javaClassName: "org.apache.hadoop.yarn.webapp.NotFoundException" } {noformat} > ATS does not contain certain vertex details > --- > > Key: TEZ-3344 > URL: https://issues.apache.org/jira/browse/TEZ-3344 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Attachments: all_vertices_page_showing_only_3_vertices.png, dag.json, > dag_details.png, vertex.json > > > For example, this Hive query: > {noformat} > select count(1) from (select d_date, t_time_id from (select d_date from > date_dim sort by d_date) d, time_dim) x > {noformat} > This has 4 vertices (will attach the DAG shortly). However, when ATS data is > downloaded via tez-ui or via ATSImportTool, the data for > "vertex_1466700718395_0351_4_03" is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
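The lookup above uses the Timeline Server v1 REST pattern `/ws/v1/timeline/{entityType}/{entityId}`. A small sketch for checking every vertex of a DAG the same way, assuming a reachable timeline host and treating HTTP 404 as a missing entity; `TimelineCheck` is hypothetical helper code, not part of Tez, and the vertex ID layout is inferred from the IDs quoted in this issue:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper for probing Timeline Server entities; not part of Tez.
final class TimelineCheck {
    private TimelineCheck() {}

    // Builds e.g. http://host:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_..._03
    static String entityUrl(String timelineBase, String entityType, String entityId) {
        String base = timelineBase.endsWith("/")
            ? timelineBase.substring(0, timelineBase.length() - 1)
            : timelineBase;
        return base + "/ws/v1/timeline/" + entityType + "/" + entityId;
    }

    // Assumed layout: vertex_<clusterTs>_<appSeq>_<dagIdx>_<2-digit vertexIdx>.
    static String vertexId(String appSuffix, int dagIdx, int vertexIdx) {
        return String.format("vertex_%s_%d_%02d", appSuffix, dagIdx, vertexIdx);
    }

    // Returns false when the Timeline Server answers 404 (NotFoundException).
    static boolean exists(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        try {
            int code = conn.getResponseCode();
            if (code == HttpURLConnection.HTTP_NOT_FOUND) {
                return false;
            }
            if (code == HttpURLConnection.HTTP_OK) {
                return true;
            }
            throw new IOException("Unexpected HTTP " + code + " for " + url);
        } finally {
            conn.disconnect();
        }
    }
}
```

For the DAG in this issue, calling `exists` on the URLs for `vertexId("1466700718395_0351", 4, i)` with `i` from 0 to 3 would show which of the four vertex entities, if any, the Timeline Server lacks.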
[jira] [Updated] (TEZ-3344) ATS does not contain certain vertex details
[ https://issues.apache.org/jira/browse/TEZ-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-3344: -- Attachment: dag.json vertex.json \cc [~Sreenath] > ATS does not contain certain vertex details > --- > > Key: TEZ-3344 > URL: https://issues.apache.org/jira/browse/TEZ-3344 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Attachments: all_vertices_page_showing_only_3_vertices.png, dag.json, > dag_details.png, vertex.json > > > For example, this Hive query: > {noformat} > select count(1) from (select d_date, t_time_id from (select d_date from > date_dim sort by d_date) d, time_dim) x > {noformat} > This has 4 vertices (will attach the DAG shortly). However, when ATS data is > downloaded via tez-ui or via ATSImportTool, the data for > "vertex_1466700718395_0351_4_03" is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3344) ATS does not contain certain vertex details
[ https://issues.apache.org/jira/browse/TEZ-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-3344: -- Attachment: all_vertices_page_showing_only_3_vertices.png > ATS does not contain certain vertex details > --- > > Key: TEZ-3344 > URL: https://issues.apache.org/jira/browse/TEZ-3344 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Attachments: all_vertices_page_showing_only_3_vertices.png, > dag_details.png > > > For example, this Hive query: > {noformat} > select count(1) from (select d_date, t_time_id from (select d_date from > date_dim sort by d_date) d, time_dim) x > {noformat} > This has 4 vertices (will attach the DAG shortly). However, when ATS data is > downloaded via tez-ui or via ATSImportTool, the data for > "vertex_1466700718395_0351_4_03" is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3344) ATS does not contain certain vertex details
[ https://issues.apache.org/jira/browse/TEZ-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-3344: -- Attachment: dag_details.png > ATS does not contain certain vertex details > --- > > Key: TEZ-3344 > URL: https://issues.apache.org/jira/browse/TEZ-3344 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Attachments: dag_details.png > > > For example, this Hive query: > {noformat} > select count(1) from (select d_date, t_time_id from (select d_date from > date_dim sort by d_date) d, time_dim) x > {noformat} > This has 4 vertices (will attach the DAG shortly). However, when ATS data is > downloaded via tez-ui or via ATSImportTool, the data for > "vertex_1466700718395_0351_4_03" is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3344) ATS does not contain certain vertex details
Rajesh Balamohan created TEZ-3344: - Summary: ATS does not contain certain vertex details Key: TEZ-3344 URL: https://issues.apache.org/jira/browse/TEZ-3344 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan For example, this Hive query: {noformat} select count(1) from (select d_date, t_time_id from (select d_date from date_dim sort by d_date) d, time_dim) x {noformat} This has 4 vertices (will attach the DAG shortly). However, when ATS data is downloaded via tez-ui or via ATSImportTool, the data for "vertex_1466700718395_0351_4_03" is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)