[jira] [Commented] (TEZ-3336) Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE

2016-07-13 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376355#comment-15376355
 ] 

Mithun Radhakrishnan commented on TEZ-3336:
---

Ok, here's what's happening:

{{HiveSplitGenerator}} is only in play if Hive uses the {{HiveInputFormat}} 
when generating splits on the AM. It's not built to handle 
{{CombineHiveInputFormat}} at all. I suppose regrouping grouped splits is 
silly. 
If the user chooses {{CombineHiveInputFormat}}, then Hive's 
[{{DagUtils.createVertex()}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L612-L618]
 does the following:

{code:java|title=DagUtils.java#L612-L618|borderStyle=solid}
// Not HiveInputFormat, or a custom VertexManager will take care of grouping splits
if (vertexHasCustomInput) {
  dataSource =
      MultiMRInput.createConfigBuilder(conf, inputFormatClass).groupSplits(false).build();
} else {
  dataSource =
      MRInputLegacy.createConfigBuilder(conf, inputFormatClass).groupSplits(false).build();
}
{code}

So Hive delegates to Tez's {{MRInputLegacy.createConfigBuilder()}}, which 
eventually puts {{MRInput}} and {{MRInputAMSplitGenerator}} in play. 
I'm still curious about the nature of the events sent to 
{{MRInputAMSplitGenerator}}, and who's sending them. That'll help convince me 
that this is indeed a Hive bug. :]

> Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE
> ---
>
> Key: TEZ-3336
> URL: https://issues.apache.org/jira/browse/TEZ-3336
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1
>Reporter: Jason Lowe
>
> When Hive does a map-side join it can generate a DAG where a vertex has two 
> inputs, one from an upstream task and another using MRInputAMSplitGenerator.  
> If it takes a while for MRInputAMSplitGenerator to compute the splits and one 
> of the tasks for the other upstream vertex completes then the job can fail 
> with an error since MRInputAMSplitGenerator does not expect to receive any 
> events.
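
The failure mode in the description above can be sketched in plain Java. This is only an illustration under stated assumptions: `StrictInitializer` and `TolerantInitializer` are hypothetical stand-ins rather than Tez classes, and the string events stand in for whatever events the AM routes to the initializer while splits are still being computed. Whether ignoring such events is the right fix, or whether the sender should not emit them in the first place, is exactly what the thread is still trying to establish.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an input initializer that computes splits and
// does not expect to receive any events while doing so.
class StrictInitializer {
    void handleEvents(List<String> events) {
        // Mirrors the reported behaviour: any event at all is treated as an error.
        throw new UnsupportedOperationException(
            "Not expecting to handle any events, got " + events.size());
    }
}

// Hypothetical alternative that records unexpected events and carries on.
class TolerantInitializer {
    final List<String> ignored = new ArrayList<>();

    void handleEvents(List<String> events) {
        ignored.addAll(events);
    }
}

class InitializerEventSketch {
    // Returns true if delivering the events made the initializer fail.
    static boolean failsOnEvents(boolean strict, List<String> events) {
        try {
            if (strict) {
                new StrictInitializer().handleEvents(events);
            } else {
                new TolerantInitializer().handleEvents(events);
            }
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

In this sketch the strict variant fails as soon as an upstream task completes before split generation finishes, which matches the "sometimes fails" timing dependence in the report.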



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3343) sqoop import can't success

2016-07-13 Thread lishaoguang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376235#comment-15376235
 ] 

lishaoguang commented on TEZ-3343:
--

Sorry, this is my first time submitting an issue to JIRA.
Today I created a Hive table with 'hive.execution.engine=tez', but it 
doesn't work. The logs are as follows:

16/07/14 02:59:15 [main]: INFO SessionState: Map 1: -/-
Status: Failed
16/07/14 02:59:15 [main]: ERROR SessionState: Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1468464343019_0002_1_00, 
diagnostics=[Vertex vertex_1468464343019_0002_1_00 [Map 1] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: values__tmp__table__1 initializer 
failed, vertex=vertex_1468464343019_0002_1_00 [Map 1], 
java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/MRVersion
    at org.apache.hadoop.hive.shims.Hadoop23Shims.isMR2(Hadoop23Shims.java:852)
    at org.apache.hadoop.hive.shims.Hadoop23Shims.getHadoopConfNames(Hadoop23Shims.java:923)
    at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:358)
    at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:371)
    at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:296)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:106)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.MRVersion
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 17 more
]
Can you help me?
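
A `NoClassDefFoundError` caused by a `ClassNotFoundException` usually means the named class is not visible to the class loader of the failing process. A minimal sketch for checking that is shown below; `ClasspathCheck` is a hypothetical helper, not part of Hive or Tez, and it only tells you about the classpath of the JVM it runs in, so it would have to be run with the same classpath as the Tez AM to be meaningful.

```java
// Hypothetical helper: checks whether a class can be found by the current
// class loader, without running the class's static initializers.
class ClasspathCheck {
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but not loadable in this JVM.
            return false;
        }
    }
}
```

Calling `ClasspathCheck.isOnClasspath("org.apache.hadoop.mapred.MRVersion")` with the class name from the trace would show whether any jar on that classpath provides it.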

> sqoop import can't success
> --
>
> Key: TEZ-3343
> URL: https://issues.apache.org/jira/browse/TEZ-3343
> Project: Apache Tez
>  Issue Type: Bug
> Environment: hadoop-2.6.0,sqoop-1.4.6,tez-0.8.4
>Reporter: lishaoguang
>
> I deployed the Hadoop environment and tried importing data from MySQL to 
> HDFS without Tez. After deploying Tez, I ran 'orderedwordcount' and it 
> succeeded, but when I use Sqoop to import data from MySQL to HDFS, it stalls 
> at 0% map and eventually fails. What can I do? Can anyone help me?





[jira] [Commented] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-13 Thread Piyush Narang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376139#comment-15376139
 ] 

Piyush Narang commented on TEZ-3348:


Thanks for getting back, Hitesh. I've put out a PR: 
https://github.com/apache/tez/pull/11

> NullPointerException in Tez MROutput while trying to write using Parquet's 
> DeprecatedParquetOutputFormat
> 
>
> Key: TEZ-3348
> URL: https://issues.apache.org/jira/browse/TEZ-3348
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Piyush Narang
>
> Trying to run some Tez MR jobs that write out some data using Parquet to 
> HDFS. When I try to do so, I end up seeing an NPE in the Parquet code:
> {code}
> java.lang.NullPointerException
>   at org.apache.hadoop.fs.Path.<init>(Path.java:105)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:94)
>   at org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.getDefaultWorkFile(DeprecatedParquetOutputFormat.java:69)
>   at org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.access$100(DeprecatedParquetOutputFormat.java:36)
>   at org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat$RecordWriterWrapper.<init>(DeprecatedParquetOutputFormat.java:89)
>   at org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.getRecordWriter(DeprecatedParquetOutputFormat.java:77)
>   at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:416)
> {code}
> The flow seems to be:
> 1) The Parquet deprecated output format class tries to read the 
> workOutputPath - 
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/mapred/DeprecatedParquetOutputFormat.java#L69
> 2) This calls FileOutputFormat.getWorkOutputPath(...) - 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileOutputFormat.java#L229
> 3) That in turn tries to read the JobContext.TASK_OUTPUT_DIR 
> ("mapreduce.task.output.dir") constant. 
> 4) This ends up being null and in the Parquet code we end up with an NPE in 
> the Path class. 
> Looking at the Tez code, we are setting the workOutputPath in the 
> MROutput.initCommitter method - 
> https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/output/MROutput.java#L445.
>  
> This call however, is made after the call to access the workOutputPath as 
> part of outputFormat.getRecordWriter(). 
> I tried out a run where I moved this initCommitter call up:
> {code}
> else {
>   oldApiTaskAttemptContext =
>       new org.apache.tez.mapreduce.hadoop.mapred.TaskAttemptContextImpl(
>           jobConf, taskAttemptId,
>           new MRTaskReporter(getContext()));
>   initCommitter(jobConf, useNewApi); // before the getRecordWriter call
>   oldOutputFormat = jobConf.getOutputFormat();
>   outputFormatClassName = oldOutputFormat.getClass().getName();
>   FileSystem fs = FileSystem.get(jobConf);
>   String finalName = getOutputName();
>   oldRecordWriter =
>       oldOutputFormat.getRecordWriter(
>           fs, jobConf, finalName, new MRReporter(getContext().getCounters()));
> }
> {code}
> I tried out a run with this and it seems to succeed. If this sounds 
> reasonable, I can cut a PR. 





[jira] [Commented] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376138#comment-15376138
 ] 

ASF GitHub Bot commented on TEZ-3348:
-

GitHub user piyushnarang opened a pull request:

https://github.com/apache/tez/pull/11

TEZ-3348: NullPointerException in Tez MROutput while trying to write using 
Parquet's DeprecatedParquetOutputFormat

Proposed fix for the reported JIRA. Added a couple of unit tests as well. 
It seems that this isn't an issue with the new APIs, as they tend to call 
`FileOutputFormat.getDefaultWorkFile`, which doesn't check the workOutputPath. 
With the old APIs, though, the unit test fails without this fix. 
I added a unit test for the new API for completeness. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/piyushnarang/tez master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tez/pull/11.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11


commit 6f3e0f4f5718c01f247915f1b84e28c75b2dc83b
Author: Piyush Narang 
Date:   2016-07-14T01:13:45Z

Move initCommitter call up in MROutput






[jira] [Commented] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-13 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376096#comment-15376096
 ] 

Hitesh Shah commented on TEZ-3348:
--

Thanks for reporting the issue [~pnarang] and yes, a PR or a patch attached to 
this JIRA sounds good. 



[jira] [Created] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-13 Thread Piyush Narang (JIRA)
Piyush Narang created TEZ-3348:
--

 Summary: NullPointerException in Tez MROutput while trying to 
write using Parquet's DeprecatedParquetOutputFormat
 Key: TEZ-3348
 URL: https://issues.apache.org/jira/browse/TEZ-3348
 Project: Apache Tez
  Issue Type: Bug
Reporter: Piyush Narang


Trying to run some Tez MR jobs that write out some data using Parquet to HDFS. 
When I try to do so, I end up seeing an NPE in the Parquet code:
{code}
java.lang.NullPointerException
    at org.apache.hadoop.fs.Path.<init>(Path.java:105)
    at org.apache.hadoop.fs.Path.<init>(Path.java:94)
    at org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.getDefaultWorkFile(DeprecatedParquetOutputFormat.java:69)
    at org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.access$100(DeprecatedParquetOutputFormat.java:36)
    at org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat$RecordWriterWrapper.<init>(DeprecatedParquetOutputFormat.java:89)
    at org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.getRecordWriter(DeprecatedParquetOutputFormat.java:77)
    at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:416)
{code}

The flow seems to be:
1) The Parquet deprecated output format class tries to read the workOutputPath 
- 
https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/mapred/DeprecatedParquetOutputFormat.java#L69
2) This calls FileOutputFormat.getWorkOutputPath(...) - 
https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileOutputFormat.java#L229
3) That in turn tries to read the JobContext.TASK_OUTPUT_DIR 
("mapreduce.task.output.dir") constant. 
4) This ends up being null and in the Parquet code we end up with an NPE in the 
Path class. 

Looking at the Tez code, we are setting the workOutputPath in the 
MROutput.initCommitter method - 
https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/output/MROutput.java#L445.
 

This call however, is made after the call to access the workOutputPath as part 
of outputFormat.getRecordWriter(). 

I tried out a run where I moved this initCommitter call up:
{code}
else {
  oldApiTaskAttemptContext =
      new org.apache.tez.mapreduce.hadoop.mapred.TaskAttemptContextImpl(
          jobConf, taskAttemptId,
          new MRTaskReporter(getContext()));
  initCommitter(jobConf, useNewApi); // before the getRecordWriter call

  oldOutputFormat = jobConf.getOutputFormat();
  outputFormatClassName = oldOutputFormat.getClass().getName();

  FileSystem fs = FileSystem.get(jobConf);
  String finalName = getOutputName();

  oldRecordWriter =
      oldOutputFormat.getRecordWriter(
          fs, jobConf, finalName, new MRReporter(getContext().getCounters()));
}
{code}

I tried out a run with this and it seems to succeed. If this sounds reasonable, 
I can cut a PR. 
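
The ordering problem described above can be reproduced in isolation with a toy model. This is a sketch under stated assumptions, not the Tez or Parquet code: a `Map` stands in for the job configuration, `initCommitter` and `getRecordWriter` here are hypothetical simplifications of the real methods, and the path value is made up. Only the property name `mapreduce.task.output.dir` comes from the report.

```java
import java.util.HashMap;
import java.util.Map;

class WorkOutputPathOrderSketch {
    // Property name quoted in the report.
    static final String TASK_OUTPUT_DIR = "mapreduce.task.output.dir";

    // Hypothetical stand-in for initCommitter(): publishes the work output path.
    static void initCommitter(Map<String, String> conf) {
        conf.put(TASK_OUTPUT_DIR, "/tmp/job/_temporary/attempt_0");
    }

    // Hypothetical stand-in for getRecordWriter(): reads the path and fails on
    // null, much as building a Path from a null parent does in the trace.
    static String getRecordWriter(Map<String, String> conf, String fileName) {
        String dir = conf.get(TASK_OUTPUT_DIR);
        if (dir == null) {
            throw new NullPointerException(TASK_OUTPUT_DIR + " is not set");
        }
        return dir + "/" + fileName;
    }

    // Runs the two steps in either order and reports whether the writer
    // could be created.
    static boolean succeeds(boolean committerFirst) {
        Map<String, String> conf = new HashMap<>();
        try {
            if (committerFirst) {
                initCommitter(conf);
                getRecordWriter(conf, "part-00000");
            } else {
                getRecordWriter(conf, "part-00000");
                initCommitter(conf);
            }
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }
}
```

In this model, `succeeds(false)` corresponds to the current order (writer first, committer second) and `succeeds(true)` to the proposed one.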





Failed: TEZ-3330 PreCommit Build #1850

2016-07-13 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3330
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1850/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 4132 lines...]
[INFO] 
[INFO] Total time: 59:57 min
[INFO] Finished at: 2016-07-13T22:56:54+00:00
[INFO] Final Memory: 68M/1005M
[INFO] 




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12817802/TEZ-3330.temp.patch
  against master revision 55f5186.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1850//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1850//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
d504a0d76ae2e692354da59fe0700a0a38096c6b logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-3340) Add support for YARN Shared Cache

2016-07-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375802#comment-15375802
 ] 

Siddharth Seth commented on TEZ-3340:
-

Tez does not upload the jars on its own.
However, this is something that we want to change (not sure if a jira exists 
yet). Having to manually upload the jar to HDFS is an avoidable step; it does 
get rid of multiple copies of the same jar all over the dist cache, but there 
are other approaches to avoiding that.

> Add support for YARN Shared Cache
> -
>
> Key: TEZ-3340
> URL: https://issues.apache.org/jira/browse/TEZ-3340
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Ming Ma
>
> YARN provides shared cache functionality in YARN-1492. According to 
> [~ctrezzo] most of the YARN functionality is in hadoop 2.8 and frameworks can 
> start to use it. MR adds the support via MAPREDUCE-5951.
> Can anyone confirm if Tez supports the upload of application DAG jar and 
> dependent lib jars from client machine to HDFS as part of Tez app submission? 
> From my test, that doesn't seem to happen. Instead Tez expects applications 
> to upload the jars to HDFS beforehand and then set the tez.aux.uris to the 
> HDFS locations.
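
As a rough illustration of the manual step described above, the sketch below builds a `tez.aux.uris` value from jar locations that have already been uploaded. It assumes the property takes a comma-separated list of URIs, uses `java.util.Properties` as a stand-in for a Hadoop `Configuration`, and the HDFS paths in the usage note are made up; `AuxUrisSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.Properties;

class AuxUrisSketch {
    // Property name quoted in the report.
    static final String TEZ_AUX_URIS = "tez.aux.uris";

    // Joins already-uploaded jar locations into one comma-separated value
    // (assumption: the property accepts a comma-separated list).
    static Properties withAuxUris(List<String> hdfsJarUris) {
        Properties conf = new Properties();
        conf.setProperty(TEZ_AUX_URIS, String.join(",", hdfsJarUris));
        return conf;
    }
}
```

For example, passing `hdfs:///apps/myjob/app.jar` and `hdfs:///apps/myjob/lib/dep.jar` yields a single property value with the two locations separated by a comma. The upload itself is not shown, since per the comment above it has to happen before submission.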





[jira] [Comment Edited] (TEZ-3235) Modify Example TestOrderedWordCount job to test the IPC limit for large dag plans

2016-07-13 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375780#comment-15375780
 ] 

Hitesh Shah edited comment on TEZ-3235 at 7/13/16 9:30 PM:
---

Thanks for the contribution [~ssreenivasan] and for the reviews [~aplusplus]. 
Committed to master and branch 0.8. 


was (Author: hitesh):
Thanks for the contribution [~ssreenivasan] and for the reviews [~aplusplus]. 
Committed to master. 

> Modify Example TestOrderedWordCount job to test the IPC limit for large dag 
> plans
> -
>
> Key: TEZ-3235
> URL: https://issues.apache.org/jira/browse/TEZ-3235
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Sushmitha Sreenivasan
>Assignee: Sushmitha Sreenivasan
> Fix For: 0.9.0, 0.8.5
>
> Attachments: TEZ-3235.1.patch, Tez-3235.2.patch, Tez-3235.3.patch
>
>






[jira] [Updated] (TEZ-3235) Modify Example TestOrderedWordCount job to test the IPC limit for large dag plans

2016-07-13 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3235:
-
Fix Version/s: 0.8.5
   0.9.0



[jira] [Updated] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property

2016-07-13 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-3330:

Attachment: TEZ-3330.temp.patch

Patch is not tested at all.

> Error on avro M/R job with Tez: missing configuration property
> --
>
> Key: TEZ-3330
> URL: https://issues.apache.org/jira/browse/TEZ-3330
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Manuel Godbert
> Attachments: TEZ-3330.temp.patch
>
>
> I tried running the simple avro M/R job MapredColorCount, that I found in the 
> examples of avro release 1.7.7.
> It failed with the following trace:
> {code}
> errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
>   at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:378)
>   at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
>   at java.io.StringReader.<init>(StringReader.java:50)
>   at org.apache.avro.Schema$Parser.parse(Schema.java:917)
>   at org.apache.avro.Schema.parse(Schema.java:966)
>   at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
>   at org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39)
>   at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>   at org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateInputKeyComparator(ConfigUtils.java:133)
>   at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:915)
>   at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:540)
>   at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:376)
>   ... 6 more
> {code}
> Digging a bit I saw that during shuffle Tez can't access some of the 
> configuration properties of the job. In our example it is the 
> avro.output.schema that is missing.
> With some more complicated code I could get one step further and a similar 
> issue happened when the valuesIterator for the reducer was being built:
> {code}
> java.lang.NullPointerException
>   at java.io.StringReader.<init>(StringReader.java:50)
>   at org.apache.avro.Schema$Parser.parse(Schema.java:917)
>   at org.apache.avro.Schema.parse(Schema.java:966)
>   at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
>   at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
>   at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
>   at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:80)
>   at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:287)
> {code}
> I am using HDP2.4, Tez 0.7.0, avro 1.7.4
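
Both traces end in `StringReader.<init>`, which is consistent with a `null` schema string being handed to the parser. The sketch below shows how a component that only sees a reduced view of the job configuration would hit this. It rests on assumptions: the prefix-based filter is hypothetical and is not claimed to be how Tez selects properties, and a `Map` stands in for the configuration. Only the property name `avro.output.schema` comes from the report.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

class MissingSchemaSketch {
    // Property name quoted in the report.
    static final String SCHEMA_KEY = "avro.output.schema";

    // Hypothetical filter: copies only keys with the given prefix, the way a
    // component might receive a reduced view of the job configuration.
    static Map<String, String> filterByPrefix(Map<String, String> jobConf, String prefix) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : jobConf.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    // Mimics the first step of parsing the schema text: a missing key yields
    // null, and new StringReader(null) throws NullPointerException.
    static boolean canOpenSchema(Map<String, String> conf) {
        try {
            new StringReader(conf.get(SCHEMA_KEY)).close();
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }
}
```

With the full job configuration `canOpenSchema` succeeds; after filtering to, say, keys starting with `tez.`, the schema key is gone and the same call fails in the way the traces show.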





[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property

2016-07-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375782#comment-15375782
 ] 

Siddharth Seth commented on TEZ-3330:
-

I don't think there's any way to do this at the moment. Attaching a temporary 
patch for this. Don't think fixing this properly is trivial; well we could just 
skip the ConfigBuilders altogether.

> Error on avro M/R job with Tez: missing configuration property
> --
>
> Key: TEZ-3330
> URL: https://issues.apache.org/jira/browse/TEZ-3330
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Manuel Godbert
> Attachments: TEZ-3330.temp.patch
>
>
> I tried running the simple avro M/R job MapredColorCount, that I found in the 
> examples of avro release 1.7.7.
> It failed with the following trace:
> {code}
> errorMessage=Shuffle Runner 
> Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError:
>  Error while doing final merge
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:378)
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
> at java.io.StringReader.(StringReader.java:50)
> at org.apache.avro.Schema$Parser.parse(Schema.java:917)
> at org.apache.avro.Schema.parse(Schema.java:966)
> at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
> at 
> org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39)
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
> at 
> org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateInputKeyComparator(ConfigUtils.java:133)
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:915)
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:540)
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:376)
> ... 6 more
> {code}
> Digging a bit I saw that during shuffle Tez can't access some of the 
> configuration properties of the job. In our example it is the 
> avro.output.schema that is missing.
> With some more complicated code I could get one step further and a similar 
> issue happened when the valuesIterator for the reducer was being built:
> {code}
> java.lang.NullPointerException
> at java.io.StringReader.<init>(StringReader.java:50)
> at org.apache.avro.Schema$Parser.parse(Schema.java:917)
> at org.apache.avro.Schema.parse(Schema.java:966)
> at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
> at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
> at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
> at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:80)
> at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:287)
> {code}
> I am using HDP2.4, Tez 0.7.0, avro 1.7.4
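
The NPE at the bottom of both traces is consistent with the schema string being absent from the configuration: the {{java.io.StringReader}} constructor dereferences its argument, so a null lookup fails right there. A minimal sketch of that failure mode, assuming a plain {{Map}} as a stand-in for the job configuration (the class and method names below are hypothetical; this is not the Hadoop or Avro API):

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

final class MissingSchemaSketch {
    // Hypothetical stand-in for reading the schema JSON out of the job configuration.
    static String lookupSchema(Map<String, String> conf) {
        return conf.get("avro.output.schema"); // null when the key never reached this side
    }

    // Returns true if passing the looked-up value to StringReader throws an NPE.
    static boolean failsWithNpe(Map<String, String> conf) {
        try {
            new StringReader(lookupSchema(conf));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println("key missing, NPE: " + failsWithNpe(conf));
        conf.put("avro.output.schema", "\"string\"");
        System.out.println("key present, NPE: " + failsWithNpe(conf));
    }
}
```

If this reading is right, the thing to chase is why the property is not visible on the shuffle side, rather than anything in the Avro parser itself.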



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3235) Modify Example TestOrderedWordCount job to test the IPC limit for large dag plans

2016-07-13 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375778#comment-15375778
 ] 

Hitesh Shah commented on TEZ-3235:
--

+1 for now. I think TestOrderedWordCount needs a large refactor to clean up the 
current codebase. Committing shortly. 

> Modify Example TestOrderedWordCount job to test the IPC limit for large dag 
> plans
> -
>
> Key: TEZ-3235
> URL: https://issues.apache.org/jira/browse/TEZ-3235
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Sushmitha Sreenivasan
>Assignee: Sushmitha Sreenivasan
> Attachments: TEZ-3235.1.patch, Tez-3235.2.patch, Tez-3235.3.patch
>
>






[jira] [Updated] (TEZ-3235) Modify Example TestOrderedWordCount job to test the IPC limit for large dag plans

2016-07-13 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3235:
-
Affects Version/s: (was: 0.8.3)

> Modify Example TestOrderedWordCount job to test the IPC limit for large dag 
> plans
> -
>
> Key: TEZ-3235
> URL: https://issues.apache.org/jira/browse/TEZ-3235
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Sushmitha Sreenivasan
>Assignee: Sushmitha Sreenivasan
> Attachments: TEZ-3235.1.patch, Tez-3235.2.patch, Tez-3235.3.patch
>
>






[jira] [Updated] (TEZ-3331) Add operation specific HDFS counters for Tez UI

2016-07-13 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-3331:

Assignee: Hitesh Shah

> Add operation specific HDFS counters for Tez UI
> ---
>
> Key: TEZ-3331
> URL: https://issues.apache.org/jira/browse/TEZ-3331
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jitendra Nath Pandey
>Assignee: Hitesh Shah
> Attachments: TEZ-3331.wip.2.patch, TEZ-3331.wip.3.patch, 
> TEZ-3331.wip.4.patch, TEZ-3331.wip.5.patch, TEZ-3331.wip.patch
>
>
> Hadoop has added several operation specific counters in the FileSystem 
> statistics (HADOOP-13065). These counters are useful to track file system 
> operations more granularly. It would be great to track these counters for Tez 
> and expose them via UI as well.





[jira] [Commented] (TEZ-3336) Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE

2016-07-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375691#comment-15375691
 ] 

Siddharth Seth commented on TEZ-3336:
-

[~jlowe] - InputInitializer events and VMEvents short-circuit the DAG 
definition to some extent. Pretty much any task can send these events to any 
Vertex, since there's no control-plane definition to restrict this.
The Hive processor is supposed to target these events at specific Vertices 
which know how to handle them. In fact, I'm not sure Hive uses 
MRInputAMSplitGenerator at all anymore. It has its own SplitGenerator, which is 
based on MRInputAMSplitGenerator and knows how to handle these events for 
partition pruning. This sounds like a Hive bug to me.

> Hive map-side join job sometimes fails with ROOT_INPUT_INIT_FAILURE
> ---
>
> Key: TEZ-3336
> URL: https://issues.apache.org/jira/browse/TEZ-3336
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1
>Reporter: Jason Lowe
>
> When Hive does a map-side join it can generate a DAG where a vertex has two 
> inputs, one from an upstream task and another using MRInputAMSplitGenerator.  
> If it takes a while for MRInputAMSplitGenerator to compute the splits and one 
> of the tasks for the other upstream vertex completes then the job can fail 
> with an error since MRInputAMSplitGenerator does not expect to receive any 
> events.





[jira] [Commented] (TEZ-3274) Vertex with MRInput and shuffle input does not respect slow start

2016-07-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375682#comment-15375682
 ] 

Siddharth Seth commented on TEZ-3274:
-

Haven't looked at this in some time. Is this being used with 
MRInputSplitDistributor, with the initial parallelism set on the specific 
vertex? I don't think using a Root Input along with a ShuffleInput on the same 
vertex will work with MRInputAMSplitGenerator, since parallelism is set up at 
runtime. Shuffle tasks will see a value of -1 if the initialization takes time.

I believe we never really focused on this case, and if it showed up, it would 
need to be handled via a custom VertexManager. If such a manager were to exist, 
how would the data distribution be handled? There are different splits for the 
MRInput and partitions on the Shuffle side; how are they mapped?


> Vertex with MRInput and shuffle input does not respect slow start
> -
>
> Key: TEZ-3274
> URL: https://issues.apache.org/jira/browse/TEZ-3274
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>
> Vertices with shuffle input and MRInput choose RootInputVertexManager (and 
> not ShuffleVertexManager) and start containers and tasks immediately. In this 
> scenario, resources can be wasted since they do not respect 
> tez.shuffle-vertex-manager.min-src-fraction and 
> tez.shuffle-vertex-manager.max-src-fraction. 
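
For reference, a rough sketch of what respecting the two fractions would mean for such a vertex: start nothing until the min fraction of source tasks has completed, start everything once the max fraction is reached, and ramp up in between. The linear ramp and all names below are assumptions for illustration, not the actual ShuffleVertexManager code:

```java
final class SlowStartSketch {
    // Number of this vertex's tasks that may be started, given source-task progress.
    static int tasksToStart(int totalTasks, int completedSrc, int totalSrc,
                            double minFraction, double maxFraction) {
        if (totalSrc == 0) {
            return totalTasks; // nothing to wait for
        }
        double done = (double) completedSrc / totalSrc;
        if (done < minFraction) {
            return 0; // below the slow-start threshold: hold all tasks back
        }
        if (done >= maxFraction || maxFraction <= minFraction) {
            return totalTasks; // past the upper threshold: release everything
        }
        // Assumed linear ramp between the two thresholds.
        double share = (done - minFraction) / (maxFraction - minFraction);
        return (int) Math.floor(share * totalTasks);
    }

    public static void main(String[] args) {
        System.out.println(tasksToStart(10, 0, 100, 0.25, 0.75));
        System.out.println(tasksToStart(10, 50, 100, 0.25, 0.75));
        System.out.println(tasksToStart(10, 75, 100, 0.25, 0.75));
    }
}
```

With RootInputVertexManager in charge, the vertex behaves as if all of its tasks were released at once, which is the waste described above.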





[jira] [Updated] (TEZ-3337) Do not log empty fields of TaskAttemptFinishedEvent to avoid confusion

2016-07-13 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3337:
-
Summary: Do not log empty fields of TaskAttemptFinishedEvent to avoid 
confusion  (was: Not log empty fields of TaskAttemptFinishedEvent to avoid 
confusion)

> Do not log empty fields of TaskAttemptFinishedEvent to avoid confusion
> --
>
> Key: TEZ-3337
> URL: https://issues.apache.org/jira/browse/TEZ-3337
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3337.1.patch
>
>
> For a successful task attempt, we don't record the containerId, which causes 
> "containerId=," in the INFO logs. We should avoid logging this field if it's 
> empty.
> {code}
> 2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, 
> creationTime=1467956979891, allocationTime=1467956980426, 
> startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, 
> status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
> nodeHttpAddress=
> 2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, 
> creationTime=1467956979894, allocationTime=1467956980427, 
> startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, 
> status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
> nodeHttpAddress=
> {code}
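
A minimal sketch of the behaviour being asked for, assuming the event's fields are available as ordered name/value pairs. {{HistoryLogSketch}} and {{format}} are hypothetical names; this is not the attached patch or the TaskAttemptFinishedEvent code:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class HistoryLogSketch {
    // Joins name=value pairs in insertion order, skipping null or empty values.
    static String format(Map<String, String> fields) {
        StringJoiner out = new StringJoiner(", ");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String value = e.getValue();
            if (value != null && !value.isEmpty()) {
                out.add(e.getKey() + "=" + value);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("vertexName", "Map 1");
        fields.put("status", "SUCCEEDED");
        fields.put("errorEnum", "");
        fields.put("containerId", "");
        System.out.println(format(fields));
    }
}
```

Applied to the first log line above, this would drop errorEnum, diagnostics, containerId, nodeId and nodeHttpAddress and keep the rest.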





[jira] [Commented] (TEZ-3337) Not log empty fields of TaskAttemptFinishedEvent to avoid confusion

2016-07-13 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375429#comment-15375429
 ] 

Hitesh Shah commented on TEZ-3337:
--

+1. Committing shortly. 

> Not log empty fields of TaskAttemptFinishedEvent to avoid confusion
> ---
>
> Key: TEZ-3337
> URL: https://issues.apache.org/jira/browse/TEZ-3337
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3337.1.patch
>
>
> For a successful task attempt, we don't record the containerId, which causes 
> "containerId=," in the INFO logs. We should avoid logging this field if it's 
> empty.
> {code}
> 2016-07-07 22:49:44,935 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 1, taskAttemptId=attempt_1467855578616_0044_1_07_00_0, 
> creationTime=1467956979891, allocationTime=1467956980426, 
> startTime=1467956982433, finishTime=1467956984933, timeTaken=2500, 
> status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
> nodeHttpAddress=
> 2016-07-07 22:49:44,937 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1467855578616_0044_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 11, taskAttemptId=attempt_1467855578616_0044_1_02_00_0, 
> creationTime=1467956979894, allocationTime=1467956980427, 
> startTime=1467956982437, finishTime=1467956984936, timeTaken=2499, 
> status=SUCCEEDED, errorEnum=, diagnostics=, containerId=, nodeId=, 
> nodeHttpAddress=
> {code}





[jira] [Updated] (TEZ-3331) Add operation specific HDFS counters for Tez UI

2016-07-13 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3331:
-
Attachment: TEZ-3331.wip.5.patch

Fix typo in test file.

> Add operation specific HDFS counters for Tez UI
> ---
>
> Key: TEZ-3331
> URL: https://issues.apache.org/jira/browse/TEZ-3331
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jitendra Nath Pandey
> Attachments: TEZ-3331.wip.2.patch, TEZ-3331.wip.3.patch, 
> TEZ-3331.wip.4.patch, TEZ-3331.wip.5.patch, TEZ-3331.wip.patch
>
>
> Hadoop has added several operation specific counters in the FileSystem 
> statistics (HADOOP-13065). These counters are useful to track file system 
> operations more granularly. It would be great to track these counters for Tez 
> and expose them via UI as well.





[jira] [Commented] (TEZ-3344) ATS does not contain certain vertex details

2016-07-13 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375316#comment-15375316
 ] 

Hitesh Shah commented on TEZ-3344:
--

What version of Hadoop are you using? Also, how has the timeline service been 
configured? v1? v1.5? 

Do you have the YARN app logs for this?  

> ATS does not contain certain vertex details
> ---
>
> Key: TEZ-3344
> URL: https://issues.apache.org/jira/browse/TEZ-3344
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: all_vertices_page_showing_only_3_vertices.png, dag.json, 
> dag_details.png, dag_succeeded_screenshot.png, vertex.json
>
>
> e.g. Hive query:
> {noformat}
> select count(1) from (select d_date, t_time_id from (select d_date from 
> date_dim sort by d_date) d, time_dim) x
> {noformat}
> This has 4 vertices (will attach the DAG shortly).  However, when ATS data is 
> downloaded via tez-ui or via ATSImportTool, "vertex_1466700718395_0351_4_03" 
> data is missing. 





[jira] [Updated] (TEZ-3347) Vertex UI throws an error while getting vertexProgress for a killed Vertex

2016-07-13 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3347:
-
Attachment: ErrorCodeFailedVertex.png

> Vertex UI throws an error while getting vertexProgress for a killed Vertex
> --
>
> Key: TEZ-3347
> URL: https://issues.apache.org/jira/browse/TEZ-3347
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Kuhu Shukla
> Attachments: ErrorCodeFailedVertex.png
>
>
> Given an AM that fails all its attempts, the application fails and the very 
> first click on the killed/failed vertex throws the following error:
> {code}
>  error code: Unknown, message: expected expression, got '<'
> {code}
> It self-corrects if tried again immediately after the failure.
> This is because the RM proxy redirects the call to the AHS server and the 
> REST call is malformed for that server. Upon inspection of the responses, it 
> was seen that the URL looked something like this:
> {code}
> http://:/applicationhistory/app/application_123_456/ws/v1/tez/vertexProgress?dagID=1=01&_=123
> {code}
> which is not a proper REST call on the AHS.
> I think the following code can cause this issue:
> {code}
> // Load progress in parallel for v1 version of the api
>   _loadProgress: function (vertices) {
> var that = this,
> runningVerticesIdx = vertices
>   .filterBy('status', 'RUNNING')
>   .map(function(item) {
> return item.get('id').split('_').splice(-1).pop();
>   });
> if (runningVerticesIdx.length > 0) {
>   this.store.unloadAll('vertexProgress');
>   this.store.findQuery('vertexProgress', {
> metadata: {
>   appId: that.get('applicationId'),
>   dagIdx: that.get('idx'),
>   vertexIds: runningVerticesIdx.join(',')
> }
>   }).then(function(vertexProgressInfo) {
>   App.Helpers.emData.mergeRecords(
> that.get('rowsDisplayed'),
> vertexProgressInfo,
> ['progress']
>   );
>   }).catch(function(error) {
> error.message = "Failed to fetch vertexProgress. Application Master 
> (AM) is out of reach. Either it's down, or CORS is not enabled for YARN 
> ResourceManager.";
> Em.Logger.error(error);
> var err = App.Helpers.misc.formatError(error);
> var msg = 'Error code: %@, message: %@'.fmt(err.errCode, err.msg);
> App.Helpers.ErrorBar.getInstance().show(msg, err.details);
>   });
> {code}
> which uses AMInfo that gets the response based on what loadApp method finds:
> {code}
> loadApp: function (store, appId, useCache) {
> if(!useCache) {
>   App.Helpers.misc.removeRecord(store, 'appDetail', appId);
>   App.Helpers.misc.removeRecord(store, 'clusterApp', appId);
> }
> return store.find('clusterApp', appId).catch(function () {
>   return store.find('appDetail', appId);
> }).catch(function (error) {
>   error.message = "Couldn't get details of application %@. RM is not 
> reachable, and history service is not enabled.".fmt(appId);
>   throw error;
> });
>   }
> {code}
> We could check here in the catch block whether the response type is JSON, or 
> skip fetching vertexProgress altogether, since by then it is known that the 
> application/AM has failed.





[jira] [Created] (TEZ-3347) Vertex UI throws an error while getting vertexProgress for a killed Vertex

2016-07-13 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created TEZ-3347:


 Summary: Vertex UI throws an error while getting vertexProgress 
for a killed Vertex
 Key: TEZ-3347
 URL: https://issues.apache.org/jira/browse/TEZ-3347
 Project: Apache Tez
  Issue Type: Bug
  Components: UI
Reporter: Kuhu Shukla
 Attachments: ErrorCodeFailedVertex.png

Given an AM that fails all its attempts, the application fails and the very 
first click on the killed/failed vertex throws the following error:
{code}
 error code: Unknown, message: expected expression, got '<'
{code}
It self-corrects if tried again immediately after the failure.

This is because the RM proxy redirects the call to the AHS server and the REST 
call is malformed for that server. Upon inspection of the responses, it was 
seen that the URL looked something like this:
{code}
http://:/applicationhistory/app/application_123_456/ws/v1/tez/vertexProgress?dagID=1=01&_=123
{code}
which is not a proper REST call on the AHS.

I think the following code can cause this issue:
{code}
// Load progress in parallel for v1 version of the api
_loadProgress: function (vertices) {
  var that = this,
      runningVerticesIdx = vertices
        .filterBy('status', 'RUNNING')
        .map(function(item) {
          return item.get('id').split('_').splice(-1).pop();
        });

  if (runningVerticesIdx.length > 0) {
    this.store.unloadAll('vertexProgress');
    this.store.findQuery('vertexProgress', {
      metadata: {
        appId: that.get('applicationId'),
        dagIdx: that.get('idx'),
        vertexIds: runningVerticesIdx.join(',')
      }
    }).then(function(vertexProgressInfo) {
      App.Helpers.emData.mergeRecords(
        that.get('rowsDisplayed'),
        vertexProgressInfo,
        ['progress']
      );
    }).catch(function(error) {
      error.message = "Failed to fetch vertexProgress. Application Master " +
        "(AM) is out of reach. Either it's down, or CORS is not enabled for YARN " +
        "ResourceManager.";
      Em.Logger.error(error);
      var err = App.Helpers.misc.formatError(error);
      var msg = 'Error code: %@, message: %@'.fmt(err.errCode, err.msg);
      App.Helpers.ErrorBar.getInstance().show(msg, err.details);
    });
  }
}
{code}
which uses AMInfo that gets the response based on what loadApp method finds:
{code}
loadApp: function (store, appId, useCache) {
  if(!useCache) {
    App.Helpers.misc.removeRecord(store, 'appDetail', appId);
    App.Helpers.misc.removeRecord(store, 'clusterApp', appId);
  }

  return store.find('clusterApp', appId).catch(function () {
    return store.find('appDetail', appId);
  }).catch(function (error) {
    error.message = ("Couldn't get details of application %@. RM is not " +
      "reachable, and history service is not enabled.").fmt(appId);
    throw error;
  });
}
{code}

We could check here in the catch block whether the response type is JSON, or 
skip fetching vertexProgress altogether, since by then it is known that the 
application/AM has failed.
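
A rough sketch of the first option: decide whether a reply is JSON before trying to parse it as vertexProgress data. The "got '<'" error suggests the redirected reply is presumably an HTML page. The sketch is in Java purely for illustration (the Tez UI code is JavaScript), and {{looksLikeJson}} and its parameters are hypothetical names, not part of the UI code:

```java
import java.util.Locale;

final class ResponseKindSketch {
    // Returns true only if the reply plausibly carries JSON.
    // contentType may be null when the header is unavailable.
    static boolean looksLikeJson(String contentType, String body) {
        if (contentType != null
                && !contentType.toLowerCase(Locale.ROOT).contains("json")) {
            return false; // e.g. text/html from a redirect target
        }
        String trimmed = body == null ? "" : body.trim();
        // A JSON object or array starts with '{' or '['; markup starts with '<'.
        return trimmed.startsWith("{") || trimmed.startsWith("[");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeJson("application/json", "{\"progress\": 0.5}"));
        System.out.println(looksLikeJson("text/html", "<html></html>"));
    }
}
```

A reply that fails this check could be treated as "AM not reachable" without surfacing the parse error to the user.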





[jira] [Updated] (TEZ-3346) Vertex status on DAG UI and Vertex UI can be inconsistent

2016-07-13 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3346:
-
Attachment: VertexPage.png
DagPage_AMDiagnostics.png

> Vertex status on DAG UI and Vertex UI can be inconsistent
> -
>
> Key: TEZ-3346
> URL: https://issues.apache.org/jira/browse/TEZ-3346
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Kuhu Shukla
> Attachments: DagPage_AMDiagnostics.png, VertexPage.png
>
>
> If the AM fails on all its attempts, the Vertex status on the DAG page is 
> "KILLED". On clicking the vertexName, the vertex UI page shows the status as 
> "FAILED" (which it pulls from the proxy).
> I think this code in {{vertex_tasks_controller.js}} makes the decision to 
> mark the vertex as KILLED, because the Timeline Server still has "RUNNING" as 
> the last updated vertex status after the AM crashed.
> {code}
>  if (taskStatus == 'RUNNING' && isUnsuccessfulVertex) {
> taskStatus = 'KILLED'
>   }
>   if (taskStatus != task.get('status')) {
> task.set('status', taskStatus);
>   }
> });
> {code}
> Not sure which of the 2 values is semantically correct. The FAILED state 
> seems to come from this logic which uses appState and finalState of the 
> application from Timeline's and AHS's perspective to decide the vertex state.
> {code}
> vertex.set('status', App.Helpers.misc.getRealStatus(vertex.get('status'), 
> appDetail.get('status'),
> appDetail.get('finalStatus')));
> {code}





[jira] [Created] (TEZ-3346) Vertex status on DAG UI and Vertex UI can be inconsistent

2016-07-13 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created TEZ-3346:


 Summary: Vertex status on DAG UI and Vertex UI can be inconsistent
 Key: TEZ-3346
 URL: https://issues.apache.org/jira/browse/TEZ-3346
 Project: Apache Tez
  Issue Type: Bug
  Components: UI
Reporter: Kuhu Shukla


If the AM fails on all its attempts, the Vertex status on the DAG page is 
"KILLED". On clicking the vertexName, the vertex UI page shows the status as 
"FAILED" (which it pulls from the proxy).

I think this code in {{vertex_tasks_controller.js}} makes the decision to mark 
the vertex as KILLED, because the Timeline Server still has "RUNNING" as the 
last updated vertex status after the AM crashed.
{code}
  if (taskStatus == 'RUNNING' && isUnsuccessfulVertex) {
    taskStatus = 'KILLED';
  }
  if (taskStatus != task.get('status')) {
    task.set('status', taskStatus);
  }
});
{code}

Not sure which of the 2 values is semantically correct. The FAILED state seems 
to come from this logic which uses appState and finalState of the application 
from Timeline's and AHS's perspective to decide the vertex state.

{code}
vertex.set('status', App.Helpers.misc.getRealStatus(
  vertex.get('status'),
  appDetail.get('status'),
  appDetail.get('finalStatus')));
{code}





[jira] [Updated] (TEZ-3345) Diagnostics for a failed AM may not show up on the DAG UI page.

2016-07-13 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3345:
-
Attachment: DagPage_AMDiagnostics.png

> Diagnostics for a failed AM may not show up on the DAG UI page.
> ---
>
> Key: TEZ-3345
> URL: https://issues.apache.org/jira/browse/TEZ-3345
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Kuhu Shukla
> Attachments: DagPage_AMDiagnostics.png
>
>
> In a scenario where the AM fails on all its attempts, the DAG page does not 
> show the Diagnostics message in the Tez UI "Diagnostics" section. The message 
> is available on the AHS and could be pulled into the DAG page just like we do 
> for failed vertices.
> I ran a simple Tez example job that was given far too little memory, causing 
> the AM to fail:
> {code}
> hadoop jar /home/gs/tez/current/tez-examples-0.7.1.x.x.jar orderedwordcount 
> -Dtez.am.resource.memory.mb=16 -Dtez.am.launch.cmd-opts="-Xmx13m 
> -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails 
> -XX:+PrintGCDateStamps -Xloggc:/gc.log"/books /output
> {code}





[jira] [Created] (TEZ-3345) Diagnostics for a failed AM may not show up on the DAG UI page.

2016-07-13 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created TEZ-3345:


 Summary: Diagnostics for a failed AM may not show up on the DAG UI 
page.
 Key: TEZ-3345
 URL: https://issues.apache.org/jira/browse/TEZ-3345
 Project: Apache Tez
  Issue Type: Bug
  Components: UI
Reporter: Kuhu Shukla


In a scenario where the AM fails on all its attempts, the DAG page does not show 
the Diagnostics message in the Tez UI "Diagnostics" section. The message is 
available on the AHS and could be pulled into the DAG page just like we do for 
failed vertices.

I ran a simple Tez example job that was given far too little memory, causing the 
AM to fail:
{code}
hadoop jar /home/gs/tez/current/tez-examples-0.7.1.x.x.jar orderedwordcount 
-Dtez.am.resource.memory.mb=16 -Dtez.am.launch.cmd-opts="-Xmx13m 
-Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails 
-XX:+PrintGCDateStamps -Xloggc:/gc.log"/books /output
{code}





[jira] [Updated] (TEZ-3344) ATS does not contain certain vertex details

2016-07-13 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-3344:
--
Attachment: dag_succeeded_screenshot.png

> ATS does not contain certain vertex details
> ---
>
> Key: TEZ-3344
> URL: https://issues.apache.org/jira/browse/TEZ-3344
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: all_vertices_page_showing_only_3_vertices.png, dag.json, 
> dag_details.png, dag_succeeded_screenshot.png, vertex.json
>
>
> e.g. Hive query:
> {noformat}
> select count(1) from (select d_date, t_time_id from (select d_date from 
> date_dim sort by d_date) d, time_dim) x
> {noformat}
> This has 4 vertices (will attach the DAG shortly).  However, when ATS data is 
> downloaded via tez-ui or via ATSImportTool, "vertex_1466700718395_0351_4_03" 
> data is missing. 





[jira] [Commented] (TEZ-3344) ATS does not contain certain vertex details

2016-07-13 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374836#comment-15374836
 ] 

Rajesh Balamohan commented on TEZ-3344:
---

For the missing vertex, the data is not in ATS:

http://ats_machine/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1466700718395_0351_4_03

{noformat}
{
exception: "NotFoundException",
message: "java.lang.Exception: Timeline entity { id: 
vertex_1466700718395_0351_4_03, type: TEZ_VERTEX_ID } is not found",
javaClassName: "org.apache.hadoop.yarn.webapp.NotFoundException"
}
{noformat}
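
For anyone checking other vertices of the same DAG, the lookup URL follows the pattern of the one above. A small sketch that builds it, assuming the same {{/ws/v1/timeline/}} prefix applies to the other entities ({{TimelineUrlSketch}} and {{entityUrl}} are hypothetical helper names):

```java
final class TimelineUrlSketch {
    // Builds <base>/ws/v1/timeline/<entityType>/<entityId>, tolerating a trailing slash on base.
    static String entityUrl(String atsBase, String entityType, String entityId) {
        String base = atsBase.endsWith("/")
                ? atsBase.substring(0, atsBase.length() - 1)
                : atsBase;
        return base + "/ws/v1/timeline/" + entityType + "/" + entityId;
    }

    public static void main(String[] args) {
        System.out.println(entityUrl("http://ats_machine", "TEZ_VERTEX_ID",
                "vertex_1466700718395_0351_4_03"));
    }
}
```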

> ATS does not contain certain vertex details
> ---
>
> Key: TEZ-3344
> URL: https://issues.apache.org/jira/browse/TEZ-3344
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: all_vertices_page_showing_only_3_vertices.png, dag.json, 
> dag_details.png, vertex.json
>
>
> e.g. Hive query:
> {noformat}
> select count(1) from (select d_date, t_time_id from (select d_date from 
> date_dim sort by d_date) d, time_dim) x
> {noformat}
> This has 4 vertices (will attach the DAG shortly).  However, when ATS data is 
> downloaded via tez-ui or via ATSImportTool, "vertex_1466700718395_0351_4_03" 
> data is missing. 





[jira] [Updated] (TEZ-3344) ATS does not contain certain vertex details

2016-07-13 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-3344:
--
Attachment: dag.json
vertex.json

\cc [~Sreenath]

> ATS does not contain certain vertex details
> ---
>
> Key: TEZ-3344
> URL: https://issues.apache.org/jira/browse/TEZ-3344
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: all_vertices_page_showing_only_3_vertices.png, dag.json, 
> dag_details.png, vertex.json
>
>
> e.g. Hive query:
> {noformat}
> select count(1) from (select d_date, t_time_id from (select d_date from 
> date_dim sort by d_date) d, time_dim) x
> {noformat}
> This has 4 vertices (will attach the DAG shortly).  However, when ATS data is 
> downloaded via tez-ui or via ATSImportTool, "vertex_1466700718395_0351_4_03" 
> data is missing. 





[jira] [Updated] (TEZ-3344) ATS does not contain certain vertex details

2016-07-13 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-3344:
--
Attachment: all_vertices_page_showing_only_3_vertices.png

> ATS does not contain certain vertex details
> ---
>
> Key: TEZ-3344
> URL: https://issues.apache.org/jira/browse/TEZ-3344
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: all_vertices_page_showing_only_3_vertices.png, 
> dag_details.png
>
>
> e.g. Hive query:
> {noformat}
> select count(1) from (select d_date, t_time_id from (select d_date from 
> date_dim sort by d_date) d, time_dim) x
> {noformat}
> This has 4 vertices (will attach the DAG shortly).  However, when ATS data is 
> downloaded via tez-ui or via ATSImportTool, "vertex_1466700718395_0351_4_03" 
> data is missing. 





[jira] [Updated] (TEZ-3344) ATS does not contain certain vertex details

2016-07-13 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-3344:
--
Attachment: dag_details.png

> ATS does not contain certain vertex details
> ---
>
> Key: TEZ-3344
> URL: https://issues.apache.org/jira/browse/TEZ-3344
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: dag_details.png
>
>
> e.g. Hive query:
> {noformat}
> select count(1) from (select d_date, t_time_id from (select d_date from 
> date_dim sort by d_date) d, time_dim) x
> {noformat}
> This has 4 vertices (will attach the DAG shortly).  However, when ATS data is 
> downloaded via tez-ui or via ATSImportTool, "vertex_1466700718395_0351_4_03" 
> data is missing. 





[jira] [Created] (TEZ-3344) ATS does not contain certain vertex details

2016-07-13 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created TEZ-3344:
-

 Summary: ATS does not contain certain vertex details
 Key: TEZ-3344
 URL: https://issues.apache.org/jira/browse/TEZ-3344
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan


e.g. Hive query:
{noformat}
select count(1) from (select d_date, t_time_id from (select d_date from 
date_dim sort by d_date) d, time_dim) x
{noformat}

This has 4 vertices (will attach the DAG shortly).  However, when ATS data is 
downloaded via tez-ui or via ATSImportTool, "vertex_1466700718395_0351_4_03" 
data is missing. 


