[jira] [Updated] (TEZ-3357) Change TimecachePlugin to return grouped entity ids.

2016-07-19 Thread Harish Jaiprakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Jaiprakash updated TEZ-3357:
---
Attachment: (was: TEZ-3357.02.patch)

> Change TimecachePlugin to return grouped entity ids.
> 
>
> Key: TEZ-3357
> URL: https://issues.apache.org/jira/browse/TEZ-3357
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Harish Jaiprakash
>Assignee: Harish Jaiprakash
> Attachments: TEZ-3357.01.patch, TEZ-3357.03.patch
>
>
> TimelineCachePlugin has to return TimelineEntityGroupId grouped based on dag 
> id.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-3357) Change TimecachePlugin to return grouped entity ids.

2016-07-19 Thread Harish Jaiprakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Jaiprakash updated TEZ-3357:
---
Attachment: TEZ-3357.03.patch

Removing changes in pom.xml

> Change TimecachePlugin to return grouped entity ids.
> 
>
> Key: TEZ-3357
> URL: https://issues.apache.org/jira/browse/TEZ-3357
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Harish Jaiprakash
>Assignee: Harish Jaiprakash
> Attachments: TEZ-3357.01.patch, TEZ-3357.02.patch, TEZ-3357.03.patch
>
>
> TimelineCachePlugin has to return TimelineEntityGroupId grouped based on dag 
> id.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-3357) Change TimecachePlugin to return grouped entity ids.

2016-07-19 Thread Harish Jaiprakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Jaiprakash updated TEZ-3357:
---
Attachment: TEZ-3357.02.patch

Addressing all the code review comments from Hitesh. [~hitesh] Please take a 
look.

> Change TimecachePlugin to return grouped entity ids.
> 
>
> Key: TEZ-3357
> URL: https://issues.apache.org/jira/browse/TEZ-3357
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Harish Jaiprakash
>Assignee: Harish Jaiprakash
> Attachments: TEZ-3357.01.patch, TEZ-3357.02.patch
>
>
> TimelineCachePlugin has to return TimelineEntityGroupId grouped based on dag 
> id.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TEZ-3357) Change TimecachePlugin to return grouped entity ids.

2016-07-19 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384796#comment-15384796
 ] 

Hitesh Shah edited comment on TEZ-3357 at 7/19/16 11:56 PM:


Comments: 
  - TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP - usedNumDagsPerGroup - configs 
do not use camel case. Either "." or "-". 
  - ConfigurationScope(Scope.AM) - does not make sense here at all. 
  - Why is TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT needed? Can't it 
fall back to the configured value used when writing/creating these groups?
  - Changes in TezUtilsInternal: in retrospect, the grouping logic can belong 
in TezDAGId.java. A couple of points though:
    - use just daggroup or groupId and do not reference timeline in docs or var 
names
    - s/DAGGROUP/daggroup/
    - s/AtsGroup/DagGroup/
 
  - For "TimelineEntityGroupPlugin implements Configurable" - how are we 
certain that YARN will invoke setConf on the plugin? The base class 
TimelineEntityGroupPlugin does not require a plugin to implement methods 
defined as part of the Configurable interface. Additionally, the config would 
be set in tez-site.xml - who would load this config file? 
    - What if setConf is never called? 

{code}
conf.get(TezConfiguration.TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP,
    TezConfiguration.TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT).split(",");
{code}
   - maybe use conf.getStrings()? 

{code}
   * Comma separated list of numDagsPerGroup used until now. This is used by the TimelineCacheService to generate
   * TimelineEntityGroupIds. Do not add too many here, it will affect the performance of ATS during query time.
{code}
   - Needs a bit more clarity on what this config implies, or maybe a reference 
to the write path where this will be used? 

TestTimelineCachePluginImpl needs tests for where 
TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP is configured to null/empty (i.e. 
"") or to more than one value (i.e. "50,100").

{code}
for (int i = 0; i < usedNumGroups.length; ++i) {
  allNumGroupsPerDag[i] = Integer.parseInt(usedNumGroups[i]);
}
{code}
   - what if someone sets an invalid value? A stack trace without an 
accompanying message naming the config property would make it tough for a user 
to figure out what is wrong. 






was (Author: hitesh):
Comments: 
  - TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP - usedNumDagsPerGroup - configs 
do not use camel case. Either "." or "-". 
  - ConfigurationScope(Scope.AM) - does not make sense here at all. 
  - Why is TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT needed? Can't it 
fall back to the configured value used when writing/creating these groups?
  - Changes in TezUtilsInternal: in retrospect, the grouping logic can belong 
in TezDAGId.java. A couple of points though:
    - use just daggroup or groupId and do not reference timeline in docs or var 
names
    - s/DAGGROUP/daggroup/
    - s/AtsGroup/DagGroup/
 
  - For "TimelineEntityGroupPlugin implements Configurable" - how are we 
certain that YARN will invoke setConf on the plugin? The base class 
TimelineEntityGroupPlugin does not require a plugin to implement methods 
defined as part of the Configurable interface. Additionally, the config would 
be set in tez-site.xml - who would load this config file? 
    - What if setConf is never called? 

{code}
conf.get(TezConfiguration.TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP,
    TezConfiguration.TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT).split(",");
{code}
   - maybe use conf.getStrings()? 

{code}
   * Comma separated list of numDagsPerGroup used until now. This is used by the TimelineCacheService to generate
   * TimelineEntityGroupIds. Do not add too many here, it will affect the performance of ATS during query time.
{code}
   - Needs a bit more clarity on what this config implies, or maybe a reference 
to the write path where this will be used? 

TestTimelineCachePluginImpl needs tests for where 
TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT is configured to empty 
(i.e. "") or to more than one value (i.e. "50,100").

{code}
for (int i = 0; i < usedNumGroups.length; ++i) {
  allNumGroupsPerDag[i] = Integer.parseInt(usedNumGroups[i]);
}
{code}
   - what if someone sets an invalid value? A stack trace without an 
accompanying message naming the config property would make it tough for a user 
to figure out what is wrong. 





> Change TimecachePlugin to return grouped entity ids.
> 
>
> Key: TEZ-3357
> URL: https://issues.apache.org/jira/browse/TEZ-3357
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Harish Jaiprakash
>Assignee: Harish Jaiprakash
> Attachments: TEZ-3357.01.patch
>
>
> TimelineCachePlugin has to return TimelineEntityGroupId grouped based on dag 
> id.

[jira] [Commented] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-19 Thread Piyush Narang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385046#comment-15385046
 ] 

Piyush Narang commented on TEZ-3348:


Thanks for the help on this [~hitesh] :-)

> NullPointerException in Tez MROutput while trying to write using Parquet's 
> DeprecatedParquetOutputFormat
> 
>
> Key: TEZ-3348
> URL: https://issues.apache.org/jira/browse/TEZ-3348
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Piyush Narang
>Assignee: Piyush Narang
> Fix For: 0.9.0, 0.8.5
>
> Attachments: 11.patch, 11.patch
>
>
> Trying to run some Tez MR jobs that write out some data using Parquet to 
> HDFS. When I try to do so, I end up seeing an NPE in the Parquet code:
> {code}
> java.lang.NullPointerException
>   at org.apache.hadoop.fs.Path.<init>(Path.java:105)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:94)
>   at 
> org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.getDefaultWorkFile(DeprecatedParquetOutputFormat.java:69)
>   at 
> org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.access$100(DeprecatedParquetOutputFormat.java:36)
>   at 
> org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat$RecordWriterWrapper.<init>(DeprecatedParquetOutputFormat.java:89)
>   at 
> org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.getRecordWriter(DeprecatedParquetOutputFormat.java:77)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:416)
> {code}
> The flow seems to be:
> 1) The Parquet deprecated output format class tries to read the 
> workOutputPath - 
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/mapred/DeprecatedParquetOutputFormat.java#L69
> 2) This calls FileOutputFormat.getWorkOutputPath(...) - 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileOutputFormat.java#L229
> 3) That in turn tries to read the JobContext.TASK_OUTPUT_DIR 
> ("mapreduce.task.output.dir") constant. 
> 4) This ends up being null and in the Parquet code we end up with an NPE in 
> the Path class. 
> Looking at the Tez code, we are setting the workOutputPath in the 
> MROutput.initCommitter method - 
> https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/output/MROutput.java#L445.
>  
> This call, however, is made after the call to access the workOutputPath as 
> part of outputFormat.getRecordWriter(). 
> I tried out a run where I moved this initCommitter call up:
> {code}
> else {
>   oldApiTaskAttemptContext =
>   new org.apache.tez.mapreduce.hadoop.mapred.TaskAttemptContextImpl(
>   jobConf, taskAttemptId,
>   new MRTaskReporter(getContext()));
>   initCommitter(jobConf, useNewApi); // before the getRecordWriter call
>   oldOutputFormat = jobConf.getOutputFormat();
>   outputFormatClassName = oldOutputFormat.getClass().getName();
>   FileSystem fs = FileSystem.get(jobConf);
>   String finalName = getOutputName();
>   oldRecordWriter =
>   oldOutputFormat.getRecordWriter(
>   fs, jobConf, finalName, new 
> MRReporter(getContext().getCounters()));
> }
> {code}
> I tried out a run with this and it seems to succeed. If this sounds 
> reasonable, I can cut a PR. 
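The four-step flow above can be reduced to a plain-Java sketch, with a HashMap standing in for the JobConf; the class and method names are illustrative stand-ins for FileOutputFormat.getWorkOutputPath() and MROutput.initCommitter(), not the real Hadoop/Tez APIs.

```java
import java.util.HashMap;
import java.util.Map;

public class WorkOutputPathOrderDemo {
    static final String TASK_OUTPUT_DIR = "mapreduce.task.output.dir";

    // Stand-in for FileOutputFormat.getWorkOutputPath(): returns null while
    // the property is unset, which is what Parquet's getDefaultWorkFile()
    // then feeds into new Path(null), producing the NPE in the stack trace.
    static String getWorkOutputPath(Map<String, String> conf) {
        return conf.get(TASK_OUTPUT_DIR);
    }

    // Stand-in for MROutput.initCommitter(), which sets the property.
    static void initCommitter(Map<String, String> conf) {
        conf.put(TASK_OUTPUT_DIR, "/tmp/attempt_0/work");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Original ordering: getRecordWriter() runs first and observes null.
        System.out.println("before initCommitter: " + getWorkOutputPath(conf));
        // Proposed ordering: initCommitter() first, then the path is visible.
        initCommitter(conf);
        System.out.println("after initCommitter: " + getWorkOutputPath(conf));
    }
}
```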



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3338) Support classloader isolation

2016-07-19 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384950#comment-15384950
 ] 

Siddharth Seth commented on TEZ-3338:
-

There are multiple models by which tez and hadoop dependencies can be deployed, 
and I don't think we should try tackling all of them for classpath isolation.
tez.lib.uris is used to provide tez libraries - typically along with the 
required hadoop libraries (tez pulls the required hadoop libs into the tar). 
This goes into $PWD/tezlib.
tez.aux.uris can typically be used to add additional resources to the 
classpath (localized by the client). I suspect that there are times when these 
go into $PWD/tezlib as well.
In addition, users can specify resources when creating a TezClient or when 
submitting a DAG. These get localized into $PWD.

The classpath is set up to be $PWD/*:$PWD/tezlib/*.

Sections which run user code: this would be the various plugin points (and also 
places where Tez may explicitly change the UGI) - Input/Processor/Output 
create/init/run on the runtime side. InputInitializers, VertexManagers, 
EdgeManagers, OutputCommitters on the AM side.
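A common way to get the isolation discussed in this thread is a parent-last (child-first) classloader over the user-code jars, so that user classes shadow framework classes while system classes still come from the JVM. The sketch below illustrates the general technique only; Hadoop's ApplicationClassLoader from HADOOP-10893 is the actual reference implementation, and this is not Tez code.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Child-first classloader: looks in the user-code URLs before delegating to
// the parent (framework) classloader, except for java.* system classes.
public class ChildFirstClassLoader extends URLClassLoader {
    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !name.startsWith("java.")) {
                try {
                    c = findClass(name); // try the user jars first
                } catch (ClassNotFoundException ignored) {
                    // not in the user jars; fall through to parent delegation
                }
            }
            if (c == null) {
                c = super.loadClass(name, false); // parent-last fallback
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

In such a model, Tez would instantiate Inputs/Processors/Outputs (and the AM-side plugins) through this loader while keeping framework classes on the parent, which is exactly why a clean physical separation of $PWD and $PWD/tezlib matters.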

> Support classloader isolation
> -
>
> Key: TEZ-3338
> URL: https://issues.apache.org/jira/browse/TEZ-3338
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Ming Ma
>
> HADOOP-10893 and MAPREDUCE-1700 provide classloader isolation at both client 
> side and container side for MR. We should add the same support for Tez. Given 
> we use hadoop command to launch Tez, it appears the client side has been 
> taken care of. Only the container side support is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3348) NullPointerException in Tez MROutput while trying to write using Parquet's DeprecatedParquetOutputFormat

2016-07-19 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384918#comment-15384918
 ] 

Hitesh Shah commented on TEZ-3348:
--

+1. Committing shortly. 

> NullPointerException in Tez MROutput while trying to write using Parquet's 
> DeprecatedParquetOutputFormat
> 
>
> Key: TEZ-3348
> URL: https://issues.apache.org/jira/browse/TEZ-3348
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Piyush Narang
>Assignee: Piyush Narang
> Attachments: 11.patch, 11.patch
>
>
> Trying to run some Tez MR jobs that write out some data using Parquet to 
> HDFS. When I try to do so, I end up seeing an NPE in the Parquet code:
> {code}
> java.lang.NullPointerException
>   at org.apache.hadoop.fs.Path.<init>(Path.java:105)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:94)
>   at 
> org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.getDefaultWorkFile(DeprecatedParquetOutputFormat.java:69)
>   at 
> org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.access$100(DeprecatedParquetOutputFormat.java:36)
>   at 
> org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat$RecordWriterWrapper.<init>(DeprecatedParquetOutputFormat.java:89)
>   at 
> org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.getRecordWriter(DeprecatedParquetOutputFormat.java:77)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:416)
> {code}
> The flow seems to be:
> 1) The Parquet deprecated output format class tries to read the 
> workOutputPath - 
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/mapred/DeprecatedParquetOutputFormat.java#L69
> 2) This calls FileOutputFormat.getWorkOutputPath(...) - 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileOutputFormat.java#L229
> 3) That in turn tries to read the JobContext.TASK_OUTPUT_DIR 
> ("mapreduce.task.output.dir") constant. 
> 4) This ends up being null and in the Parquet code we end up with an NPE in 
> the Path class. 
> Looking at the Tez code, we are setting the workOutputPath in the 
> MROutput.initCommitter method - 
> https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/output/MROutput.java#L445.
>  
> This call, however, is made after the call to access the workOutputPath as 
> part of outputFormat.getRecordWriter(). 
> I tried out a run where I moved this initCommitter call up:
> {code}
> else {
>   oldApiTaskAttemptContext =
>   new org.apache.tez.mapreduce.hadoop.mapred.TaskAttemptContextImpl(
>   jobConf, taskAttemptId,
>   new MRTaskReporter(getContext()));
>   initCommitter(jobConf, useNewApi); // before the getRecordWriter call
>   oldOutputFormat = jobConf.getOutputFormat();
>   outputFormatClassName = oldOutputFormat.getClass().getName();
>   FileSystem fs = FileSystem.get(jobConf);
>   String finalName = getOutputName();
>   oldRecordWriter =
>   oldOutputFormat.getRecordWriter(
>   fs, jobConf, finalName, new 
> MRReporter(getContext().getCounters()));
> }
> {code}
> I tried out a run with this and it seems to succeed. If this sounds 
> reasonable, I can cut a PR. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3356) Fix initializing of stats when custom ShuffleVertexManager is used

2016-07-19 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384890#comment-15384890
 ] 

Hitesh Shah commented on TEZ-3356:
--

Looks like this is ready for commit. Committing shortly. 

> Fix initializing of stats when custom ShuffleVertexManager is used
> --
>
> Key: TEZ-3356
> URL: https://issues.apache.org/jira/browse/TEZ-3356
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.8.4
>Reporter: Peter Slawski
>Assignee: Peter Slawski
> Attachments: TEZ-3356.1.patch
>
>
> When using a custom ShuffleVertexManager to set a vertex’s parallelism, the 
> partition stats field will be left uninitialized even after the manager 
> itself gets initialized. This results in an IllegalStateException being 
> thrown, as the stats field will not yet be initialized when 
> VertexManagerEvents are 
> processed upon the start of the vertex. Note that these events contain 
> partition sizes which are aggregated and stored in this stats field.
>  
> Apache Pig’s grace auto-parallelism feature uses a custom 
> ShuffleVertexManager which sets a vertex’s parallelism upon the completion of 
> one of its parent’s parents. Thus, this corner case is hit and pig scripts 
> with grace parallelism enabled would fail if the DAG consists of at least one 
> vertex having grandparents.
>  
> The fix should be straightforward: update pending tasks before, rather than 
> after, VertexManagerEvents are processed, to ensure the partition stats 
> field is initialized.
>  
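The initialization-order bug described above can be reduced to a small plain-Java sketch; the class and method names are illustrative stand-ins, not the actual ShuffleVertexManager API.

```java
public class StatsInitOrderDemo {
    // Stand-in for the vertex manager's partition stats field.
    private long[] partitionStats;

    // Stand-in for "update pending tasks", which (re)initializes the stats.
    void updatePendingTasks(int numPartitions) {
        partitionStats = new long[numPartitions];
    }

    // Stand-in for processing a VertexManagerEvent carrying a partition size.
    void processEvent(int partition, long size) {
        if (partitionStats == null) {
            // The reported failure mode: events arrive before the stats exist.
            throw new IllegalStateException("partition stats not initialized");
        }
        partitionStats[partition] += size; // aggregate reported sizes
    }

    long statsFor(int partition) {
        return partitionStats[partition];
    }

    public static void main(String[] args) {
        StatsInitOrderDemo vm = new StatsInitOrderDemo();
        // The fix: update pending tasks before processing events.
        vm.updatePendingTasks(4);
        vm.processEvent(2, 1024);
        System.out.println(vm.statsFor(2));
    }
}
```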



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TEZ-3357) Change TimecachePlugin to return grouped entity ids.

2016-07-19 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384796#comment-15384796
 ] 

Hitesh Shah edited comment on TEZ-3357 at 7/19/16 9:11 PM:
---

Comments: 
  - TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP - usedNumDagsPerGroup - configs 
do not use camel case. Either "." or "-". 
  - ConfigurationScope(Scope.AM) - does not make sense here at all. 
  - Why is TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT needed? Can't it 
fall back to the configured value used when writing/creating these groups?
  - Changes in TezUtilsInternal: in retrospect, the grouping logic can belong 
in TezDAGId.java. A couple of points though:
    - use just daggroup or groupId and do not reference timeline in docs or var 
names
    - s/DAGGROUP/daggroup/
    - s/AtsGroup/DagGroup/
 
  - For "TimelineEntityGroupPlugin implements Configurable" - how are we 
certain that YARN will invoke setConf on the plugin? The base class 
TimelineEntityGroupPlugin does not require a plugin to implement methods 
defined as part of the Configurable interface. Additionally, the config would 
be set in tez-site.xml - who would load this config file? 
    - What if setConf is never called? 

{code}
conf.get(TezConfiguration.TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP,
    TezConfiguration.TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT).split(",");
{code}
   - maybe use conf.getStrings()? 

{code}
   * Comma separated list of numDagsPerGroup used until now. This is used by the TimelineCacheService to generate
   * TimelineEntityGroupIds. Do not add too many here, it will affect the performance of ATS during query time.
{code}
   - Needs a bit more clarity on what this config implies, or maybe a reference 
to the write path where this will be used? 

TestTimelineCachePluginImpl needs tests for where 
TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT is configured to empty 
(i.e. "") or to more than one value (i.e. "50,100").

{code}
for (int i = 0; i < usedNumGroups.length; ++i) {
  allNumGroupsPerDag[i] = Integer.parseInt(usedNumGroups[i]);
}
{code}
   - what if someone sets an invalid value? A stack trace without an 
accompanying message naming the config property would make it tough for a user 
to figure out what is wrong. 






was (Author: hitesh):
Comments: 
  - TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP - usedNumDagsPerGroup - configs 
do not use camel case. Either "." or "-". 
  - ConfigurationScope(Scope.AM) - does not make sense here at all. 
  - Why is TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT needed? Can't it 
fall back to the configured value used when writing/creating these groups?
  - Changes in TezUtilsInternal: in retrospect, the grouping logic can belong 
in TezDAGId.java. A couple of points though:
    - use just daggroup or groupId and do not reference timeline in docs or var 
names
    - s/DAGGROUP/daggroup/
    - s/AtsGroup/DagGroup/
 
  - For "TimelineEntityGroupPlugin implements Configurable" - how are we 
certain that YARN will invoke setConf on the plugin? The base class 
TimelineEntityGroupPlugin does not require a plugin to implement methods 
defined as part of the Configurable interface. 
    - What if setConf is never called? 

{code}
conf.get(TezConfiguration.TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP,
    TezConfiguration.TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT).split(",");
{code}
   - maybe use conf.getStrings()? 

{code}
   * Comma separated list of numDagsPerGroup used until now. This is used by the TimelineCacheService to generate
   * TimelineEntityGroupIds. Do not add too many here, it will affect the performance of ATS during query time.
{code}
   - Needs a bit more clarity on what this config implies, or maybe a reference 
to the write path where this will be used? 

TestTimelineCachePluginImpl needs tests for where 
TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT is configured to empty 
(i.e. "") or to more than one value (i.e. "50,100").

{code}
for (int i = 0; i < usedNumGroups.length; ++i) {
  allNumGroupsPerDag[i] = Integer.parseInt(usedNumGroups[i]);
}
{code}
   - what if someone sets an invalid value? A stack trace without an 
accompanying message naming the config property would make it tough for a user 
to figure out what is wrong. 





> Change TimecachePlugin to return grouped entity ids.
> 
>
> Key: TEZ-3357
> URL: https://issues.apache.org/jira/browse/TEZ-3357
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Harish Jaiprakash
>Assignee: Harish Jaiprakash
> Attachments: TEZ-3357.01.patch
>
>
> TimelineCachePlugin has to return TimelineEntityGroupId grouped based on dag 
> id.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TEZ-3338) Support classloader isolation

2016-07-19 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384803#comment-15384803
 ] 

Sangjin Lee commented on TEZ-3338:
--

If we are going to have isolation between (tez + hadoop + their dependencies) 
and (user code + its dependencies), it would be a practical prerequisite to 
have two separate classpaths for them.

Does tez have a clean classpath separation between the framework (tez and 
hadoop) and the user code? It appears to me that both the tez code and the user 
code can be mixed up in the $PWD. If that is the case, we would need clean 
physical separation of classpaths. Also, it would help to have a clear 
demarcation of code regions that run user code on behalf of users.

> Support classloader isolation
> -
>
> Key: TEZ-3338
> URL: https://issues.apache.org/jira/browse/TEZ-3338
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Ming Ma
>
> HADOOP-10893 and MAPREDUCE-1700 provide classloader isolation at both client 
> side and container side for MR. We should add the same support for Tez. Given 
> we use hadoop command to launch Tez, it appears the client side has been 
> taken care of. Only the container side support is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3357) Change TimecachePlugin to return grouped entity ids.

2016-07-19 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384796#comment-15384796
 ] 

Hitesh Shah commented on TEZ-3357:
--

Comments: 
  - TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP - usedNumDagsPerGroup - configs 
do not use camel case. Either "." or "-". 
  - ConfigurationScope(Scope.AM) - does not make sense here at all. 
  - Why is TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT needed? Can't it 
fall back to the configured value used when writing/creating these groups?
  - Changes in TezUtilsInternal: in retrospect, the grouping logic can belong 
in TezDAGId.java. A couple of points though:
    - use just daggroup or groupId and do not reference timeline in docs or var 
names
    - s/DAGGROUP/daggroup/
    - s/AtsGroup/DagGroup/
 
  - For "TimelineEntityGroupPlugin implements Configurable" - how are we 
certain that YARN will invoke setConf on the plugin? The base class 
TimelineEntityGroupPlugin does not require a plugin to implement methods 
defined as part of the Configurable interface. 
    - What if setConf is never called? 

{code}
conf.get(TezConfiguration.TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP,
    TezConfiguration.TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT).split(",");
{code}
   - maybe use conf.getStrings()? 

{code}
   * Comma separated list of numDagsPerGroup used until now. This is used by the TimelineCacheService to generate
   * TimelineEntityGroupIds. Do not add too many here, it will affect the performance of ATS during query time.
{code}
   - Needs a bit more clarity on what this config implies, or maybe a reference 
to the write path where this will be used? 

TestTimelineCachePluginImpl needs tests for where 
TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT is configured to empty 
(i.e. "") or to more than one value (i.e. "50,100").

{code}
for (int i = 0; i < usedNumGroups.length; ++i) {
  allNumGroupsPerDag[i] = Integer.parseInt(usedNumGroups[i]);
}
{code}
   - what if someone sets an invalid value? A stack trace without an 
accompanying message naming the config property would make it tough for a user 
to figure out what is wrong. 





> Change TimecachePlugin to return grouped entity ids.
> 
>
> Key: TEZ-3357
> URL: https://issues.apache.org/jira/browse/TEZ-3357
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Harish Jaiprakash
>Assignee: Harish Jaiprakash
> Attachments: TEZ-3357.01.patch
>
>
> TimelineCachePlugin has to return TimelineEntityGroupId grouped based on dag 
> id.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TEZ-3355) Tez Custom Shuffle Handler POC

2016-07-19 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles reassigned TEZ-3355:


Assignee: Jonathan Eagles

> Tez Custom Shuffle Handler POC
> --
>
> Key: TEZ-3355
> URL: https://issues.apache.org/jira/browse/TEZ-3355
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-3355.1.patch
>
>
> This jira is effectively a starter jira to port the mapreduce shuffle handler 
> into the tez namespace.
> Answers needed for this jira to be finished:
>  - how to package this artifact without polluting the tez runtime distribution
>  - what directory/artifact to place this code in
>  - how to minimize reliance on hadoop internals
>  - what to do with previous port in tez-ext-service-tests



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-3365) Tez Shuffle Bench

2016-07-19 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-3365:
-
Attachment: shuffle-traffic.png

> Tez Shuffle Bench
> -
>
> Key: TEZ-3365
> URL: https://issues.apache.org/jira/browse/TEZ-3365
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: shuffle-traffic.png
>
>
> Create a tez shuffle bench which can generate arbitrary shuffle data and move 
> it around to create a map of network bottlenecks, bad disks and rack 
> anomalies.
> !shuffle-traffic.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-3365) Tez Shuffle Bench

2016-07-19 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-3365:
-
Attachment: (was: shuffle-traffic.png)

> Tez Shuffle Bench
> -
>
> Key: TEZ-3365
> URL: https://issues.apache.org/jira/browse/TEZ-3365
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: shuffle-traffic.png
>
>
> Create a tez shuffle bench which can generate arbitrary shuffle data and move 
> it around to create a map of network bottlenecks, bad disks and rack 
> anomalies.
> !shuffle-traffic.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-3365) Tez Shuffle Bench

2016-07-19 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-3365:
-
Attachment: shuffle-traffic.png

> Tez Shuffle Bench
> -
>
> Key: TEZ-3365
> URL: https://issues.apache.org/jira/browse/TEZ-3365
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: shuffle-traffic.png
>
>
> Create a tez shuffle bench which can generate arbitrary shuffle data and move 
> it around to create a map of network bottlenecks, bad disks and rack 
> anomalies.
> !shuffle-traffic.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-3365) Tez Shuffle Bench

2016-07-19 Thread Gopal V (JIRA)
Gopal V created TEZ-3365:


 Summary: Tez Shuffle Bench
 Key: TEZ-3365
 URL: https://issues.apache.org/jira/browse/TEZ-3365
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Gopal V
Assignee: Gopal V


Create a tez shuffle bench which can generate arbitrary shuffle data and move 
it around to create a map of network bottlenecks, bad disks and rack anomalies.

!shuffle-traffic.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3284) Synchronization for every write in UnorderdKVWriter

2016-07-19 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384411#comment-15384411
 ] 

Jonathan Eagles commented on TEZ-3284:
--

[~ozawa], this patch is ready for review. The core test failure is a flaky test 
in tez and has been well documented.
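The fix direction referenced in this issue (HADOOP-10694-style non-synchronized streams) can be sketched as a minimal unsynchronized byte-array output stream. This is an illustration of the technique only, not the actual patch; it is safe solely when the stream is confined to a single writer thread.

```java
import java.io.OutputStream;
import java.util.Arrays;

// Replacement for ByteArrayOutputStream with no synchronized methods, so the
// hot write() path carries no lock-prefixed instructions. Single-thread use only.
public class NonSyncByteArrayOutputStream extends OutputStream {
    private byte[] buf = new byte[64];
    private int count;

    private void ensureCapacity(int extra) {
        if (count + extra > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, count + extra));
        }
    }

    @Override
    public void write(int b) {
        ensureCapacity(1);
        buf[count++] = (byte) b;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        ensureCapacity(len);
        System.arraycopy(b, off, buf, count, len);
        count += len;
    }

    public int size() {
        return count;
    }

    public byte[] toByteArray() {
        return Arrays.copyOf(buf, count);
    }
}
```

Wrapping such a stream in a similarly unsynchronized DataOutput implementation removes the per-write monitor acquisition visible in the stack trace below.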

> Synchronization for every write in UnorderdKVWriter
> ---
>
> Key: TEZ-3284
> URL: https://issues.apache.org/jira/browse/TEZ-3284
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Gopal V
>Assignee: Jonathan Eagles
>Priority: Critical
>  Labels: Performance
> Attachments: TEZ-3284.1.patch, TEZ-3284.2.patch, TEZ-3284.3.patch
>
>
> {code}
> baos = new ByteArrayOutputStream();
> dos = new DataOutputStream(baos);
> keySerializer.open(dos);
> valSerializer.open(dos);
> {code}
> This is a known performance issue, as documented in HADOOP-10694.
> Both ByteArrayOutputStream::write() and DataOutputStream::write() have lock 
> prefix calls in them, because they are object synchronized methods.
> The recommended solution is to replicate the Hive NonSync implementations, as 
> was done in HADOOP-10694.
> {code}
>  TezTaskRunner [RUNNABLE]
> *** java.io.DataOutputStream.write(byte[], int, int) DataOutputStream.java:107
> org.apache.tez.runtime.library.common.serializer.TezBytesWritableSerialization$TezBytesWritableSerializer.serialize(Writable)
>  TezBytesWritableSerialization.java:123
> org.apache.tez.runtime.library.common.serializer.TezBytesWritableSerialization$TezBytesWritableSerializer.serialize(Object)
>  TezBytesWritableSerialization.java:110
> org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(Object,
>  Object, int) UnorderedPartitionedKVWriter.java:295
> org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(Object,
>  Object) UnorderedPartitionedKVWriter.java:257
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(Object,
>  Object) TezProcessor.java:232
> org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkCommonOperator.collect(BytesWritable,
>  Writable) VectorReduceSinkCommonOperator.java:432
> org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkCommonOperator.process(Object,
>  int) VectorReduceSinkCommonOperator.java:397
> org.apache.hadoop.hive.ql.exec.Operator.forward(Object, ObjectInspector) 
> Operator.java:837
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(Object, 
> int) VectorSelectOperator.java:144
> org.apache.hadoop.hive.ql.exec.Operator.forward(Object, ObjectInspector) 
> Operator.java:837
> org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(Object, 
> int) VectorFilterOperator.java:121
> org.apache.hadoop.hive.ql.exec.Operator.forward(Object, ObjectInspector) 
> Operator.java:837
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(Object, int) 
> TableScanOperator.java:130
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable) 
> VectorMapOperator.java:796
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object) 
> MapRecordSource.java:86
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord() 
> MapRecordSource.java:70
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run() 
> MapRecordProcessor.java:361
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(Map,
>  Map) TezProcessor.java:172
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(Map, Map) 
> TezProcessor.java:160
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run() 
> LogicalIOProcessorRuntimeTask.java:370
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run() 
> TaskRunner2Callable.java:73
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run() 
> TaskRunner2Callable.java:61
> java.security.AccessController.doPrivileged(PrivilegedExceptionAction, 
> AccessControlContext) AccessController.java (native)
> javax.security.auth.Subject.doAs(Subject, PrivilegedExceptionAction) 
> Subject.java:422
> org.apache.hadoop.security.UserGroupInformation.doAs(PrivilegedExceptionAction)
>  UserGroupInformation.java:1657
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal() 
> TaskRunner2Callable.java:61
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal() 
> TaskRunner2Callable.java:37
> org.apache.tez.common.CallableWithNdc.call() CallableWithNdc.java:36
> java.util.concurrent.FutureTask.run() FutureTask.java:266
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:617
> java.lang.Thread.run() Thread.java:745
> {code}
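The NonSync recommendation in the description above can be illustrated with a minimal, hypothetical sketch, modeled loosely on the idea behind Hive's NonSyncByteArrayOutputStream from HADOOP-10694 (not the actual Hive or Tez implementation): an output stream whose write() methods drop the synchronization that ByteArrayOutputStream declares.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical sketch: a growable in-memory output stream whose write()
// methods are NOT synchronized, unlike java.io.ByteArrayOutputStream,
// avoiding the lock-prefixed calls seen in the profile above. Safe only
// because each writer instance is used by a single task thread.
class NonSyncByteArrayOutputStream extends OutputStream {
  private byte[] buf = new byte[32];
  private int count = 0;

  private void ensureCapacity(int extra) {
    if (count + extra > buf.length) {
      buf = Arrays.copyOf(buf, Math.max(buf.length * 2, count + extra));
    }
  }

  @Override
  public void write(int b) {  // no "synchronized" keyword here
    ensureCapacity(1);
    buf[count++] = (byte) b;
  }

  @Override
  public void write(byte[] b, int off, int len) {
    ensureCapacity(len);
    System.arraycopy(b, off, buf, count, len);
    count += len;
  }

  public byte[] toByteArray() {
    return Arrays.copyOf(buf, count);
  }

  @Override
  public void close() throws IOException {}
}
```

The Hive fix referenced in HADOOP-10694 pairs such a buffer with a similar non-synchronized DataOutputStream variant, so the serializer path holds no monitors per write.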





[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property

2016-07-19 Thread Manuel Godbert (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384408#comment-15384408
 ] 

Manuel Godbert commented on TEZ-3330:
-

Hello, thanks for the patch. I just tested it; it solves the shuffle error but 
not the second issue. The full trace is:

{code}
task:java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:50)
at org.apache.avro.Schema$Parser.parse(Schema.java:917)
at org.apache.avro.Schema.parse(Schema.java:966)
at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
at 
org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
at 
org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
at 
org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:81)
at 
org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:280)
at 
org.apache.tez.runtime.library.input.OrderedGroupedKVInput.waitForInputReady(OrderedGroupedKVInput.java:176)
at 
org.apache.tez.runtime.library.input.OrderedGroupedKVInput.getReader(OrderedGroupedKVInput.java:240)
at 
org.apache.tez.mapreduce.processor.reduce.ReduceProcessor.run(ReduceProcessor.java:130)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
{code}
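The failure mode in the trace above (the map-output schema property not visible to the code that needs it, so Avro's parser dereferences null) can be sketched as follows. This is a hypothetical illustration using a plain Map in place of the Hadoop Configuration; the property name `avro.map.output.schema` is the key used by Avro's mapred AvroJob, and the fail-fast check is an illustration, not the actual fix.

```java
import java.util.Map;

// Hypothetical stand-in for looking up the Avro map-output schema from the
// job configuration. If the property was never propagated to this view of
// the conf, get() returns null and Avro's Schema.Parser would throw the
// NullPointerException seen in the trace; failing fast gives a clear error.
class SchemaLookup {
  static String getMapOutputSchema(Map<String, String> conf) {
    String schema = conf.get("avro.map.output.schema");
    if (schema == null) {
      throw new IllegalStateException(
          "avro.map.output.schema missing from the configuration seen at shuffle time");
    }
    return schema;
  }
}
```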

Regards

> Error on avro M/R job with Tez: missing configuration property
> --
>
> Key: TEZ-3330
> URL: https://issues.apache.org/jira/browse/TEZ-3330
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Manuel Godbert
> Attachments: TEZ-3330.temp.patch
>
>
> I tried running the simple avro M/R job MapredColorCount, that I found in the 
> examples of avro release 1.7.7.
> It failed with the following trace:
> {code}
> errorMessage=Shuffle Runner 
> Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError:
>  Error while doing final merge
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:378)
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
> at java.io.StringReader.<init>(StringReader.java:50)
> at org.apache.avro.Schema$Parser.parse(Schema.java:917)
> at org.apache.avro.Schema.parse(Schema.java:966)
> at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
> at 
> org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39)
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
> at 
> org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateInputKeyComparator(ConfigUtils.java:133)
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:915)
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:540)
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:376)
> ... 6 more
> {code}
> Digging a bit I saw that during shuffle Tez can't access some of the 
> configuration properties of the job. In our example it is the 
> 

Failed: TEZ-3357 PreCommit Build #1861

2016-07-19 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3357
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1861/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 4111 lines...]
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :tez-dag
[INFO] Build failures were ignored.




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12818816/TEZ-3357.01.patch
  against master revision a4247a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.app.rm.TestContainerReuse

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1861//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1861//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
d37bdc6e381c8e96044d8f51439288ae6d623ee6 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources

Error Message:

Wanted but not invoked:
taskSchedulerManagerForTest.taskAllocated(
0,
Mock for TA attempt_0_0001_0_01_03_1,
,
Container: [ContainerId: container_1_0001_01_01, NodeId: host1:0, 
NodeHttpAddress: host1:0, Resource: , Priority: 1, 
Token: null, ]
);
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1254)

However, there were other interactions with this mock:
taskSchedulerManagerForTest.init(
Configuration: core-default.xml, core-site.xml, yarn-default.xml, 
yarn-site.xml
);
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1143)

taskSchedulerManagerForTest.setConfig(
Configuration: core-default.xml, core-site.xml, yarn-default.xml, 
yarn-site.xml
);
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1143)

taskSchedulerManagerForTest.serviceInit(
Configuration: core-default.xml, core-site.xml, yarn-default.xml, 
yarn-site.xml
);
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1143)

taskSchedulerManagerForTest.start();
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1144)

taskSchedulerManagerForTest.serviceStart();
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1144)

taskSchedulerManagerForTest.instantiateSchedulers(
"host",
0,
"",
Mock for AppContext, hashCode: 1133050397
);
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1144)

taskSchedulerManagerForTest.getContainerSignatureMatcher();
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1144)

taskSchedulerManagerForTest.getConfig();
-> at 

[jira] [Commented] (TEZ-3357) Change TimecachePlugin to return grouped entity ids.

2016-07-19 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384384#comment-15384384
 ] 

TezQA commented on TEZ-3357:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12818816/TEZ-3357.01.patch
  against master revision a4247a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.app.rm.TestContainerReuse

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1861//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1861//console

This message is automatically generated.

> Change TimecachePlugin to return grouped entity ids.
> 
>
> Key: TEZ-3357
> URL: https://issues.apache.org/jira/browse/TEZ-3357
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Harish Jaiprakash
>Assignee: Harish Jaiprakash
> Attachments: TEZ-3357.01.patch
>
>
> TimelineCachePlugin has to return TimelineEntityGroupIds grouped by DAG id.





[jira] [Updated] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2016-07-19 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3363:
-
Summary: Delete intermediate data at the vertex level for Shuffle Handler  
(was: API to delete intermediate data at the vertex level for Shuffle Handler)

> Delete intermediate data at the vertex level for Shuffle Handler
> 
>
> Key: TEZ-3363
> URL: https://issues.apache.org/jira/browse/TEZ-3363
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>
> For applications like Pig, where processing times can be very long, 
> applications may choose to delete intermediate data for a sub-DAG. For 
> example, if a DAG has synced data to HDFS, all upstream intermediate data can 
> be safely deleted.





[jira] [Updated] (TEZ-3364) Query fetch stats without fetching for Shuffle Handler

2016-07-19 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3364:
-
Summary: Query fetch stats without fetching for Shuffle Handler  (was: API 
to query fetch stats without fetching for Shuffle Handler)

> Query fetch stats without fetching for Shuffle Handler
> --
>
> Key: TEZ-3364
> URL: https://issues.apache.org/jira/browse/TEZ-3364
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>
> It would be nice to query fetch stats so that tasks can better plan their 
> memory usage and decide when to merge.





[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-07-19 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3362:
-
Summary: Delete intermediate data at DAG level for Shuffle Handler  (was: 
API to delete intermediate data for DAG for Shuffle Handler)

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>
> Applications like Hive that use Tez in session mode need the ability to 
> delete intermediate data after a DAG completes, while the application 
> continues to run.





[jira] [Commented] (TEZ-3345) Diagnostics for a failed AM may not show up on the DAG UI page.

2016-07-19 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384326#comment-15384326
 ] 

Kuhu Shukla commented on TEZ-3345:
--

bq. However, a correct diagnostic message does show on the AHS (and perhaps the 
RM Kuhu Shukla can you confirm this?)

Yes, it shows up in the RM's app diagnostics, consistent with what is seen on 
the AHS. Thanks a lot, [~jeagles].

> Diagnostics for a failed AM may not show up on the DAG UI page.
> ---
>
> Key: TEZ-3345
> URL: https://issues.apache.org/jira/browse/TEZ-3345
> Project: Apache Tez
>  Issue Type: Improvement
>  Components: UI
>Reporter: Kuhu Shukla
> Attachments: DagPage_AMDiagnostics.png
>
>
> In a scenario where AM fails on all its attempts, the DAG page does not show 
> the Diagnostics message on the Tez UI "Diagnostics" section. The message is 
> available on the AHS and could be pulled into the DAG page just like we do 
> for failed vertices.
> I ran a simple Tez example job that was given far too little memory, causing the 
> AM to fail:
> {code}
> hadoop jar /home/gs/tez/current/tez-examples-0.7.1.x.x.jar orderedwordcount 
> -Dtez.am.resource.memory.mb=16 -Dtez.am.launch.cmd-opts="-Xmx13m 
> -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails 
> -XX:+PrintGCDateStamps -Xloggc:/gc.log"/books /output
> {code}





[jira] [Created] (TEZ-3364) API to query fetch stats without fetching for Shuffle Handler

2016-07-19 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created TEZ-3364:


 Summary: API to query fetch stats without fetching for Shuffle 
Handler
 Key: TEZ-3364
 URL: https://issues.apache.org/jira/browse/TEZ-3364
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Jonathan Eagles


It would be nice to query fetch stats so that tasks can better plan their memory 
usage and decide when to merge.





[jira] [Created] (TEZ-3361) Fetch Multiple Partitions from the Shuffle Handler

2016-07-19 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created TEZ-3361:


 Summary: Fetch Multiple Partitions from the Shuffle Handler
 Key: TEZ-3361
 URL: https://issues.apache.org/jira/browse/TEZ-3361
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Jonathan Eagles


Provide an API that allows for fetching multiple partitions at once from a 
single upstream task. This is to better support auto-reduce parallelism where a 
single downstream task is impersonating several (possibly?) consecutive 
downstream tasks.





[jira] [Created] (TEZ-3363) API to delete intermediate data at the vertex level for Shuffle Handler

2016-07-19 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created TEZ-3363:


 Summary: API to delete intermediate data at the vertex level for 
Shuffle Handler
 Key: TEZ-3363
 URL: https://issues.apache.org/jira/browse/TEZ-3363
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Jonathan Eagles


For applications like Pig, where processing times can be very long, applications 
may choose to delete intermediate data for a sub-DAG. For example, if a DAG has 
synced data to HDFS, all upstream intermediate data can be safely deleted.





[jira] [Created] (TEZ-3362) API to delete intermediate data for DAG for Shuffle Handler

2016-07-19 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created TEZ-3362:


 Summary: API to delete intermediate data for DAG for Shuffle 
Handler
 Key: TEZ-3362
 URL: https://issues.apache.org/jira/browse/TEZ-3362
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Jonathan Eagles


Applications like Hive that use Tez in session mode need the ability to delete 
intermediate data after a DAG completes, while the application continues to 
run.





[jira] [Created] (TEZ-3360) Tez Custom Shuffle Handler Documentation

2016-07-19 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created TEZ-3360:


 Summary: Tez Custom Shuffle Handler Documentation
 Key: TEZ-3360
 URL: https://issues.apache.org/jira/browse/TEZ-3360
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Jonathan Eagles


Provide setup instructions and API documentation for the Tez Custom Shuffle 
Handler.





[jira] [Commented] (TEZ-3345) Diagnostics for a failed AM may not show up on the DAG UI page.

2016-07-19 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384261#comment-15384261
 ] 

Jonathan Eagles commented on TEZ-3345:
--

[~Sreenath], I had a look at this issue and wanted to add my thoughts. This is 
a new improvement to the Tez UI. Let me describe the scenario.

The AM is responsible for writing the diagnostic message to ATS. In the case 
that [~kshukla] has documented above, the AM fails in such a way that it is 
unable to write the diagnostic message to ATS. However, a correct diagnostic 
message does show on the AHS (and perhaps the RM; [~kshukla], can you confirm 
this?). This jira is really asking for a fallback that shows a final 
diagnostic message when it is missing from ATS.

> Diagnostics for a failed AM may not show up on the DAG UI page.
> ---
>
> Key: TEZ-3345
> URL: https://issues.apache.org/jira/browse/TEZ-3345
> Project: Apache Tez
>  Issue Type: Improvement
>  Components: UI
>Reporter: Kuhu Shukla
> Attachments: DagPage_AMDiagnostics.png
>
>
> In a scenario where AM fails on all its attempts, the DAG page does not show 
> the Diagnostics message on the Tez UI "Diagnostics" section. The message is 
> available on the AHS and could be pulled into the DAG page just like we do 
> for failed vertices.
> I ran a simple Tez example job that was given far too little memory, causing the 
> AM to fail:
> {code}
> hadoop jar /home/gs/tez/current/tez-examples-0.7.1.x.x.jar orderedwordcount 
> -Dtez.am.resource.memory.mb=16 -Dtez.am.launch.cmd-opts="-Xmx13m 
> -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails 
> -XX:+PrintGCDateStamps -Xloggc:/gc.log"/books /output
> {code}





[jira] [Updated] (TEZ-3345) Diagnostics for a failed AM may not show up on the DAG UI page.

2016-07-19 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3345:
-
Issue Type: Improvement  (was: Bug)

> Diagnostics for a failed AM may not show up on the DAG UI page.
> ---
>
> Key: TEZ-3345
> URL: https://issues.apache.org/jira/browse/TEZ-3345
> Project: Apache Tez
>  Issue Type: Improvement
>  Components: UI
>Reporter: Kuhu Shukla
> Attachments: DagPage_AMDiagnostics.png
>
>
> In a scenario where AM fails on all its attempts, the DAG page does not show 
> the Diagnostics message on the Tez UI "Diagnostics" section. The message is 
> available on the AHS and could be pulled into the DAG page just like we do 
> for failed vertices.
> I ran a simple Tez example job that was given far too little memory, causing the 
> AM to fail:
> {code}
> hadoop jar /home/gs/tez/current/tez-examples-0.7.1.x.x.jar orderedwordcount 
> -Dtez.am.resource.memory.mb=16 -Dtez.am.launch.cmd-opts="-Xmx13m 
> -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails 
> -XX:+PrintGCDateStamps -Xloggc:/gc.log"/books /output
> {code}





[jira] [Updated] (TEZ-3357) Change TimecachePlugin to return grouped entity ids.

2016-07-19 Thread Harish Jaiprakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Jaiprakash updated TEZ-3357:
---
Attachment: TEZ-3357.01.patch

[~hitesh] Please review the changes.

> Change TimecachePlugin to return grouped entity ids.
> 
>
> Key: TEZ-3357
> URL: https://issues.apache.org/jira/browse/TEZ-3357
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Harish Jaiprakash
>Assignee: Harish Jaiprakash
> Attachments: TEZ-3357.01.patch
>
>
> TimelineCachePlugin has to return TimelineEntityGroupIds grouped by DAG id.





[jira] [Created] (TEZ-3359) Add granular log levels for HistoryLoggingService.

2016-07-19 Thread Harish Jaiprakash (JIRA)
Harish Jaiprakash created TEZ-3359:
--

 Summary: Add granular log levels for HistoryLoggingService.
 Key: TEZ-3359
 URL: https://issues.apache.org/jira/browse/TEZ-3359
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Harish Jaiprakash
Assignee: Harish Jaiprakash


We publish too many events to ATS, which increases the ATS file size. Reduce the 
data size logged into ATS by:

* Having more granular control over the log level: disable task-level logs, all 
logs, and so on.
* Disabling the logging of counters.





[jira] [Created] (TEZ-3358) Group ATSLogs for multiple DAGs into one file.

2016-07-19 Thread Harish Jaiprakash (JIRA)
Harish Jaiprakash created TEZ-3358:
--

 Summary: Group ATSLogs for multiple DAGs into one file.
 Key: TEZ-3358
 URL: https://issues.apache.org/jira/browse/TEZ-3358
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Harish Jaiprakash
Assignee: Harish Jaiprakash


Currently we create one history log file per DAG; change this to use one group 
for multiple DAGs to prevent the creation of too many files on HDFS.





[jira] [Created] (TEZ-3357) Change TimecachePlugin to return grouped entity ids.

2016-07-19 Thread Harish Jaiprakash (JIRA)
Harish Jaiprakash created TEZ-3357:
--

 Summary: Change TimecachePlugin to return grouped entity ids.
 Key: TEZ-3357
 URL: https://issues.apache.org/jira/browse/TEZ-3357
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Harish Jaiprakash
Assignee: Harish Jaiprakash


TimelineCachePlugin has to return TimelineEntityGroupIds grouped by DAG id.





[jira] [Comment Edited] (TEZ-3345) Diagnostics for a failed AM may not show up on the DAG UI page.

2016-07-19 Thread Sreenath Somarajapuram (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379478#comment-15379478
 ] 

Sreenath Somarajapuram edited comment on TEZ-3345 at 7/19/16 10:38 AM:
---

Hi [~kshukla]
- Please share the exact version of Tez that you are using.
- FYI: For Tez we rely on the timeline server (ATS), not the AHS, for data. Also, 
the UI in Tez version 0.7.1 (which you are probably using) does display the 
diagnostics information.

PS: Feel free to try out the latest UI. Tez is currently at 0.9.0. If interested, 
this README should help you: 
https://github.com/apache/tez/blob/master/tez-ui/README.md


was (Author: sreenath):
Hi [~kshukla]
- Please share the exact version of Tez that you are using.
- FYI: For Tez we use timeline server (ATS) and not AHS. And the UI in version 
0.7.1(That you are probably using) of Tez does display the diagnostics 
information.

PS: Feel free to tryout the latest UI. Tez is currently at 0.9.0. If interested 
this read-me must help you : 
https://github.com/apache/tez/blob/master/tez-ui/README.md

> Diagnostics for a failed AM may not show up on the DAG UI page.
> ---
>
> Key: TEZ-3345
> URL: https://issues.apache.org/jira/browse/TEZ-3345
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Kuhu Shukla
> Attachments: DagPage_AMDiagnostics.png
>
>
> In a scenario where AM fails on all its attempts, the DAG page does not show 
> the Diagnostics message on the Tez UI "Diagnostics" section. The message is 
> available on the AHS and could be pulled into the DAG page just like we do 
> for failed vertices.
> I ran a simple Tez example job that was given far too little memory, causing the 
> AM to fail:
> {code}
> hadoop jar /home/gs/tez/current/tez-examples-0.7.1.x.x.jar orderedwordcount 
> -Dtez.am.resource.memory.mb=16 -Dtez.am.launch.cmd-opts="-Xmx13m 
> -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails 
> -XX:+PrintGCDateStamps -Xloggc:/gc.log"/books /output
> {code}





[jira] [Commented] (TEZ-3209) Support for fair custom data routing

2016-07-19 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383824#comment-15383824
 ] 

Siddharth Seth commented on TEZ-3209:
-

[~mingma] - apologies for the delay in responding, again :| .

Most of the functionality that's pulled out seems to be around scheduling.
I believe the new VertexManager that is being written is primarily targeted 
towards unordered data? Consumers can potentially complete before all producers 
have generated data (or even started), e.g. the case where a single partition 
from different sources is going to different destination tasks. The moment one 
source completes, the corresponding destination is also ready to start and 
complete. At some point, would we want to use a different slow-start / 
scheduling policy in this VertexManager?
Should a different strategy be employed to determine when to trigger 
parallelism determination in this case?

Should we use the current Shuffle config parameter names, or define new ones for 
the new VertexManager? This shouldn't really get in the way of the refactor if the 
current concepts are retained. My vote would be for separate config parameter 
names.

If the plan is to eventually move to a different set of scheduling strategies - 
I suspect a lot of the code in ShuffleVMBase will go away.

> Support for fair custom data routing
> 
>
> Key: TEZ-3209
> URL: https://issues.apache.org/jira/browse/TEZ-3209
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: TEZ-3209.patch, Tez-based demuxer for highly skewed 
> category data.pdf
>
>
> This is based on offline discussion with [~gopalv], [~hitesh], 
> [~jrottinghuis] and [~lohit] w.r.t. the support for efficient processing of 
> highly skewed unordered partitioned mapper output. Our use case is to demux 
> highly skewed unordered category data partitioned by category name. Gopal and 
> Hitesh mentioned dynamically shuffled join scenario.
> One option we discussed is to leverage the auto-parallelism feature with 
> upfront over-partitioning. That means possible overhead to support a large 
> number of partitions and unnecessary data movement, as each reducer needs to 
> get data from all mappers. 
> Another alternative is to use custom {{DataMovementType}} which doesn't 
> require each reducer to fetch data from all mappers. In that way, a large 
> partition will be processed by several reducers, each of which will fetch 
> data from a portion of mappers.
> For example, say there are 100 mappers each of which has 10 partitions (P1, 
> ..., P10). Each mapper generates 100MB for its P10 and 1MB for each of its 
> (P1, ... P9). The default SCATTER_GATHER routing means the reducer for P10 
> has to process 10GB of input and becomes the bottleneck of the job. With the 
> fair custom data routing, the P10 belonging to the first 10 mappers will be 
> processed by one reducer with 1GB input data. The P10 belonging to the second 
> 10 mappers will be processed by another reducer, etc.
> For further optimization, we can allocate the reducer on the same nodes as 
> the mappers that it fetches data from.
> To support this, we need TEZ-3206 as well as customized data routing based on 
> {{VertexManagerPlugin}} and {{EdgeManagerPluginOnDemand}}.
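The routing arithmetic in the example above (100 mappers; each reducer for the skewed partition fetches from a group of 10 consecutive sources, so it sees 10 x 100MB = 1GB instead of 10GB) can be sketched as follows. The class and method names are hypothetical illustrations, not Tez's actual VertexManagerPlugin or EdgeManagerPluginOnDemand API.

```java
// Hypothetical sketch of fair custom data routing: reducer r for a skewed
// partition fetches that partition only from its own group of
// `mappersPerReducer` consecutive source tasks, instead of from all mappers.
class FairRouting {
  static int[] sourcesFor(int reducer, int mappersPerReducer) {
    int[] sources = new int[mappersPerReducer];
    for (int i = 0; i < mappersPerReducer; i++) {
      sources[i] = reducer * mappersPerReducer + i;
    }
    return sources;
  }
}
```

With 100 mappers and groups of 10, reducer 0 fetches the skewed partition from mappers 0-9, reducer 1 from mappers 10-19, and so on, splitting the 10GB hot partition across 10 reducers of 1GB each.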


