[jira] [Commented] (TEZ-2368) Make the dag number available in Context classes

2015-04-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515150#comment-14515150
 ] 

Hitesh Shah commented on TEZ-2368:
--

Comments: typo in "Get a numeric identifier for the dto which the task belongs"

+1 once the typo is fixed. 

 Make the dag number available in Context classes
 

 Key: TEZ-2368
 URL: https://issues.apache.org/jira/browse/TEZ-2368
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: TEZ-2368.1.txt, TEZ-2368.2.txt


 Provide the dag number, which is a unique number, for each dag running within 
 an application in the TezInputContext, TezOutputContext, TezProcessorContext.
 When containers are re-used, or for external services, this can be used to 
 generate intermediate data to a dag specific directory instead of an 
 application specific directory, where it becomes difficult to differentiate 
 between different dags.
 The DAG name does provide this - but is not suitable for use in a directory 
 name. Hashing the name is an option, but can lead to collisions.
 Generating data into a dag specific directory will eventually only be usable 
 when we move away from the default MR handler, or enhance it to support an 
 additional parameter.
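
The collision concern above can be illustrated with a minimal sketch. The class name, the helper dirFromNameHash and the sample DAG names are hypothetical and not part of Tez; the only fact relied on is that Java's String.hashCode() returns the same value for "Aa" and "BB".

```java
final class DagNameHashDemo {
    // Hypothetical helper: derive a directory component from a DAG name by hashing it.
    static String dirFromNameHash(String dagName) {
        return "dag_" + Integer.toHexString(dagName.hashCode());
    }

    public static void main(String[] args) {
        // Two distinct (made-up) DAG names whose String.hashCode() values coincide.
        String first = "Aa";
        String second = "BB";
        System.out.println(first.hashCode() == second.hashCode());                  // true
        System.out.println(dirFromNameHash(first).equals(dirFromNameHash(second))); // true

        // A DAG name can also contain characters that are awkward in a path.
        String awkward = "daily report: sales/returns";
        System.out.println(awkward.contains("/"));                                  // true
    }
}
```

Both names would map to the same directory under this scheme, which is the case a small per-application number avoids.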



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2368) Make the dag number available in Context classes

2015-04-25 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512779#comment-14512779
 ] 

Hitesh Shah commented on TEZ-2368:
--

I was probably overthinking this one. The dagIdentifier API sounds good. Using 
the dagId/index should be sufficient. 
 



[jira] [Commented] (TEZ-2368) Make the dag number available in Context classes

2015-04-24 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511993#comment-14511993
 ] 

Hitesh Shah commented on TEZ-2368:
--

DAG names are meant to be unique within a session. 



[jira] [Commented] (TEZ-2368) Make the dag number available in Context classes

2015-04-24 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512091#comment-14512091
 ] 

TezQA commented on TEZ-2368:


+1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12728081/TEZ-2368.1.txt
  against master revision 2935ef4.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  There were no new javadoc warning messages.

+1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/535//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/535//console

This message is automatically generated.



[jira] [Commented] (TEZ-2368) Make the dag number available in Context classes

2015-04-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512018#comment-14512018
 ] 

Siddharth Seth commented on TEZ-2368:
-

Yes, but they're not suitable for use in directory names. Similarly for vertex 
names.



[jira] [Commented] (TEZ-2368) Make the dag number available in Context classes

2015-04-24 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512127#comment-14512127
 ] 

Hitesh Shah commented on TEZ-2368:
--

What I meant was that a hashcode would work given that the name is unique. In 
any case, why does a dag number need to be exposed to user code? Isn't the 
unique id sufficient? 

If the end goal is per-dag data that the framework is able to clean up, then 
the framework should be creating dag specific dirs before passing them to 
user land code and cleaning up these dirs when the dag has completed.
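
A rough sketch of that framework-side lifecycle, for illustration only. The class PerDagDirs, its method names and the dag_<n> directory naming are assumptions and do not describe how Tez manages its directories.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class PerDagDirs {
    // Hypothetical: create a dag specific dir under the app dir before handing it to user code.
    static Path createDagDir(Path appDir, int dagIdentifier) throws IOException {
        return Files.createDirectories(appDir.resolve("dag_" + dagIdentifier));
    }

    // Hypothetical: remove everything under the dag dir once the dag has completed.
    static void cleanupDagDir(Path dagDir) throws IOException {
        if (!Files.exists(dagDir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dagDir)) {
            // Reverse order deletes children before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path appDir = Files.createTempDirectory("app_");
        Path dagDir = createDagDir(appDir, 1);
        Files.write(dagDir.resolve("part-0"), new byte[] {1, 2, 3});
        cleanupDagDir(dagDir);
        System.out.println(Files.exists(dagDir)); // false
        Files.delete(appDir);
    }
}
```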

External services are not meant to be using these context classes in any case. 
Or am I missing something?

We can add this api but I am not sure I see a need for it currently. 





[jira] [Commented] (TEZ-2368) Make the dag number available in Context classes

2015-04-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512249#comment-14512249
 ] 

Siddharth Seth commented on TEZ-2368:
-

This could be renamed to something like getDagIdentifier (short unique integer 
identifying a dag within an app) instead of getDagNumber - which makes it 
cleaner.
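
As a sketch of what such an accessor could look like: only the method name getDagIdentifier comes from the comment above. The interface name DagScopedContext and the cacheKey helper are hypothetical stand-ins, not the actual Tez context classes.

```java
// Hypothetical stand-in for a context class; not the real Tez interface.
interface DagScopedContext {
    /** Short unique integer identifying a dag within an app. */
    int getDagIdentifier();
}

final class DagIdentifierDemo {
    // Hypothetical use: prefix a name with the dag identifier so that
    // entries from different dags in the same app do not clash.
    static String cacheKey(DagScopedContext context, String name) {
        return context.getDagIdentifier() + ":" + name;
    }

    public static void main(String[] args) {
        DagScopedContext context = () -> 2; // pretend this task belongs to dag 2
        System.out.println(cacheKey(context, "lookup-table")); // 2:lookup-table
    }
}
```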



[jira] [Commented] (TEZ-2368) Make the dag number available in Context classes

2015-04-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512245#comment-14512245
 ] 

Siddharth Seth commented on TEZ-2368:
-

The hashcode will not be sufficient since it can, and likely will be, different 
across two different JVMs. 
We currently end up generating data in ${appId}/constant/${uniqueId} - which 
makes cleanup very difficult.

The main limitation in changing this path is the way we are tied to the MR 
ShuffleHandler which only knows how to process this path.
Creating dag specific dirs is an option, but only after the ShuffleHandler 
changes.

When an external shuffleHandler is used - this API provides the relevant 
information to create dag specific dirs instead of the app dir directly.
The API isn't exposing the dagId directly. What it does expose is a small 
unique identifier for each dag running in an application - which can be useful. 
Caching would be an alternate use for something like this. It's similar to a 
vertexIndex API which exists on the context impls - which is present for 
exactly the same reason - to generate names.
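
A minimal sketch of how a small per-dag integer could be folded into a path when an external shuffle handler is in use. The helper name, the dag_<n> component and the sample ids are assumptions; the layout shown is not what Tez or the MR ShuffleHandler uses.

```java
final class DagScopedPaths {
    // Hypothetical layout: ${appId}/dag_${dagIdentifier}/${uniqueId}.
    static String dagScopedDir(String appId, int dagIdentifier, String uniqueId) {
        if (dagIdentifier < 0) {
            throw new IllegalArgumentException("dagIdentifier must be non-negative");
        }
        return String.join("/", appId, "dag_" + dagIdentifier, uniqueId);
    }

    public static void main(String[] args) {
        // Made-up ids for illustration.
        System.out.println(dagScopedDir("app_1", 3, "attempt_0")); // app_1/dag_3/attempt_0
    }
}
```

With a layout like this, everything belonging to one dag sits under a single directory, so cleanup after the dag completes can remove that directory as a unit.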

> External services are not meant to be using these context classes in any 
> case. Or am I missing something?
External services can use the components in the RuntimeLibrary, all of which 
depend on the Context classes. What that does mean is construction/usage of 
these classes will eventually need to be exposed as a limited public API - 
likely tied to specific Tez versions as it evolves.
