[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-10-05 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3362:
-
Attachment: TEZ-3362.008.patch

Minor change to rename the {{nodeIdIntegerMap}} to {{nodeIdShufflePortMap}}.

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch, 
> TEZ-3362.003.patch, TEZ-3362.004.patch, TEZ-3362.005.patch, 
> TEZ-3362.006.patch, TEZ-3362.007.patch, TEZ-3362.008.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-10-04 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3362:
-
Attachment: TEZ-3362.007.patch

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch, 
> TEZ-3362.003.patch, TEZ-3362.004.patch, TEZ-3362.005.patch, 
> TEZ-3362.006.patch, TEZ-3362.007.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.





[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-10-04 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3362:
-
Attachment: (was: TEZ-3362.007.patch)

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch, 
> TEZ-3362.003.patch, TEZ-3362.004.patch, TEZ-3362.005.patch, TEZ-3362.006.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.





[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-10-04 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3362:
-
Attachment: TEZ-3362.007.patch

Updated the patch to trigger DAG deletion only from the ContainerLauncher(s), so 
that AMNodeTracker is not involved. This removes the need for event notification 
to the AMNode.

{{ContainerLauncherWrapper}} needs more work so that we don't do {{instanceof}} 
checks, but it would be nice to handle that once and for all in the follow-up 
JIRA for the generic service plugin design, which should move {{dagComplete}} to 
the {{ContainerLauncher}} abstract class. Also, the health of the node is 
currently not checked before deletion; doing so would require exposing 
{{isUsable}} or similar methods through the {{containerlauncherContext}}. Added 
as a TODO.
Additionally, moved the {{DagDeleteRunnable}} piece out into a new class, which 
should evolve in subsequent JIRAs.
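
To illustrate the {{instanceof}} point above, here is a minimal sketch and not 
the patch itself. All class names below (SketchLauncher, ShuffleAwareLauncher, 
PlainLauncher, LauncherWrapperSketch) are hypothetical stand-ins, not Tez 
classes. It contrasts a wrapper that inspects the concrete launcher type with 
one that relies on a {{dagComplete}} hook declared on the abstract base class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a container launcher base class; not the Tez API.
abstract class SketchLauncher {
    // Default no-op, so launchers that keep no per-DAG data need not override it.
    void dagComplete(String dagId) { }
}

// A launcher that owns shuffle data and therefore reacts to DAG completion.
class ShuffleAwareLauncher extends SketchLauncher {
    final List<String> deleted = new ArrayList<>();

    @Override
    void dagComplete(String dagId) {
        // A real launcher would schedule deletion here; the sketch only records the id.
        deleted.add(dagId);
    }
}

// A launcher with nothing to clean up; it inherits the no-op.
class PlainLauncher extends SketchLauncher { }

class LauncherWrapperSketch {
    // Shape with a type check: the wrapper must know which launchers support deletion.
    static void notifyWithInstanceof(SketchLauncher launcher, String dagId) {
        if (launcher instanceof ShuffleAwareLauncher) {
            ((ShuffleAwareLauncher) launcher).dagComplete(dagId);
        }
    }

    // Shape with the hook on the base class: no type check is needed.
    static void notifyPolymorphic(SketchLauncher launcher, String dagId) {
        launcher.dagComplete(dagId);
    }
}
```

With the hook on the base class, adding a new launcher type does not require 
touching the wrapper.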

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch, 
> TEZ-3362.003.patch, TEZ-3362.004.patch, TEZ-3362.005.patch, 
> TEZ-3362.006.patch, TEZ-3362.007.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.





[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-09-30 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3362:
-
Attachment: TEZ-3362.006.patch

Addressed the 2 comments on the patch by [~sseth]. Needed to take the dagId out 
of the thread name initialization. Also added a check so that a connection is 
established only if the node is usable.
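
The usability check can be pictured roughly as below. This is a sketch under 
assumptions: the Node class, its "usable" flag, and deleteTargets are 
hypothetical names for illustration and do not reflect how the patch or 
AMNodeTracker represents node health.

```java
import java.util.ArrayList;
import java.util.List;

class UsableNodeFilterSketch {
    // Hypothetical node record; "usable" stands in for whatever health check the AM exposes.
    static final class Node {
        final String host;
        final int shufflePort;
        final boolean usable;

        Node(String host, int shufflePort, boolean usable) {
            this.host = host;
            this.shufflePort = shufflePort;
            this.usable = usable;
        }
    }

    // Returns the host:port targets for deletion, skipping nodes that are not
    // usable so that no connection is attempted against them.
    static List<String> deleteTargets(List<Node> nodes) {
        List<String> targets = new ArrayList<>();
        for (Node node : nodes) {
            if (!node.usable) {
                continue;
            }
            targets.add(node.host + ":" + node.shufflePort);
        }
        return targets;
    }
}
```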

Haven't moved the changes from AMNodeTracker to the ContainerLauncher(s); not 
sure whether this should be done as part of this JIRA or a subsequent one. Would 
appreciate more comments.

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch, 
> TEZ-3362.003.patch, TEZ-3362.004.patch, TEZ-3362.005.patch, TEZ-3362.006.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.





[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-09-29 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3362:
-
Attachment: TEZ-3362.005.patch

Thank you [~jeagles] for the review comments. Addressed those and additionally 
made the deletion executor service use a fixed thread pool sized by a configured 
number of threads, and shut the executor down when serviceStop is called.
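
A rough sketch of the executor lifecycle described above, using only 
java.util.concurrent. The class name DeletionServiceSketch, the threadCount 
parameter, and the timeout are assumptions for illustration; the actual 
configuration property and service class in the patch are not shown in this 
thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class DeletionServiceSketch {
    private final ExecutorService deletionExecutor;
    final AtomicInteger completed = new AtomicInteger();

    // threadCount would come from configuration; the property name is an open
    // assumption and is deliberately not spelled out here.
    DeletionServiceSketch(int threadCount) {
        this.deletionExecutor = Executors.newFixedThreadPool(threadCount);
    }

    // Queues one deletion task; at most threadCount tasks run concurrently.
    void submitDeletion(Runnable deletion) {
        deletionExecutor.execute(() -> {
            deletion.run();
            completed.incrementAndGet();
        });
    }

    // Mirrors a serviceStop hook: stop accepting work, then wait for queued
    // deletions to finish. Returns true if everything finished within the timeout.
    boolean serviceStop(long timeoutMillis) {
        deletionExecutor.shutdown();
        try {
            return deletionExecutor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    boolean isShutdown() {
        return deletionExecutor.isShutdown();
    }
}
```

A fixed pool bounds the number of concurrent delete requests regardless of how 
many nodes a DAG touched. Whether stop should wait for queued deletions or drop 
them is a design choice; the sketch waits.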

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch, 
> TEZ-3362.003.patch, TEZ-3362.004.patch, TEZ-3362.005.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.





[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-09-28 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3362:
-
Attachment: TEZ-3362.004.patch

Adding an updated patch that addresses [~hitesh]'s comment 1) fully and comment 
2) partially.

Moved all the deletion code to AMNodeTracker.
We want the deletion service to work for both DAGs and vertices, which would 
mean that we could pass a path to delete and no translation would be needed when 
the HTTP request is received. This implementation does not address that generic 
API; I was thinking of addressing it in a separate JIRA. Long term, having a 
plugin instead of AMNodeTracker for DAG/vertex deletion would be better.

Appreciate more comments/review from [~jeagles] and [~hitesh]. Thanks again!

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch, 
> TEZ-3362.003.patch, TEZ-3362.004.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.





[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-09-28 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3362:
-
Summary: Delete intermediate data at DAG level for Shuffle Handler  (was: 
hbuktnugghkute)

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch, 
> TEZ-3362.003.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.





[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-09-27 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3362:
-
Attachment: TEZ-3362.003.patch

Updated patch with NM Shuffle service port discovery.

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch, 
> TEZ-3362.003.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.





[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-09-06 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3362:
-
Attachment: TEZ-3362.002.patch

Patch to make DAG deletion configurable. It also removes the hard-coded shuffle 
port value and uses containerLauncherManager to give that value to the AM. Still 
need some comments on how this would work with external services for container 
launchers.

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.





[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-09-01 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3362:
-
Attachment: TEZ-3362.001.patch

Attaching the first version of the patch. DagAppMaster sends the per-node DAG 
directory delete HTTP request. Currently the shuffle port is hard-coded and 
deletion is serial, with complexity O(num_of_nodes * num_of_dirs_per_node).
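
To make the stated cost concrete, the following sketch enumerates the serial 
work as one step per (node, directory) pair. It is an illustration only: the 
class SerialDagDeleteSketch, the method plan, and in particular the URL layout 
are hypothetical and are not the shuffle handler's actual request format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SerialDagDeleteSketch {
    // Builds the ordered list of delete steps. The outer loop runs once per
    // node and the inner loop once per directory on that node, which is where
    // the O(num_of_nodes * num_of_dirs_per_node) bound comes from.
    static List<String> plan(Map<String, List<String>> dirsByNode,
                             int shufflePort, String dagId) {
        List<String> steps = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : dirsByNode.entrySet()) {
            String host = entry.getKey();
            for (String dir : entry.getValue()) {
                // Hypothetical URL shape, for illustration only.
                steps.add("http://" + host + ":" + shufflePort
                        + "/hypothetical-delete?dag=" + dagId + "&dir=" + dir);
            }
        }
        return steps;
    }
}
```

Because the steps are processed one after another, total latency grows with the 
product of the two counts; running nodes in parallel would reduce wall-clock 
time but not the total work.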

 [~jeagles], requesting review/comments/improvements. Thanks a lot.

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3362.001.patch
>
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.





[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

2016-07-19 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3362:
-
Summary: Delete intermediate data at DAG level for Shuffle Handler  (was: 
API to delete intermediate data for DAG for Shuffle Handler)

> Delete intermediate data at DAG level for Shuffle Handler
> -
>
> Key: TEZ-3362
> URL: https://issues.apache.org/jira/browse/TEZ-3362
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>
> Applications like hive that use tez in session mode need the ability to 
> delete intermediate data after a DAG completes and while the application 
> continues to run.


