[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated TEZ-3362:
-----------------------------
    Attachment: TEZ-3362.008.patch

Minor change to rename the {{nodeIdIntegerMap}} to {{nodeIdShufflePortMap}}.

> Delete intermediate data at DAG level for Shuffle Handler
> ---------------------------------------------------------
>
>                 Key: TEZ-3362
>                 URL: https://issues.apache.org/jira/browse/TEZ-3362
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch,
> TEZ-3362.003.patch, TEZ-3362.004.patch, TEZ-3362.005.patch,
> TEZ-3362.006.patch, TEZ-3362.007.patch, TEZ-3362.008.patch
>
> Applications like hive that use tez in session mode need the ability to
> delete intermediate data after a DAG completes and while the application
> continues to run.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated TEZ-3362:
-----------------------------
    Attachment: TEZ-3362.007.patch

> Delete intermediate data at DAG level for Shuffle Handler
> ---------------------------------------------------------
>
>                 Key: TEZ-3362
>                 URL: https://issues.apache.org/jira/browse/TEZ-3362
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch,
> TEZ-3362.003.patch, TEZ-3362.004.patch, TEZ-3362.005.patch,
> TEZ-3362.006.patch, TEZ-3362.007.patch
>
> Applications like hive that use tez in session mode need the ability to
> delete intermediate data after a DAG completes and while the application
> continues to run.
[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated TEZ-3362:
-----------------------------
    Attachment:     (was: TEZ-3362.007.patch)

> Delete intermediate data at DAG level for Shuffle Handler
> ---------------------------------------------------------
>
>                 Key: TEZ-3362
>                 URL: https://issues.apache.org/jira/browse/TEZ-3362
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch,
> TEZ-3362.003.patch, TEZ-3362.004.patch, TEZ-3362.005.patch, TEZ-3362.006.patch
>
> Applications like hive that use tez in session mode need the ability to
> delete intermediate data after a DAG completes and while the application
> continues to run.
[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated TEZ-3362:
-----------------------------
    Attachment: TEZ-3362.007.patch

Updated patch to delete the DAG only from the ContainerLauncher(s), so that AMNodeTracker is not involved. This removes the need for event notification to the AMNode.

{{ContainerLauncherWrapper}} needs more work so that we don't do {{instanceof}} checks, but it would be nice to handle that once and for all in the follow-up JIRA for the generic service plugin design, which should move {{dagComplete}} to the {{ContainerLauncher}} abstract class.

Also, the health of the node is currently not checked before deletion; that would need exposing {{isUsable}} or similar methods through the {{containerlauncherContext}}. Added as a TODO.

Additionally, moved the {{DagDeleteRunnable}} piece out to a new class, which should evolve in subsequent JIRAs.

> Delete intermediate data at DAG level for Shuffle Handler
> ---------------------------------------------------------
>
>                 Key: TEZ-3362
>                 URL: https://issues.apache.org/jira/browse/TEZ-3362
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch,
> TEZ-3362.003.patch, TEZ-3362.004.patch, TEZ-3362.005.patch,
> TEZ-3362.006.patch, TEZ-3362.007.patch
>
> Applications like hive that use tez in session mode need the ability to
> delete intermediate data after a DAG completes and while the application
> continues to run.
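A minimal sketch of the design direction mentioned in the comment above: putting a {{dagComplete}} hook on the launcher base class so that a wrapper can call it without {{instanceof}} checks. All class names below are hypothetical stand-ins, not the actual Tez classes, and the no-op default is an assumption about how launchers with nothing to clean up might be handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the launcher base class discussed in the thread. */
abstract class LauncherSketch {
  /** No-op by default, so launchers with nothing to clean up need no override. */
  void dagComplete(String dagId) { }
}

/** A launcher that has no intermediate data to delete. */
class LocalLauncherSketch extends LauncherSketch { }

/** A launcher that reacts to DAG completion; here it only records the DAG id. */
class ShuffleAwareLauncherSketch extends LauncherSketch {
  final List<String> deletedDags = new ArrayList<>();

  @Override
  void dagComplete(String dagId) {
    // A real implementation would schedule the per-node deletion here.
    deletedDags.add(dagId);
  }
}

/** The wrapper simply delegates; no instanceof check is needed. */
class LauncherWrapperSketch {
  private final LauncherSketch launcher;

  LauncherWrapperSketch(LauncherSketch launcher) {
    this.launcher = launcher;
  }

  void dagComplete(String dagId) {
    launcher.dagComplete(dagId);
  }
}
```

With this shape the wrapper treats every launcher uniformly, and only launchers that own shuffle data override the hook.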
[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated TEZ-3362:
-----------------------------
    Attachment: TEZ-3362.006.patch

Addressed the 2 comments on the patch by [~sseth]. Needed to take the dagId out of the thread name initialization. Also added a check to establish a connection only if the node is usable. Haven't moved the AMNodeTracker changes to the ContainerLauncher(s); not sure if this should be done as part of this JIRA or the subsequent ones. Appreciate more comments.

> Delete intermediate data at DAG level for Shuffle Handler
> ---------------------------------------------------------
>
>                 Key: TEZ-3362
>                 URL: https://issues.apache.org/jira/browse/TEZ-3362
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch,
> TEZ-3362.003.patch, TEZ-3362.004.patch, TEZ-3362.005.patch, TEZ-3362.006.patch
>
> Applications like hive that use tez in session mode need the ability to
> delete intermediate data after a DAG completes and while the application
> continues to run.
[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated TEZ-3362:
-----------------------------
    Attachment: TEZ-3362.005.patch

Thank you [~jeagles] for the review comments. Addressed those, and additionally made the deletion executor service use a fixed thread pool sized by a configured number of threads and shut the executor down when serviceStop is called.

> Delete intermediate data at DAG level for Shuffle Handler
> ---------------------------------------------------------
>
>                 Key: TEZ-3362
>                 URL: https://issues.apache.org/jira/browse/TEZ-3362
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch,
> TEZ-3362.003.patch, TEZ-3362.004.patch, TEZ-3362.005.patch
>
> Applications like hive that use tez in session mode need the ability to
> delete intermediate data after a DAG completes and while the application
> continues to run.
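The thread-pool change described in the comment above could look roughly like the following. This is a sketch under stated assumptions, not the code from the patch: the class name, the method names other than serviceStop, and the way the thread count is supplied are all hypothetical, and the deletion task itself is a placeholder that only counts invocations.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the service that owns the deletion executor. */
class DagDeletionServiceSketch {
  private final ExecutorService deletionExecutor;
  private final AtomicInteger completed = new AtomicInteger();

  /** numThreads would be read from configuration; the key is not named in the thread. */
  DagDeletionServiceSketch(int numThreads) {
    this.deletionExecutor = Executors.newFixedThreadPool(numThreads);
  }

  /** Queues one placeholder deletion task per node. */
  void dagComplete(List<String> nodes) {
    for (String node : nodes) {
      deletionExecutor.execute(() -> {
        // A real task would contact the shuffle handler on `node` here.
        completed.incrementAndGet();
      });
    }
  }

  /**
   * Mirrors the serviceStop behaviour: stop accepting new work, then wait up to
   * timeoutMillis for already queued tasks. Returns true if the pool terminated.
   */
  boolean serviceStop(long timeoutMillis) {
    deletionExecutor.shutdown();
    try {
      return deletionExecutor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  int completedCount() {
    return completed.get();
  }
}
```

A fixed pool bounds the number of concurrent delete requests regardless of cluster size, and shutting it down in serviceStop avoids leaving worker threads behind when the service stops.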
[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated TEZ-3362:
-----------------------------
    Attachment: TEZ-3362.004.patch

Adding an updated patch addressing [~hitesh]'s comment 1) fully and comment 2) partially. Moved all the deletion code to AMNodeTracker.

We want the deletion service to work for DAGs and vertices, which would mean that we could pass a path to delete and no translation would be needed when the HTTP request is received. This implementation does not address that generic API; I was thinking of addressing it in a separate JIRA. Long term, having a plugin instead of AMNodeTracker for DAG/vertex deletion would be better.

Appreciate more comments/review from [~jeagles] and [~hitesh]. Thanks again!

> Delete intermediate data at DAG level for Shuffle Handler
> ---------------------------------------------------------
>
>                 Key: TEZ-3362
>                 URL: https://issues.apache.org/jira/browse/TEZ-3362
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch,
> TEZ-3362.003.patch, TEZ-3362.004.patch
>
> Applications like hive that use tez in session mode need the ability to
> delete intermediate data after a DAG completes and while the application
> continues to run.
[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated TEZ-3362:
-----------------------------
    Summary: Delete intermediate data at DAG level for Shuffle Handler  (was: hbuktnugghkute)

> Delete intermediate data at DAG level for Shuffle Handler
> ---------------------------------------------------------
>
>                 Key: TEZ-3362
>                 URL: https://issues.apache.org/jira/browse/TEZ-3362
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch,
> TEZ-3362.003.patch
>
> Applications like hive that use tez in session mode need the ability to
> delete intermediate data after a DAG completes and while the application
> continues to run.
[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated TEZ-3362:
-----------------------------
    Attachment: TEZ-3362.003.patch

Updated patch with NM Shuffle service port discovery.

> Delete intermediate data at DAG level for Shuffle Handler
> ---------------------------------------------------------
>
>                 Key: TEZ-3362
>                 URL: https://issues.apache.org/jira/browse/TEZ-3362
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch,
> TEZ-3362.003.patch
>
> Applications like hive that use tez in session mode need the ability to
> delete intermediate data after a DAG completes and while the application
> continues to run.
[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated TEZ-3362:
-----------------------------
    Attachment: TEZ-3362.002.patch

Patch to make DAG deletion configurable; it also removes the hard-coded shuffle port value and uses containerLauncherManager to give that value to the AM. Still need some comments on how this would work with external services for container launchers.

> Delete intermediate data at DAG level for Shuffle Handler
> ---------------------------------------------------------
>
>                 Key: TEZ-3362
>                 URL: https://issues.apache.org/jira/browse/TEZ-3362
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch
>
> Applications like hive that use tez in session mode need the ability to
> delete intermediate data after a DAG completes and while the application
> continues to run.
[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated TEZ-3362:
-----------------------------
    Attachment: TEZ-3362.001.patch

Attaching the first version of the patch. DagAppMaster sends the per-node DAG directory delete HTTP request. Currently the shuffle port is hard-coded and deletion is serial, with complexity O(num_of_nodes * num_of_dirs_per_node).

[~jeagles], request for review/comments/improvements. Thanks a lot.

> Delete intermediate data at DAG level for Shuffle Handler
> ---------------------------------------------------------
>
>                 Key: TEZ-3362
>                 URL: https://issues.apache.org/jira/browse/TEZ-3362
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3362.001.patch
>
> Applications like hive that use tez in session mode need the ability to
> delete intermediate data after a DAG completes and while the application
> continues to run.
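To illustrate the O(num_of_nodes * num_of_dirs_per_node) cost mentioned in the comment above, here is a small sketch of a serial deletion loop. It is not the patch's code: the class, method, and parameter names are hypothetical, and the actual HTTP request is abstracted behind a callback because the request format is not described in this thread.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical illustration of serial, per-node DAG directory deletion. */
class SerialDagDeleteSketch {
  /**
   * Visits every (node, directory) pair one at a time and returns the number
   * of delete requests issued. sendDelete stands in for the HTTP call.
   */
  static int deleteSerially(Map<String, List<String>> dirsByNode,
                            BiConsumer<String, String> sendDelete) {
    int requests = 0;
    for (Map.Entry<String, List<String>> entry : dirsByNode.entrySet()) {
      for (String dir : entry.getValue()) {
        // Each request is sent only after the previous one has returned.
        sendDelete.accept(entry.getKey(), dir);
        requests++;
      }
    }
    return requests;
  }
}
```

Because each request waits for the previous one, the total time grows with the product of node count and directories per node.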
[jira] [Updated] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated TEZ-3362:
---------------------------------
    Summary: Delete intermediate data at DAG level for Shuffle Handler  (was: API to delete intermediate data for DAG for Shuffle Handler)

> Delete intermediate data at DAG level for Shuffle Handler
> ---------------------------------------------------------
>
>                 Key: TEZ-3362
>                 URL: https://issues.apache.org/jira/browse/TEZ-3362
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>
> Applications like hive that use tez in session mode need the ability to
> delete intermediate data after a DAG completes and while the application
> continues to run.