[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph
[ https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-18368:
       Resolution: Fixed
    Fix Version/s: 3.0.0
           Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Rui for the review.

> Improve Spark Debug RDD Graph
> -----------------------------
>
>                 Key: HIVE-18368
>                 URL: https://issues.apache.org/jira/browse/HIVE-18368
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>             Fix For: 3.0.0
>
>         Attachments: Completed Stages.png, HIVE-18368.1.patch, HIVE-18368.2.patch, HIVE-18368.3.patch, HIVE-18368.4.patch, Job Ids.png, Stage DAG 1.png, Stage DAG 2.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between
> different {{SparkTran}}, what shuffle types are used, and what trans are
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects,
> RDDs, and BaseWorks. Edge should include information about number of
> partitions, shuffle types, Spark operations used, etc.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
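As a rough illustration of the kind of combined graph the description above asks for, here is a minimal, self-contained Java sketch. It is not taken from any of the attached patches and does not use the Hive or Spark APIs: the class `PlanGraphSketch`, its methods, the node labels, and the shuffle-type strings are all hypothetical, chosen only to show edges annotated with a shuffle type and a partition count.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a debug plan graph. None of these names come from
 * the Hive code base; they only illustrate the idea in the issue description.
 */
public class PlanGraphSketch {

    /** One edge: parent node -> child node, plus the details worth logging. */
    private static final class Edge {
        final String parent;
        final String child;
        final String shuffleType;   // placeholder label; "NONE" means no shuffle
        final int numPartitions;

        Edge(String parent, String child, String shuffleType, int numPartitions) {
            this.parent = parent;
            this.child = child;
            this.shuffleType = shuffleType;
            this.numPartitions = numPartitions;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    /** Records an edge and returns this graph so calls can be chained. */
    public PlanGraphSketch connect(String parent, String child,
                                   String shuffleType, int numPartitions) {
        edges.add(new Edge(parent, child, shuffleType, numPartitions));
        return this;
    }

    /** Renders one line per edge, in insertion order. */
    public String render() {
        StringBuilder sb = new StringBuilder();
        for (Edge e : edges) {
            sb.append(e.parent)
              .append(" --[shuffle=").append(e.shuffleType)
              .append(", partitions=").append(e.numPartitions)
              .append("]--> ")
              .append(e.child)
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Node labels pair a made-up Tran name with a made-up BaseWork name.
        String graph = new PlanGraphSketch()
            .connect("MapTran 1 (Map 1)", "ShuffleTran 2", "NONE", 4)
            .connect("ShuffleTran 2", "ReduceTran 3 (Reducer 2)", "GROUP", 2)
            .render();
        System.out.print(graph);
    }
}
```

In a real implementation the labels and partition counts would presumably be read from the plan and the RDDs themselves; this sketch only shows the output shape.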
[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph
[ https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-18368:
    Attachment: HIVE-18368.4.patch

> Improve Spark Debug RDD Graph
> -----------------------------
>
>                 Key: HIVE-18368
>                 URL: https://issues.apache.org/jira/browse/HIVE-18368
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: Completed Stages.png, HIVE-18368.1.patch, HIVE-18368.2.patch, HIVE-18368.3.patch, HIVE-18368.4.patch, Job Ids.png, Stage DAG 1.png, Stage DAG 2.png
[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph
[ https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-18368:
    Attachment:     (was: HIVE-18368.3.patch)

> Improve Spark Debug RDD Graph
> -----------------------------
>
>                 Key: HIVE-18368
>                 URL: https://issues.apache.org/jira/browse/HIVE-18368
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: Completed Stages.png, HIVE-18368.1.patch, HIVE-18368.2.patch, HIVE-18368.3.patch, Job Ids.png, Stage DAG 1.png, Stage DAG 2.png
[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph
[ https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-18368:
    Attachment: HIVE-18368.3.patch

> Improve Spark Debug RDD Graph
> -----------------------------
>
>                 Key: HIVE-18368
>                 URL: https://issues.apache.org/jira/browse/HIVE-18368
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: Completed Stages.png, HIVE-18368.1.patch, HIVE-18368.2.patch, HIVE-18368.3.patch, Job Ids.png, Stage DAG 1.png, Stage DAG 2.png
[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph
[ https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-18368:
    Attachment: Stage DAG 2.png
                Stage DAG 1.png
                Job Ids.png
                Completed Stages.png

> Improve Spark Debug RDD Graph
> -----------------------------
>
>                 Key: HIVE-18368
>                 URL: https://issues.apache.org/jira/browse/HIVE-18368
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: Completed Stages.png, HIVE-18368.1.patch, HIVE-18368.2.patch, Job Ids.png, Stage DAG 1.png, Stage DAG 2.png
[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph
[ https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-18368:
    Attachment:     (was: Spark UI - Named RDDs.png)

> Improve Spark Debug RDD Graph
> -----------------------------
>
>                 Key: HIVE-18368
>                 URL: https://issues.apache.org/jira/browse/HIVE-18368
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: HIVE-18368.1.patch, HIVE-18368.2.patch
[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph
[ https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-18368:
    Attachment: HIVE-18368.2.patch

Fixing checkstyle issue

> Improve Spark Debug RDD Graph
> -----------------------------
>
>                 Key: HIVE-18368
>                 URL: https://issues.apache.org/jira/browse/HIVE-18368
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-18368.1.patch, HIVE-18368.2.patch, Spark UI - Named RDDs.png
[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph
[ https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-18368:
    Status: Patch Available  (was: Open)

> Improve Spark Debug RDD Graph
> -----------------------------
>
>                 Key: HIVE-18368
>                 URL: https://issues.apache.org/jira/browse/HIVE-18368
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-18368.1.patch, Spark UI - Named RDDs.png
[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph
[ https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-18368:
    Attachment: HIVE-18368.1.patch

> Improve Spark Debug RDD Graph
> -----------------------------
>
>                 Key: HIVE-18368
>                 URL: https://issues.apache.org/jira/browse/HIVE-18368
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-18368.1.patch, Spark UI - Named RDDs.png
[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph
[ https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-18368:
    Attachment: Spark UI - Named RDDs.png

> Improve Spark Debug RDD Graph
> -----------------------------
>
>                 Key: HIVE-18368
>                 URL: https://issues.apache.org/jira/browse/HIVE-18368
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: Spark UI - Named RDDs.png
[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph
[ https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-18368:
    Summary: Improve Spark Debug RDD Graph  (was: Improve SparkPlan Graph)

> Improve Spark Debug RDD Graph
> -----------------------------
>
>                 Key: HIVE-18368
>                 URL: https://issues.apache.org/jira/browse/HIVE-18368
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar