[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph

2018-02-08 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Rui for the review.

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: Completed Stages.png, HIVE-18368.1.patch, 
> HIVE-18368.2.patch, HIVE-18368.3.patch, HIVE-18368.4.patch, Job Ids.png, 
> Stage DAG 1.png, Stage DAG 2.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edge should include information about number of 
> partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph

2018-02-05 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

Attachment: HIVE-18368.4.patch

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: Completed Stages.png, HIVE-18368.1.patch, 
> HIVE-18368.2.patch, HIVE-18368.3.patch, HIVE-18368.4.patch, Job Ids.png, 
> Stage DAG 1.png, Stage DAG 2.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edge should include information about number of 
> partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-22 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

Attachment: (was: HIVE-18368.3.patch)

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: Completed Stages.png, HIVE-18368.1.patch, 
> HIVE-18368.2.patch, HIVE-18368.3.patch, Job Ids.png, Stage DAG 1.png, Stage 
> DAG 2.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edge should include information about number of 
> partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-22 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

Attachment: HIVE-18368.3.patch

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: Completed Stages.png, HIVE-18368.1.patch, 
> HIVE-18368.2.patch, HIVE-18368.3.patch, Job Ids.png, Stage DAG 1.png, Stage 
> DAG 2.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edge should include information about number of 
> partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-22 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

Attachment: HIVE-18368.3.patch

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: Completed Stages.png, HIVE-18368.1.patch, 
> HIVE-18368.2.patch, HIVE-18368.3.patch, Job Ids.png, Stage DAG 1.png, Stage 
> DAG 2.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edge should include information about number of 
> partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-22 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

Attachment: Stage DAG 2.png
Stage DAG 1.png
Job Ids.png
Completed Stages.png

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: Completed Stages.png, HIVE-18368.1.patch, 
> HIVE-18368.2.patch, Job Ids.png, Stage DAG 1.png, Stage DAG 2.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edge should include information about number of 
> partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-22 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

Attachment: (was: Spark UI - Named RDDs.png)

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-18368.1.patch, HIVE-18368.2.patch
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edge should include information about number of 
> partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-05 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

Attachment: HIVE-18368.2.patch

Fixing checkstyle issue

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-18368.1.patch, HIVE-18368.2.patch, Spark UI - Named 
> RDDs.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edge should include information about number of 
> partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-04 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

Status: Patch Available  (was: Open)

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-18368.1.patch, Spark UI - Named RDDs.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edge should include information about number of 
> partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-04 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

Attachment: HIVE-18368.1.patch

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-18368.1.patch, Spark UI - Named RDDs.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edge should include information about number of 
> partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-04 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

Attachment: Spark UI - Named RDDs.png

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: Spark UI - Named RDDs.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edge should include information about number of 
> partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18368) Improve Spark Debug RDD Graph

2018-01-04 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:

Summary: Improve Spark Debug RDD Graph  (was: Improve SparkPlan Graph)

> Improve Spark Debug RDD Graph
> -
>
> Key: HIVE-18368
> URL: https://issues.apache.org/jira/browse/HIVE-18368
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edge should include information about number of 
> partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)