[ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18368:
--------------------------------
    Description: 
The {{SparkPlan}} class does some logging to show the mapping between different 
{{SparkTran}} objects, which shuffle types are used, and which trans are cached. 
However, there is room for improvement.

When debug logging is enabled the RDD graph is logged, but there isn't much 
information printed about each RDD.

We should combine both of the graphs and improve them. We could even make the 
Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.

Ideally, the final graph shows a clear relationship between Tran objects, RDDs, 
and BaseWorks. Edges should include information about the number of partitions, 
shuffle types, Spark operations used, etc.
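
As a rough illustration of the kind of output meant here, the sketch below 
renders a graph whose edges carry a shuffle type and a partition count. It is 
not Hive code: the class {{PlanGraphSketch}}, its {{Edge}} type, the node 
labels, and the shuffle label are all hypothetical names chosen for this 
example, and a real implementation would have to pull this data from the 
existing plan classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a plan graph with labeled edges; not part of Hive. */
public class PlanGraphSketch {

    /** One directed edge between two plan nodes, annotated with edge details. */
    static final class Edge {
        final String from;          // assumed label, e.g. "Tran (Work) [RDD]"
        final String to;
        final String shuffleType;   // free-form label, made up for this sketch
        final int numPartitions;

        Edge(String from, String to, String shuffleType, int numPartitions) {
            this.from = from;
            this.to = to;
            this.shuffleType = shuffleType;
            this.numPartitions = numPartitions;
        }

        /** Renders the edge on one line with its annotations in brackets. */
        String render() {
            return from + " --[shuffle=" + shuffleType
                + ", partitions=" + numPartitions + "]--> " + to;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    /** Adds an edge and returns this graph so calls can be chained. */
    PlanGraphSketch addEdge(String from, String to, String shuffleType, int numPartitions) {
        edges.add(new Edge(from, to, shuffleType, numPartitions));
        return this;
    }

    /** Renders all edges in insertion order, one per line. */
    String render() {
        StringBuilder sb = new StringBuilder();
        for (Edge e : edges) {
            sb.append(e.render()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Node labels combine a Tran, its BaseWork, and its RDD (all invented).
        PlanGraphSketch graph = new PlanGraphSketch()
            .addEdge("MapTran 1 (Map 1) [RDD 3]", "ReduceTran 2 (Reducer 2) [RDD 7]", "group", 4);
        System.out.print(graph.render());
    }
}
```

Keeping the annotations on the edge rather than on the nodes means each line 
of output answers "how does data move from this Tran to the next one", which 
is the relationship the combined graph is meant to make visible.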

  was:
The {{SparkPlan}} class does some logging to show the mapping between different 
{{SparkTran}}s, what shuffle types are used, and what trans are cached. 
However, there is room for improvement.

When debug logging is enabled the RDD graph is logged, but there isn't much 
information printed about each RDD.

We should combine both of the graphs and improve them. We could even make the 
Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.

Ideally, the final graph shows a clear relationship between Tran objects, RDDs, 
and BaseWorks. Edge should include information about number of partitions, 
shuffle types, Spark operations used, etc.


> Improve SparkPlan Graph
> -----------------------
>
>                 Key: HIVE-18368
>                 URL: https://issues.apache.org/jira/browse/HIVE-18368
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}} objects, which shuffle types are used, and which 
> trans are cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edges should include information about the number of 
> partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
