Gerben van der Huizen created SPARK-41322:
---------------------------------------------

             Summary: Optimized query plan cost/statistics overview
                 Key: SPARK-41322
                 URL: https://issues.apache.org/jira/browse/SPARK-41322
             Project: Spark
          Issue Type: Improvement
          Components: GraphX, SQL
    Affects Versions: 3.3.0
            Reporter: Gerben van der Huizen


*Motivation*

Spark SQL supports running the `EXPLAIN COST` statement on a query to show the 
optimized logical plan and its data costs per stage (i.e. statistics); see 
[https://spark.apache.org/docs/latest/sql-ref-syntax-qry-explain.html]. However, 
it is currently difficult to determine what the total data read cost will be 
for a complex query with many stages. Other query engines such as Trino/Presto 
attempt to provide a general estimate of the resource costs of a query when 
running the `EXPLAIN` statement, which includes CPU, memory, row count, and 
data size; see [https://trino.io/docs/current/optimizer/cost-in-explain.html].

*Proposal*

We suggest adding an overview/estimation of the total resources that will be 
used to the optimized logical plan of a Spark query or, as an alternative, 
providing this overview/estimation when the `EXPLAIN COST` statement is called 
on a query. As a first version, it would already be beneficial if this general 
cost estimation included anything that is available within the statistics of 
the optimized query plan, such as:
 * The amount of data that will be read, in bytes
 * The total number of rows
 * etc.

Given that the optimized logical plan is divided into stages, it would already 
be sufficient to show these parameters per stage so that they can be aggregated 
for the entire job later on if needed.
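
To illustrate the kind of aggregation meant here, below is a minimal sketch in 
plain Python that reads the text of an `EXPLAIN COST` plan and sums the 
statistics of its leaf nodes as a rough estimate of the data read. It is not 
part of Spark: the function names `parse_plan_statistics` and 
`total_scan_estimate` are hypothetical, the sample plan is made up, and the 
sketch assumes that each plan line ends in a `Statistics(sizeInBytes=..., 
rowCount=...)` suffix and that leaf statistics are a reasonable proxy for the 
amount of data read.

```python
import re

# Binary size units, assumed to match the suffixes Spark prints (B, KiB, MiB, ...).
UNITS = {u: 1024 ** i for i, u in enumerate(["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"])}

# Plain or scientific notation, e.g. "100" or "1.00E+4".
NUMBER = r"[0-9.]+(?:E[+-]?[0-9]+)?"
STATS = re.compile(
    rf"Statistics\(sizeInBytes=({NUMBER}) ([A-Za-z]+)(?:, rowCount=({NUMBER}))?"
)


def parse_plan_statistics(plan_text):
    """Return one dict per plan line that carries a Statistics(...) suffix."""
    nodes = []
    for line in plan_text.splitlines():
        match = STATS.search(line)
        if match is None:
            continue  # headers and lines without statistics are skipped
        size, unit, rows = match.groups()
        name = line.lstrip(" :+-")  # drop the tree-drawing prefix
        nodes.append({
            "operator": name.split()[0],
            "depth": len(line) - len(name),
            "size_in_bytes": int(float(size) * UNITS[unit]),
            "row_count": int(float(rows)) if rows is not None else None,
        })
    # A node is a leaf when the next node is not indented deeper than it.
    for i, node in enumerate(nodes):
        next_depth = nodes[i + 1]["depth"] if i + 1 < len(nodes) else -1
        node["is_leaf"] = next_depth <= node["depth"]
    return nodes


def total_scan_estimate(nodes):
    """Sum the leaf statistics; row_count is None if any leaf lacks it."""
    leaves = [n for n in nodes if n["is_leaf"]]
    rows = [n["row_count"] for n in leaves]
    return {
        "size_in_bytes": sum(n["size_in_bytes"] for n in leaves),
        "row_count": None if None in rows else sum(rows),
    }


# Made-up plan text in the assumed format, for illustration only.
SAMPLE = """\
== Optimized Logical Plan ==
Join Inner, (id#0 = id#5), Statistics(sizeInBytes=6.0 KiB, rowCount=100)
:- Filter isnotnull(id#0), Statistics(sizeInBytes=2.0 KiB, rowCount=100)
:  +- Relation default.a[id#0,v#1] parquet, Statistics(sizeInBytes=2.0 KiB, rowCount=100)
+- Relation default.b[id#5,w#6] parquet, Statistics(sizeInBytes=1.5 MiB, rowCount=1.00E+4)
"""

if __name__ == "__main__":
    plan_nodes = parse_plan_statistics(SAMPLE)
    for n in plan_nodes:
        print(n["operator"], n["size_in_bytes"], n["row_count"], n["is_leaf"])
    print(total_scan_estimate(plan_nodes))
```

Parsing the printed plan is only a workaround; a built-in overview would read 
the statistics directly from the optimized plan instead of from its text 
representation, and could report the per-stage values alongside the totals.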



