Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/23096#discussion_r235309483
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -648,7 +648,11 @@ class SparkSession private(
* @since 2.0.0
*/
def sql(sqlText: String): DataFrame = {
- Dataset.ofRows(self, sessionState.sqlParser.parsePlan(sqlText))
+ val tracker = new QueryPlanningTracker
--- End diff ---
I don't think it makes sense to add random flags for everything. If the
argument is that this change has a decent chance of introducing regressions
(e.g. due to higher memory usage, or CPU overhead), then it would make a lot
of sense to put it behind a flag so it can be disabled in production if that
happens.
That said, the overhead on the hot code path here is substantially smaller
than even transforming the simplest Catalyst plan (a hash map lookup is orders
of magnitude cheaper than calling a partial function to transform a Scala
collection for TreeNode), so I think the risk is low enough that it does not
warrant adding a config.
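For readers following the thread: the cost profile being argued about can be
sketched as below. This is a hypothetical illustration, not the actual diff
(which is truncated above) — the class name `PhaseTracker`, the method
`measurePhase`, and the phase key are all illustrative stand-ins for a
QueryPlanningTracker-style helper. The point is that the per-query cost of
tracking is one closure invocation plus a hash map write, which is far cheaper
than walking a Catalyst tree.

```scala
import scala.collection.mutable

// Hypothetical sketch of a tracker-style helper; the real class in the PR
// may differ. Records per-phase elapsed wall-clock time in a hash map, so
// each tracked phase costs one closure call plus one map insert.
class PhaseTracker {
  // phase name -> accumulated elapsed nanoseconds
  private val phases = mutable.HashMap.empty[String, Long]

  def measurePhase[T](phase: String)(f: => T): T = {
    val start = System.nanoTime()
    val result = f
    phases.put(phase, phases.getOrElse(phase, 0L) + (System.nanoTime() - start))
    result
  }

  def phaseTimes: Map[String, Long] = phases.toMap
}

// Usage sketch: wrap the parsing call, then hand the plan onward.
// val tracker = new PhaseTracker
// val plan = tracker.measurePhase("parsing") { sqlParser.parsePlan(sqlText) }
```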
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]