dongjoon-hyun commented on a change in pull request #24047: [SPARK-25196][SQL] Extends Analyze commands for cached tables
URL: https://github.com/apache/spark/pull/24047#discussion_r264519111
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala
 ##########
 @@ -470,4 +471,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
       }
     }
   }
+
+  test("analyzes column statistics in cached query") {
+    withTempView("cachedTempView", "tempView") {
+      spark.sql(
+        """CACHE TABLE cachedTempView AS
+          |  SELECT c0, avg(c1) AS v1, avg(c2) AS v2
+          |  FROM (SELECT id % 3 AS c0, id % 5 AS c1, 2 AS c2 FROM range(1, 30))
+          |  GROUP BY c0
+        """.stripMargin)
+
+      // Analyzes one column in the cached logical plan
+      spark.sql("ANALYZE TABLE cachedTempView COMPUTE STATISTICS FOR COLUMNS v1".stripMargin)
 
 Review comment:
   Hi, @maropu.
   - Could you rebase once more to resolve the conflicts?
   - Can we advertise this SQL statement as the first example in the PR description instead of the raw public API?
   Although we add a public API to `CacheManager`, `CacheManager` is still documented as internal to Spark SQL.
