maropu commented on a change in pull request #22204: [SPARK-25196][SQL] Moves some functions from AnalyzeColumnCommand to command/CommandUtils
URL: https://github.com/apache/spark/pull/22204#discussion_r263947326
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ##########
 @@ -95,80 +95,35 @@ case class AnalyzeColumnCommand(
     columnsToAnalyze
   }
 
-  /**
-   * Compute stats for the given columns.
-   * @return (row count, map from column name to CatalogColumnStats)
-   */
-  private def computeColumnStats(
-      sparkSession: SparkSession,
-      relation: LogicalPlan,
-      columns: Seq[Attribute]): (Long, Map[String, CatalogColumnStat]) = {
-    val conf = sparkSession.sessionState.conf
-
-    // Collect statistics per column.
-    // If no histogram is required, we run a job to compute basic column stats such as
-    // min, max, ndv, etc. Otherwise, besides basic column stats, histogram will also be
-    // generated. Currently we only support equi-height histogram.
-    // To generate an equi-height histogram, we need two jobs:
-    // 1. compute percentiles p(0), p(1/n) ... p((n-1)/n), p(1).
-    // 2. use the percentiles as value intervals of bins, e.g. [p(0), p(1/n)],
-    // [p(1/n), p(2/n)], ..., [p((n-1)/n), p(1)], and then count ndv in each bin.
-    // Basic column stats will be computed together in the second job.
-    val attributePercentiles = computePercentiles(columns, sparkSession, relation)
-
-    // The first element in the result will be the overall row count, the following elements
-    // will be structs containing all column stats.
-    // The layout of each struct follows the layout of the ColumnStats.
-    val expressions = Count(Literal(1)).toAggregateExpression() +:
-      columns.map(statExprs(_, conf, attributePercentiles))
-
-    val namedExpressions = expressions.map(e => Alias(e, e.toString)())
-    val statsRow = new QueryExecution(sparkSession, Aggregate(Nil, namedExpressions, relation))
-      .executedPlan.executeTake(1).head
+  private def analyzeColumnInCatalog(sparkSession: SparkSession): Unit = {
 
 Review comment:
   Oh, I missed that. I'll remove it.
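
For context on the logic being moved, here is a minimal sketch of the two-job equi-height histogram strategy that the removed comments describe, written against the public DataFrame API rather than the internal Catalyst expressions. `equiHeightHistogram`, `df`, `colName`, and `numBins` are illustrative names only, not part of this patch; the real code counts the ndv of every bin (together with the basic column stats) in a single second job, whereas this sketch runs one small job per bin for readability:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, countDistinct}

// Hypothetical helper (not part of this patch): builds an equi-height
// histogram for one numeric column using only the public DataFrame API.
def equiHeightHistogram(
    df: DataFrame,
    colName: String,
    numBins: Int): Seq[(Double, Double, Long)] = {
  // Job 1: compute the bin boundaries p(0), p(1/n), ..., p((n-1)/n), p(1).
  // relativeError = 0.0 asks approxQuantile for exact percentiles.
  val probabilities = (0 to numBins).map(_.toDouble / numBins).toArray
  val bounds = df.stat.approxQuantile(colName, probabilities, 0.0)

  // Job 2 (one job per bin here; a single job in the real code): count the
  // number of distinct values in each interval [p(i/n), p((i+1)/n)). The
  // last bin is upper-inclusive so that p(1) itself is counted.
  bounds.sliding(2).toSeq.zipWithIndex.map { case (Array(lo, hi), i) =>
    val c = col(colName)
    val inBin = if (i == numBins - 1) c >= lo && c <= hi else c >= lo && c < hi
    val ndv = df.filter(inBin).agg(countDistinct(c)).head().getLong(0)
    (lo, hi, ndv)
  }
}
```

For example, `equiHeightHistogram(spark.table("t"), "price", 4)` would return four (lower bound, upper bound, ndv) triples whose bins each cover roughly a quarter of the rows, which is exactly the property that makes equi-height histograms useful for estimating selectivity on skewed data.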
