viirya commented on pull request #28662:
URL: https://github.com/apache/spark/pull/28662#issuecomment-638663484


   I see, it is because `executeSameContext` is called to analyze a logical 
plan while an analysis is already in progress, so the rule batch runs more than 
once. I'd say the reproducing steps in the description are confusing because 
they don't show any evidence of multiple runs of `DetermineTableStats`. I think 
it would be better to describe the issue clearly.
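   
   To make the failure mode concrete, here is a minimal self-contained sketch. 
The names (`NestedAnalysisSketch`, `View`, `analyze`, `determineTableStats`) 
are stand-ins I made up, not Spark's actual classes: resolving one operator 
triggers a nested analysis of its child, much like `executeSameContext`, and 
the outer pass then visits the already-analyzed subtree again, so the 
stats-determination work happens twice in a single analysis:
   
   ```scala
   // Hypothetical stand-ins, not Spark's real classes: a tiny "analyzer"
   // in which resolving an operator starts a nested analysis of its child.
   object NestedAnalysisSketch {
     sealed trait Plan
     final case class Relation(name: String, var statsComputed: Int = 0) extends Plan
     final case class View(child: Plan) extends Plan
   
     // Stand-in for DetermineTableStats: does (fake) expensive work per visit.
     def determineTableStats(p: Plan): Plan = p match {
       case r: Relation => r.statsComputed += 1; r
       case other       => other
     }
   
     def analyze(p: Plan): Plan = p match {
       case View(child) =>
         val resolved = analyze(child)        // nested analysis, a la executeSameContext
         View(determineTableStats(resolved))  // outer pass visits the subtree again
       case r: Relation =>
         determineTableStats(r)
     }
   
     def main(args: Array[String]): Unit = {
       val rel = Relation("t")
       analyze(View(rel))
       println(rel.statsComputed)  // prints 2: stats work ran twice in one analysis
     }
   }
   ```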
   
   Btw, since we already calculate statistics in `DetermineTableStats` and 
save them into `HiveTableRelation`, wouldn't it be easier and simpler to 
prevent the redundant calculation by just adding a new condition like:
   
   ```scala
   class DetermineTableStats(session: SparkSession) extends Rule[LogicalPlan] {
     override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
       // Skip relations whose stats were already filled in by a previous run.
       case relation: HiveTableRelation
           if DDLUtils.isHiveTable(relation.tableMeta) &&
             relation.tableMeta.stats.isEmpty &&
             relation.tableStats.isEmpty =>
         hiveTableWithStats(relation)
       ...
     }
   }
   ```
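   
   With this extra guard, any later run of the rule sees `tableStats` already 
populated and leaves the relation unchanged, so the table size is computed at 
most once per relation.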
   