[GitHub] spark pull request #22502: [SPARK-25474][SQL]When the "fallBackToHdfsForStat...

2018-11-05 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22502#discussion_r230811926
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala ---
@@ -86,10 +89,28 @@ case class HadoopFsRelation(
   }
 
   override def sizeInBytes: Long = {
--- End diff --

Hi @wangyum, the issue here is that catalogFileIndex always uses the default 
stats, and they never get updated, even if the user enables 
'fallBackToHdfsForStats'.
So, in this fix, if the user enables 'fallBackToHdfsForStats', sizeInBytes is 
read from the file system rather than taken from the default table stats.
Thanks
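
The behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the code from the pull request: `SizeEstimate`, `defaultSizeInBytes`, and `sizeOnFileSystem` are hypothetical names, and the real change lives in `HadoopFsRelation.sizeInBytes`. The sketch only assumes that the flag decides between the default table stats and a size computed from the file system.

```scala
// Hypothetical, self-contained sketch; no Spark classes are used here.
object SizeEstimate {
  /**
   * Chooses the size estimate for a relation.
   *
   * @param fallBackToHdfsForStats whether the user enabled the fallback flag
   * @param defaultSizeInBytes     the default table stats (assumed to be stale)
   * @param sizeOnFileSystem       size computed from the file system; passed
   *                               by name so it is evaluated only when needed
   */
  def sizeInBytes(
      fallBackToHdfsForStats: Boolean,
      defaultSizeInBytes: Long,
      sizeOnFileSystem: => Long): Long = {
    if (fallBackToHdfsForStats) {
      // Flag enabled: read the size from the file system.
      sizeOnFileSystem
    } else {
      // Flag disabled: keep relying on the default table stats.
      defaultSizeInBytes
    }
  }
}
```

Passing `sizeOnFileSystem` by name keeps the potentially expensive file-system scan off the path taken when the flag is disabled.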


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22502: [SPARK-25474][SQL]When the "fallBackToHdfsForStat...

2018-11-05 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22502#discussion_r230734089
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala ---
@@ -86,10 +89,28 @@ case class HadoopFsRelation(
   }
 
   override def sizeInBytes: Long = {
--- End diff --

Maybe you need to implement a rule similar to `DetermineTableStats` for 
datasource tables?

