[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-19 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22743
  
Yes, you are right. If a data source table's stats are empty, 
`DetermineTableStats` doesn't set stats for it, so it's only a problem for Hive 
tables.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22743
  
> Datasource table will not cache in tableRelationCache.

I don't think so. Spark caches data source tables in `FindDataSourceTable`.


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-19 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22743
  
Data source tables will not be cached in 
[tableRelationCache](https://github.com/apache/spark/blob/01c3dfab158d40653f8ce5d96f57220297545d5b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L134).
For Hive tables, this only occurs when the Hive table stats are empty and 
`spark.sql.hive.convertMetastoreParquet` is enabled (the default value). Then, when we read 
this table, Spark will 
[convertToLogicalRelation](https://github.com/apache/spark/blob/a2f502cf53b6b00af7cb80b6f38e64cf46367595/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L116)
 and [cache 
it](https://github.com/apache/spark/blob/a2f502cf53b6b00af7cb80b6f38e64cf46367595/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L207).

Empty stats occur in at least 2 situations:
1. The table is created as a Hive table, `spark.sql.hive.convertMetastoreParquet` 
is enabled (default value), `spark.sql.statistics.size.autoUpdate.enabled` 
is disabled (default value), and then data is inserted.
2. The table is managed by Hive and stats were never gathered.
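
To make the role of empty stats concrete, here is a minimal sketch in plain Scala (no Spark dependency) of how a size estimate could be chosen when a table has no stats. `Conf`, `TableMeta`, and `estimateSize` are hypothetical names that only model the behaviour discussed in this thread; they are not Spark's actual classes or the code in `DetermineTableStats`.

```scala
// Toy model only: hypothetical types, not Spark classes.
final case class Conf(fallBackToHdfs: Boolean, defaultSizeInBytes: BigInt)
final case class TableMeta(name: String, stats: Option[BigInt])

object EmptyStatsSketch {
  // If the catalog already has stats, use them unchanged. Otherwise fall
  // back to the size on disk (when enabled) or to the configured default.
  // sizeOnDisk is by-name so it is only evaluated when actually needed.
  def estimateSize(table: TableMeta, conf: Conf, sizeOnDisk: => BigInt): BigInt =
    table.stats.getOrElse {
      if (conf.fallBackToHdfs) sizeOnDisk else conf.defaultSizeInBytes
    }
}
```

In this model, only a table with `stats = None` is sensitive to the two settings, which is why the empty-stats situations listed above matter.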


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22743
  
Why is it only a problem for Hive tables?


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22743
  
This happens when a table's `LogicalRelation` has already been cached. Changing 
`spark.sql.statistics.fallBackToHdfs` or `spark.sql.defaultSizeInBytes` afterwards 
has no effect on the stats, because Spark always uses the stats already cached in the 
`LogicalRelation`. This is an example:

```scala
import org.apache.spark.sql.catalyst.QualifiedTableName
import org.apache.spark.sql.catalyst.catalog.SessionCatalog
import org.apache.spark.sql.execution.datasources.LogicalRelation

spark.sql("CREATE TABLE t1 (c1 bigint) STORED AS PARQUET")
spark.sql("INSERT INTO TABLE t1 VALUES (1)")
spark.sql("REFRESH TABLE t1")

val catalog = spark.sessionState.catalog
val qualifiedTableName = QualifiedTableName(catalog.getCurrentDatabase, "t1")

spark.sql("SELECT * from t1").collect()
val cachedRelation = catalog.getCachedTable(qualifiedTableName)

cachedRelation.asInstanceOf[LogicalRelation].catalogTable.get.stats.get.sizeInBytes
// res4: BigInt = 9223372036854775807

spark.sql("set spark.sql.statistics.fallBackToHdfs=true")
spark.sql("SELECT * from t1").collect()
val cachedRelation = catalog.getCachedTable(qualifiedTableName)

cachedRelation.asInstanceOf[LogicalRelation].catalogTable.get.stats.get.sizeInBytes
// res7: BigInt = 9223372036854775807
// It should be computed from the file system, but it is still 9223372036854775807

spark.sql("REFRESH TABLE t1")
spark.sql("SELECT * from t1").collect()
val cachedRelation = catalog.getCachedTable(qualifiedTableName)

cachedRelation.asInstanceOf[LogicalRelation].catalogTable.get.stats.get.sizeInBytes
// res10: BigInt = 708
// After refreshing the table, the value is correct.
```
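
The same effect can be reproduced without Spark. The following is a minimal sketch in plain Scala, assuming a cache that stores the size estimate the first time a table is looked up. `RelationCacheSketch`, `lookup`, and `refresh` are hypothetical names for illustration, not the `SessionCatalog` API.

```scala
import scala.collection.mutable

// Toy model only: hypothetical names, not Spark's SessionCatalog API.
final class RelationCacheSketch(sizeOnDisk: BigInt) {
  // Mutable settings standing in for the two SQL configs in the example.
  var fallBackToHdfs: Boolean = false
  var defaultSizeInBytes: BigInt = BigInt(Long.MaxValue)

  // Table name -> size estimate captured when the relation was first built.
  private val cache = mutable.Map.empty[String, BigInt]

  private def computeSize(): BigInt =
    if (fallBackToHdfs) sizeOnDisk else defaultSizeInBytes

  // The estimate is computed only on a cache miss, so later config
  // changes are not seen until the entry is invalidated.
  def lookup(table: String): BigInt =
    cache.getOrElseUpdate(table, computeSize())

  // Stands in for REFRESH TABLE: drop the cached entry.
  def refresh(table: String): Unit = {
    cache.remove(table)
    ()
  }
}
```

With `sizeOnDisk = 708`, a first `lookup("t1")` caches the default estimate. Setting `fallBackToHdfs = true` and calling `lookup("t1")` again still returns that cached value, and only after `refresh("t1")` does `lookup` return 708, mirroring the three steps in the session above.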


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22743
  
Can you explain more about how this happens?


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22743
  
cc @cloud-fan 


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22743
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97522/
Test PASSed.


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22743
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22743
  
**[Test build #97522 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97522/testReport)**
 for PR 22743 at commit 
[`206743c`](https://github.com/apache/spark/commit/206743cef96e536783a315785739af16f845f5c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22743
  
**[Test build #97522 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97522/testReport)**
 for PR 22743 at commit 
[`206743c`](https://github.com/apache/spark/commit/206743cef96e536783a315785739af16f845f5c1).


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22743
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4079/
Test PASSed.


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22743
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22743
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97517/
Test FAILed.


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22743
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22743
  
**[Test build #97517 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97517/testReport)**
 for PR 22743 at commit 
[`c32a2a9`](https://github.com/apache/spark/commit/c32a2a976718fcd1d7c92bb2310e463b7edff478).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22743
  
**[Test build #97517 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97517/testReport)**
 for PR 22743 at commit 
[`c32a2a9`](https://github.com/apache/spark/commit/c32a2a976718fcd1d7c92bb2310e463b7edff478).


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22743
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4075/
Test PASSed.


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22743
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22743
  
retest this please


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22743
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97515/
Test FAILed.


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22743
  
**[Test build #97515 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97515/testReport)**
 for PR 22743 at commit 
[`c32a2a9`](https://github.com/apache/spark/commit/c32a2a976718fcd1d7c92bb2310e463b7edff478).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22743: [SPARK-25740][SQL] Refactor DetermineTableStats to inval...

2018-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22743
  
Merged build finished. Test FAILed.


---
