Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/22232
ok to test
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/22232
test this please
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/22232
This seems to be caused by removing support for Hadoop 2.5 and earlier? cc the
original authors @cloud-fan and @srowen to make sure.
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/22232
ok to test
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/21147
retest this please
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/21147
LGTM
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/21147#discussion_r184308017
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
---
@@ -392,13 +392,13 @@ case
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20611#discussion_r181538566
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -304,45 +304,14 @@ case class LoadDataCommand
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/21052#discussion_r181378148
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala
---
@@ -357,6 +357,17 @@ class
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/21052#discussion_r181380894
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/CBOSuite.scala ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/21052#discussion_r181381874
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
---
@@ -395,27 +395,28 @@ case
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/21052#discussion_r181378031
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala
---
@@ -357,6 +357,17 @@ class
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/21052
@mshtelma Usually we describe a PR using two sections: `What changes were
proposed in this pull request?` and `How was this patch tested?`. I think it
should be in the template when we open a PR. Could
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/21052
retest this please
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20560#discussion_r180389105
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantSortsSuite.scala
---
@@ -0,0 +1,93 @@
+/*
+ * Licensed
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20560#discussion_r180390907
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
@@ -197,6 +198,19 @@ class PlannerSuite extends SharedSQLContext
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20560#discussion_r180389667
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -733,6 +735,17 @@ object EliminateSorts extends Rule
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20560#discussion_r180390730
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
@@ -197,6 +198,19 @@ class PlannerSuite extends SharedSQLContext
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20560#discussion_r180390139
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
---
@@ -522,6 +524,8 @@ case class Range
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20560#discussion_r180389285
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantSortsSuite.scala
---
@@ -0,0 +1,93 @@
+/*
+ * Licensed
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20345#discussion_r180345994
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -84,19 +84,50 @@ object ReorderJoin extends Rule
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20345#discussion_r180346211
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -84,19 +84,49 @@ object ReorderJoin extends Rule
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20345#discussion_r180327289
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
---
@@ -172,17 +174,20 @@ object ExtractFiltersAndInnerJoins
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20345#discussion_r180344569
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala
---
@@ -59,12 +75,7 @@ class JoinOptimizationSuite
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20345#discussion_r180328615
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala
---
@@ -145,4 +161,55 @@ class
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20345#discussion_r180340667
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala
---
@@ -46,6 +48,20 @@ class JoinOptimizationSuite
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20345#discussion_r180344118
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala
---
@@ -116,7 +127,12 @@ class
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/20611
@sujith71955 any updates?
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/20913
@mshtelma Would you please update the branch? It seems there's something wrong
with the commits.
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20611#discussion_r178736198
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -385,7 +385,9 @@ case class LoadDataCommand(
val
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/20611
I'm ok with the change. Since it's a behavior change of Spark, let's double
check with @gatorsmile and @jiangxb1987.
@sujith71955 Please improve the PR's description, there a
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20345#discussion_r175725372
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -84,19 +84,49 @@ object ReorderJoin extends Rule
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20345#discussion_r175696570
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
---
@@ -141,14 +141,16 @@ object ExtractEquiJoinKeys extends
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20345#discussion_r175727668
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala
---
@@ -145,4 +159,15 @@ class
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20345#discussion_r175696187
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
---
@@ -172,17 +174,23 @@ object ExtractFiltersAndInnerJoins
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20345#discussion_r175696302
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
---
@@ -172,17 +174,23 @@ object ExtractFiltersAndInnerJoins
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20611#discussion_r174771588
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -385,8 +385,12 @@ case class LoadDataCommand(
val
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20611#discussion_r174773206
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -385,8 +385,12 @@ case class LoadDataCommand(
val
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/20611
@sujith71955 In the tests, why does case2 have less data than case1?
'/tmp/hive/dat*/*' has more files than '/tmp/hive/dat
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/20611
@sujith71955
1. It's a little difficult to read as the pictures have different
resolution. Maybe you can use ``` to include test results? I think this is more
readable. For ex
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/20611
Could you provide the test result in Hive here?
Also, does Hive allow wildcards at the directory level, or just at the file level?
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20611#discussion_r172716633
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -385,8 +385,12 @@ case class LoadDataCommand(
val
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/20430
Can we specialize this CTAS case? For data changing commands like INSERT, I
think we should remove the stats if auto update is disabled, because the
previous stats are inaccurate after the insertion
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20430#discussion_r165349231
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala
---
@@ -34,16 +34,12 @@ object CommandUtils extends Logging
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/14129
@cloud-fan I think this PR is to implement Hive's `histogram_numeric`
function. It produces a histogram to approximate data distribution. It's
different from standard equi-width or e
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20072#discussion_r159135028
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -261,6 +261,17 @@ object SQLConf {
.booleanConf
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20072#discussion_r159134987
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -261,6 +261,17 @@ object SQLConf {
.booleanConf
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20072#discussion_r159135036
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala
---
@@ -60,6 +60,8 @@ case class HadoopFsRelation
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20072#discussion_r159135272
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala
---
@@ -82,7 +84,15 @@ case class HadoopFsRelation
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/20122
retest this please
GitHub user wzhfy opened a pull request:
https://github.com/apache/spark/pull/20122
[Minor][TEST] remove redundant `EliminateSubqueryAliases` in test code
## What changes were proposed in this pull request?
The `analyze` method in `implicit class DslLogicalPlan` already
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20062#discussion_r159016484
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
---
@@ -225,17 +224,17 @@ case
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/20102
cc @cloud-fan
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/20062
retest this please
GitHub user wzhfy opened a pull request:
https://github.com/apache/spark/pull/20102
[SPARK-22917][SQL] Should not try to generate histogram for empty/null
columns
## What changes were proposed in this pull request?
For empty/null column, the result of
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/20062#discussion_r158806400
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
---
@@ -253,7 +252,7 @@ case class
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/20062
cc @cloud-fan
GitHub user wzhfy opened a pull request:
https://github.com/apache/spark/pull/20062
[SPARK-22892] [SQL] Simplify some estimation logic by using double instead
of decimal
## What changes were proposed in this pull request?
Simplify some estimation logic by using double
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19594#discussion_r157699245
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala
---
@@ -191,8 +191,19 @@ case class
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19594#discussion_r157698793
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala
---
@@ -225,6 +236,43 @@ case class
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19594#discussion_r157696227
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala
---
@@ -225,6 +236,43 @@ case class
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19594
retest this please
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19594
retest this please
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19594#discussion_r157331840
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala
---
@@ -114,4 +115,183 @@ object
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19594#discussion_r157331711
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala
---
@@ -191,8 +191,16 @@ case class
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19594#discussion_r156847785
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala
---
@@ -191,8 +191,16 @@ case class
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19594#discussion_r156847046
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala
---
@@ -114,4 +115,183 @@ object
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19594#discussion_r156846872
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala
---
@@ -114,4 +115,183 @@ object
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19855
@maropu Good to know, thanks!
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19952#discussion_r156551478
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala
---
@@ -147,65 +139,78 @@ object
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19952#discussion_r156552208
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
---
@@ -574,51 +539,90 @@ case
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19952#discussion_r156549416
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala
---
@@ -147,65 +139,78 @@ object
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19952#discussion_r156549603
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala
---
@@ -147,65 +139,78 @@ object
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19952#discussion_r156550872
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
---
@@ -574,51 +539,90 @@ case
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19932#discussion_r156546885
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
---
@@ -1021,8 +998,38 @@ private[hive] object HiveClientImpl
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19932#discussion_r156546859
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -213,6 +213,27 @@ class StatisticsSuite extends
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19594
ping @cloud-fan
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19932#discussion_r155936167
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -353,15 +374,6 @@ class StatisticsSuite extends
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19932#discussion_r155936087
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -213,6 +213,29 @@ class StatisticsSuite extends
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19932#discussion_r155921370
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
---
@@ -413,32 +413,7 @@ private[hive] class HiveClientImpl
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19932
cc @cloud-fan @gatorsmile
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19594
retest this please..
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19594
retest this please
GitHub user wzhfy opened a pull request:
https://github.com/apache/spark/pull/19932
[SPARK-22745][SQL] read partition stats from Hive
## What changes were proposed in this pull request?
Currently Spark can read table stats (e.g. `totalSize, numRows`) from Hive,
we can also
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19594
retest this please
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19594#discussion_r155910267
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/JoinEstimationSuite.scala
---
@@ -67,6 +68,205 @@ class
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19594#discussion_r155910232
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/JoinEstimationSuite.scala
---
@@ -67,6 +68,205 @@ class
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19783#discussion_r155692778
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala
---
@@ -359,7 +371,7 @@ class
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19783#discussion_r155691788
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
---
@@ -529,6 +570,56 @@ case class
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19783#discussion_r155690722
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
---
@@ -332,8 +332,41 @@ case class
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19880
cc @cloud-fan @wangyum
GitHub user wzhfy opened a pull request:
https://github.com/apache/spark/pull/19880
[SPARK-22626][SQL][FOLLOWUP] improve documentation and simplify test case
## What changes were proposed in this pull request?
The reason why some Hive tables have `numRows` statistics is
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19831#discussion_r154499581
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
---
@@ -1187,6 +1187,22 @@ class HiveQuerySuite extends
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19831
@cloud-fan Yes, Spark doesn't allow users to set (Spark's) statistics
manually.
This PR treats 0 row count of **Hive's stats**; it doesn't affect the logic
for Spark'
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19855
@gengliangwang @cloud-fan Previously this rule was in the batch `Operator
Optimizations`, but after
[SPARK-14781](https://github.com/apache/spark/pull/12820), it was moved into a
separate batch [by
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19783#discussion_r154270654
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala
---
@@ -114,4 +114,197 @@ object
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19831
Since Hive can't prevent users from setting wrong stats properties, I think this
solution can alleviate the problem. Besides, it's consistent with what we do
for `totalSize and rawDataSize` (on
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19831#discussion_r154250160
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
---
@@ -418,7 +418,7 @@ private[hive] class HiveClientImpl
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19831
> Besides, if the size stats totalSize or rawDataSize is wrong, the problem
exists whether CBO is enabled or not.
> If CBO enabled, the outputRowCount == 0, the getOutputSiz
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19783#discussion_r154248775
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala
---
@@ -114,4 +114,197 @@ object