[GitHub] spark issue #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileSize bec...

2018-08-30 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/22232 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileSize bec...

2018-08-25 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/22232 test this please

[GitHub] spark issue #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileSize bec...

2018-08-25 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/22232 this seems to be caused by removing support for Hadoop 2.5 and earlier? cc original authors @cloud-fan @srowen to make sure

[GitHub] spark issue #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileSize bec...

2018-08-25 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/22232 ok to test

[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

2018-04-26 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/21147 retest this please

[GitHub] spark issue #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.evaluateI...

2018-04-26 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/21147 LGTM

[GitHub] spark pull request #21147: [SPARK-23799][SQL][FOLLOW-UP] FilterEstimation.ev...

2018-04-26 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/21147#discussion_r184308017 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -392,13 +392,13 @@ case

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-04-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r181538566 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -304,45 +304,14 @@ case class LoadDataCommand

[GitHub] spark pull request #21052: [SPARK-23799] FilterEstimation.evaluateInSet prod...

2018-04-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/21052#discussion_r181378148 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala --- @@ -357,6 +357,17 @@ class

[GitHub] spark pull request #21052: [SPARK-23799] FilterEstimation.evaluateInSet prod...

2018-04-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/21052#discussion_r181380894 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/CBOSuite.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21052: [SPARK-23799] FilterEstimation.evaluateInSet prod...

2018-04-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/21052#discussion_r181381874 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -395,27 +395,28 @@ case

[GitHub] spark pull request #21052: [SPARK-23799] FilterEstimation.evaluateInSet prod...

2018-04-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/21052#discussion_r181378031 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala --- @@ -357,6 +357,17 @@ class

[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

2018-04-13 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/21052 @mshtelma Usually we describe PR using two sections: `What changes were proposed in this pull request?` and `How was this patch tested?`. I think it should be in the template when we open a PR. Could

[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

2018-04-13 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/21052 retest this please

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20560#discussion_r180390907 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala --- @@ -197,6 +198,19 @@ class PlannerSuite extends SharedSQLContext

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20560#discussion_r180389105 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantSortsSuite.scala --- @@ -0,0 +1,93 @@ +/* + * Licensed

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20560#discussion_r180389667 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -733,6 +735,17 @@ object EliminateSorts extends Rule

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20560#discussion_r180390730 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala --- @@ -197,6 +198,19 @@ class PlannerSuite extends SharedSQLContext

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20560#discussion_r180390139 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -522,6 +524,8 @@ case class Range

[GitHub] spark pull request #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Opt...

2018-04-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20560#discussion_r180389285 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantSortsSuite.scala --- @@ -0,0 +1,93 @@ +/* + * Licensed

[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...

2018-04-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r180345994 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -84,19 +84,50 @@ object ReorderJoin extends Rule

[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...

2018-04-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r180346211 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -84,19 +84,49 @@ object ReorderJoin extends Rule

[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...

2018-04-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r180327289 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -172,17 +174,20 @@ object ExtractFiltersAndInnerJoins

[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...

2018-04-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r180344569 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala --- @@ -59,12 +75,7 @@ class JoinOptimizationSuite

[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...

2018-04-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r180328615 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala --- @@ -145,4 +161,55 @@ class

[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...

2018-04-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r180344118 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala --- @@ -116,7 +127,12 @@ class

[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...

2018-04-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r180340667 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala --- @@ -46,6 +48,20 @@ class JoinOptimizationSuite

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-04-10 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/20611 @sujith71955 any updates?

[GitHub] spark issue #20913: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

2018-04-10 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/20913 @mshtelma Would you please update the branch? seems there's something wrong with the commits.

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-04-03 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r178736198 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -385,7 +385,9 @@ case class LoadDataCommand( val

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-03-27 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/20611 I'm ok with the change. Since it's a behavior change of Spark, let's double check with @gatorsmile and @jiangxb1987 . @sujith71955 Please improve PR's description, there are some wrong letter

[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...

2018-03-20 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r175725372 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -84,19 +84,49 @@ object ReorderJoin extends Rule

[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...

2018-03-20 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r175696570 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -141,14 +141,16 @@ object ExtractEquiJoinKeys extends

[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...

2018-03-20 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r175727668 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala --- @@ -145,4 +159,15 @@ class

[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...

2018-03-20 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r175696187 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -172,17 +174,23 @@ object ExtractFiltersAndInnerJoins

[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...

2018-03-20 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r175696302 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -172,17 +174,23 @@ object ExtractFiltersAndInnerJoins

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-03-15 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r174771588 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -385,8 +385,12 @@ case class LoadDataCommand( val

[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-03-15 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r174773206 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -385,8 +385,12 @@ case class LoadDataCommand( val

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-03-15 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/20611 @sujith71955 In the tests, why does case2 have less data than case1? '/tmp/hive/dat*/*' has more files than '/tmp/hive/dat1/type*', right

[GitHub] spark issue #20611: [SPARK-23425][SQL]When wild card is been used in load co...

2018-03-08 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/20611 @sujith71955 1. It's a little difficult to read as the pictures have different resolutions. Maybe you can use ``` to include test results? I think this is more readable. For example

[GitHub] spark issue #20611: [SPARK-23425][SQL]When wild card is been used in load co...

2018-03-06 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/20611 Could you provide the test result in Hive here? Also, does Hive allow wildcards at the dir level, or just the file level

[GitHub] spark pull request #20611: [SPARK-23425][SQL]When wild card is been used in ...

2018-03-06 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r172716633 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -385,8 +385,12 @@ case class LoadDataCommand( val

[GitHub] spark issue #20430: [SPARK-23263][SQL] Create table stored as parquet should...

2018-02-01 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/20430 Can we specialize this CTAS case? For data changing commands like INSERT, I think we should remove the stats if auto update is disabled, because the previous stats are inaccurate after the insertion

[GitHub] spark pull request #20430: [SPARK-23263][SQL] Create table stored as parquet...

2018-02-01 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20430#discussion_r165349231 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -34,16 +34,12 @@ object CommandUtils extends Logging

[GitHub] spark issue #14129: [SPARK-16280][SQL] Implement histogram_numeric SQL funct...

2018-01-09 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/14129 @cloud-fan I think this PR is to implement Hive's `histogram_numeric` function. It produces a histogram to approximate data distribution. It's different from standard equi-width or equi-height

[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...

2017-12-31 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r159135028 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -261,6 +261,17 @@ object SQLConf { .booleanConf

[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...

2017-12-31 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r159134987 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -261,6 +261,17 @@ object SQLConf { .booleanConf

[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...

2017-12-31 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r159135036 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala --- @@ -60,6 +60,8 @@ case class HadoopFsRelation

[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...

2017-12-31 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r159135272 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala --- @@ -82,7 +84,15 @@ case class HadoopFsRelation

[GitHub] spark issue #20122: [TEST][MINOR] remove redundant `EliminateSubqueryAliases...

2017-12-30 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/20122 retest this please

[GitHub] spark pull request #20122: [Minor][TEST] remove redundant `EliminateSubquery...

2017-12-29 Thread wzhfy
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/20122 [Minor][TEST] remove redundant `EliminateSubqueryAliases` in test code ## What changes were proposed in this pull request? The `analyze` method in `implicit class DslLogicalPlan` already

[GitHub] spark pull request #20062: [SPARK-22892] [SQL] Simplify some estimation logi...

2017-12-28 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20062#discussion_r159016484 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -225,17 +224,17 @@ case

[GitHub] spark issue #20102: [SPARK-22917][SQL] Should not try to generate histogram ...

2017-12-28 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/20102 cc @cloud-fan

[GitHub] spark issue #20062: [SPARK-22892] [SQL] Simplify some estimation logic by us...

2017-12-28 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/20062 retest this please

[GitHub] spark pull request #20102: [SPARK-22917][SQL] Should not try to generate his...

2017-12-28 Thread wzhfy
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/20102 [SPARK-22917][SQL] Should not try to generate histogram for empty/null columns ## What changes were proposed in this pull request? For empty/null column, the result

[GitHub] spark pull request #20062: [SPARK-22892] [SQL] Simplify some estimation logi...

2017-12-27 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20062#discussion_r158806400 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -253,7 +252,7 @@ case class

[GitHub] spark issue #20062: [SPARK-22892] [SQL] Simplify some estimation logic by us...

2017-12-23 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/20062 cc @cloud-fan

[GitHub] spark pull request #20062: [SPARK-22892] [SQL] Simplify some estimation logi...

2017-12-23 Thread wzhfy
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/20062 [SPARK-22892] [SQL] Simplify some estimation logic by using double instead of decimal ## What changes were proposed in this pull request? Simplify some estimation logic by using double

[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

2017-12-19 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19594#discussion_r157699245 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala --- @@ -191,8 +191,19 @@ case class

[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

2017-12-19 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19594#discussion_r157698793 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala --- @@ -225,6 +236,43 @@ case class

[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

2017-12-19 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19594#discussion_r157696227 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala --- @@ -225,6 +236,43 @@ case class

[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-18 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19594 retest this please

[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-16 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19594 retest this please

[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

2017-12-15 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19594#discussion_r157331840 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +115,183 @@ object

[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

2017-12-15 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19594#discussion_r157331711 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala --- @@ -191,8 +191,16 @@ case class

[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

2017-12-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19594#discussion_r156847785 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala --- @@ -191,8 +191,16 @@ case class

[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

2017-12-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19594#discussion_r156847046 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +115,183 @@ object

[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

2017-12-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19594#discussion_r156846872 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +115,183 @@ object

[GitHub] spark issue #19855: [SPARK-22662] [SQL] Failed to prune columns after rewrit...

2017-12-12 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19855 @maropu Good to know, thanks!

[GitHub] spark pull request #19952: [SPARK-21322][SQL][followup] support histogram in...

2017-12-12 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19952#discussion_r156551478 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -147,65 +139,78 @@ object

[GitHub] spark pull request #19952: [SPARK-21322][SQL][followup] support histogram in...

2017-12-12 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19952#discussion_r156552208 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -574,51 +539,90 @@ case

[GitHub] spark pull request #19952: [SPARK-21322][SQL][followup] support histogram in...

2017-12-12 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19952#discussion_r156549416 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -147,65 +139,78 @@ object

[GitHub] spark pull request #19952: [SPARK-21322][SQL][followup] support histogram in...

2017-12-12 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19952#discussion_r156549603 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -147,65 +139,78 @@ object

[GitHub] spark pull request #19952: [SPARK-21322][SQL][followup] support histogram in...

2017-12-12 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19952#discussion_r156550872 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -574,51 +539,90 @@ case

[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

2017-12-12 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19932#discussion_r156546885 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -1021,8 +998,38 @@ private[hive] object HiveClientImpl

[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

2017-12-12 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19932#discussion_r156546859 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -213,6 +213,27 @@ class StatisticsSuite extends

[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-09 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19594 ping @cloud-fan

[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

2017-12-09 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19932#discussion_r155936167 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -353,15 +374,6 @@ class StatisticsSuite extends

[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

2017-12-09 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19932#discussion_r155936087 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -213,6 +213,29 @@ class StatisticsSuite extends

[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

2017-12-09 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19932#discussion_r155921370 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -413,32 +413,7 @@ private[hive] class HiveClientImpl

[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive

2017-12-09 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19932 cc @cloud-fan @gatorsmile

[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-09 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19594 retest this please..

[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-09 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19594 retest this please

[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

2017-12-09 Thread wzhfy
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/19932 [SPARK-22745][SQL] read partition stats from Hive ## What changes were proposed in this pull request? Currently Spark can read table stats (e.g. `totalSize, numRows`) from Hive, we can also

[GitHub] spark issue #19594: [SPARK-21984] [SQL] Join estimation based on equi-height...

2017-12-08 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19594 retest this please

[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

2017-12-08 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19594#discussion_r155910267 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/JoinEstimationSuite.scala --- @@ -67,6 +68,205 @@ class

[GitHub] spark pull request #19594: [SPARK-21984] [SQL] Join estimation based on equi...

2017-12-08 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19594#discussion_r155910232 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/JoinEstimationSuite.scala --- @@ -67,6 +68,205 @@ class

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-12-07 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r155692778 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala --- @@ -359,7 +371,7 @@ class

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-12-07 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r155691788 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -529,6 +570,56 @@ case class

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-12-07 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r155690722 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -332,8 +332,41 @@ case class

[GitHub] spark issue #19880: [SPARK-22626][SQL][FOLLOWUP] improve documentation and s...

2017-12-04 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19880 cc @cloud-fan @wangyum

[GitHub] spark pull request #19880: [SPARK-22626][SQL][FOLLOWUP] improve documentatio...

2017-12-04 Thread wzhfy
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/19880 [SPARK-22626][SQL][FOLLOWUP] improve documentation and simplify test case ## What changes were proposed in this pull request? The reason why some Hive tables have `numRows` statistics

[GitHub] spark pull request #19831: [SPARK-22626][SQL] Wrong Hive table statistics ma...

2017-12-02 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19831#discussion_r154499581 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala --- @@ -1187,6 +1187,22 @@ class HiveQuerySuite extends

[GitHub] spark issue #19831: [SPARK-22626][SQL] Wrong Hive table statistics may trigg...

2017-12-02 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19831 @cloud-fan Yes, Spark doesn't allow users to set (Spark's) statistics manually. This PR deals with a row count of 0 in **Hive's stats**; it doesn't affect the logic for Spark's stats. Besides, Spark

[GitHub] spark issue #19855: [SPARK-22662] [SQL] Failed to prune columns after rewrit...

2017-12-02 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19855 @gengliangwang @cloud-fan Previously this rule was in the batch `Operator Optimizations`, but after [SPARK-14781](https://github.com/apache/spark/pull/12820), it was moved into a separate batch

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-11-30 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154270654 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,197 @@ object

[GitHub] spark issue #19831: [SPARK-22626][SQL] Wrong Hive table statistics may trigg...

2017-11-30 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19831 Since Hive can't prevent users from setting wrong stats properties, I think this solution can alleviate the problem. Besides, it's consistent with what we do for `totalSize` and `rawDataSize` (only use

[GitHub] spark pull request #19831: [SPARK-22626][SQL] Wrong Hive table statistics ma...

2017-11-30 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19831#discussion_r154250160 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -418,7 +418,7 @@ private[hive] class HiveClientImpl

[GitHub] spark issue #19831: [SPARK-22626][SQL] Wrong Hive table statistics may trigg...

2017-11-30 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19831 > Besides, if the size stats totalSize or rawDataSize is wrong, the problem exists whether CBO is enabled or not. > If CBO enabled, the outputRowCount == 0, the getOutputSiz

[GitHub] spark pull request #19783: [SPARK-21322][SQL] support histogram in filter ca...

2017-11-30 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19783#discussion_r154248775 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala --- @@ -114,4 +114,197 @@ object
