[GitHub] spark pull request #16631: [SPARK-19271] [SQL] Change non-cbo estimation of ...

2017-01-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16631


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16631: [SPARK-19271] [SQL] Change non-cbo estimation of ...

2017-01-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16631#discussion_r97017980
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -344,7 +344,8 @@ abstract class UnaryNode extends LogicalPlan {
   sizeInBytes = 1
 }
 
-child.stats(conf).copy(sizeInBytes = sizeInBytes)
+// Don't propagate rowCount and attributeStats, since they are not estimated here.
--- End diff --

Sure. Please submit the PR to fix the other cases. Thanks!





[GitHub] spark pull request #16631: [SPARK-19271] [SQL] Change non-cbo estimation of ...

2017-01-19 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16631#discussion_r97017902
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -344,7 +344,8 @@ abstract class UnaryNode extends LogicalPlan {
   sizeInBytes = 1
 }
 
-child.stats(conf).copy(sizeInBytes = sizeInBytes)
+// Don't propagate rowCount and attributeStats, since they are not estimated here.
--- End diff --

If we remove this, the estimation result of aggregate will still have the wrong 
rowCount and attributeStats.
Shall we merge this first? I'll then test the other UnaryNodes and fix them if 
anything still goes wrong.






[GitHub] spark pull request #16631: [SPARK-19271] [SQL] Change non-cbo estimation of ...

2017-01-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16631#discussion_r97013889
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -344,7 +344,8 @@ abstract class UnaryNode extends LogicalPlan {
   sizeInBytes = 1
 }
 
-child.stats(conf).copy(sizeInBytes = sizeInBytes)
+// Don't propagate rowCount and attributeStats, since they are not estimated here.
--- End diff --

How about removing this and fixing all the similar issues in a separate PR?





[GitHub] spark pull request #16631: [SPARK-19271] [SQL] Change non-cbo estimation of ...

2017-01-19 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16631#discussion_r97009203
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -344,7 +344,8 @@ abstract class UnaryNode extends LogicalPlan {
   sizeInBytes = 1
 }
 
-child.stats(conf).copy(sizeInBytes = sizeInBytes)
+// Don't propagate rowCount and attributeStats, since they are not estimated here.
--- End diff --

Yes.





[GitHub] spark pull request #16631: [SPARK-19271] [SQL] Change non-cbo estimation of ...

2017-01-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16631#discussion_r97008849
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -344,7 +344,8 @@ abstract class UnaryNode extends LogicalPlan {
   sizeInBytes = 1
 }
 
-child.stats(conf).copy(sizeInBytes = sizeInBytes)
+// Don't propagate rowCount and attributeStats, since they are not estimated here.
--- End diff --

This sounds like a general bug. Multiple `UnaryNode` implementations are doing the 
same thing. Is my understanding right?
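
For readers following the diff, the pattern under discussion can be sketched in isolation. This is a minimal illustration, not Spark's code: `NodeStats` is a hypothetical, simplified stand-in for `Statistics`, and the numbers are made up. It shows why `copy(sizeInBytes = ...)` on the child's statistics carries the child's `rowCount` and `attributeStats` along unchanged, even though only the size was re-estimated.

```scala
// Hypothetical, simplified stand-in for Spark's Statistics class (assumption).
final case class NodeStats(
    sizeInBytes: BigInt,
    rowCount: Option[BigInt] = None,
    attributeStats: Map[String, Long] = Map.empty)

object CopyPitfall {
  // Made-up statistics such as a child plan might report.
  val childStats: NodeStats = NodeStats(
    sizeInBytes = BigInt(4000),
    rowCount = Some(BigInt(1000)),
    attributeStats = Map("a" -> 10L))

  // Old pattern: only sizeInBytes is recomputed; copy keeps every other
  // field of the child, so stale rowCount and attributeStats leak through.
  def propagating(newSize: BigInt): NodeStats =
    childStats.copy(sizeInBytes = newSize)

  // Alternative: build fresh statistics containing only the field that was
  // actually estimated, leaving the others at their empty defaults.
  def fresh(newSize: BigInt): NodeStats =
    NodeStats(sizeInBytes = newSize)
}
```

With these definitions, `CopyPitfall.propagating(BigInt(40))` still reports the child's row count of 1000, while `CopyPitfall.fresh(BigInt(40))` reports no row count at all.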





[GitHub] spark pull request #16631: [SPARK-19271] [SQL] Change non-cbo estimation of ...

2017-01-18 Thread wzhfy
GitHub user wzhfy opened a pull request:

https://github.com/apache/spark/pull/16631

[SPARK-19271] [SQL] Change non-cbo estimation of aggregate

## What changes were proposed in this pull request?

Change the non-CBO estimation behavior of aggregate:
If there are no grouping expressions, the row count is known (= 1), and so is the 
corresponding size;
otherwise, estimation falls back to UnaryNode's computeStats method, which 
should not propagate rowCount and attributeStats in Statistics, because neither is 
estimated in that method.
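
The rule described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this patch: `AggStats` is a hypothetical stand-in for `Statistics`, and `outputRowSize` and `fallbackSize` are hypothetical inputs standing in for the size of one output row and for the size that the `UnaryNode` fallback would compute.

```scala
// Hypothetical, simplified stand-in for Spark's Statistics class (assumption).
final case class AggStats(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

object AggregateEstimate {
  // groupingExprs: the aggregate's grouping expressions, modelled as names.
  // outputRowSize: assumed size in bytes of a single output row.
  // fallbackSize:  assumed size in bytes produced by the generic fallback.
  def estimate(
      groupingExprs: Seq[String],
      outputRowSize: BigInt,
      fallbackSize: BigInt): AggStats =
    if (groupingExprs.isEmpty) {
      // A global aggregate yields exactly one row, so both the row count
      // and the size (one output row) are known.
      AggStats(sizeInBytes = outputRowSize, rowCount = Some(BigInt(1)))
    } else {
      // Otherwise only the size is estimated; the row count stays unset
      // rather than being inherited from the child.
      AggStats(sizeInBytes = fallbackSize)
    }
}
```

For example, `AggregateEstimate.estimate(Nil, BigInt(16), BigInt(4000))` gives a size of 16 with a row count of 1, whereas passing `Seq("dept")` gives a size of 4000 and no row count.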

## How was this patch tested?

Added a test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wzhfy/spark aggNoCbo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16631


commit 0c64316bcd638e9b1a3ba5c990c669be6231c3c4
Author: wangzhenhua 
Date:   2017-01-18T08:01:58Z

non cbo estimation for agg



