[GitHub] spark pull request #16093: [SPARK-18663][SQL] Simplify CountMinSketch aggreg...

2016-12-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16093


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16093: [SPARK-18663][SQL] Simplify CountMinSketch aggreg...

2016-12-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16093#discussion_r90400227
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CountMinSketchAgg.scala
 ---
@@ -75,13 +73,13 @@ case class CountMinSketchAgg(
 } else if (!epsExpression.foldable || !confidenceExpression.foldable ||
   !seedExpression.foldable) {
   TypeCheckFailure(
-"The eps, confidence or seed provided must be a literal or constant foldable")
+"The eps, confidence or seed provided must be a literal or foldable")
--- End diff --

A literal is also foldable, so I think we can just say `foldable`.
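To make the reviewer's point concrete, here is a toy Scala model. The `Expr`, `Lit`, `Col` and `Mul` classes are hypothetical and are not Catalyst's real classes. In this model a literal is always foldable, and a compound expression is foldable when all of its children are. Under those assumptions, the condition "literal or foldable" reduces to "foldable".

```scala
// Toy expression model (hypothetical; NOT Catalyst's actual classes).
sealed trait Expr {
  def foldable: Boolean
}

// A constant value: always foldable, so "literal" is a special case of "foldable".
final case class Lit(value: Any) extends Expr {
  val foldable: Boolean = true
}

// A column reference: its value depends on the input row, so it is not foldable.
final case class Col(name: String) extends Expr {
  val foldable: Boolean = false
}

// A compound node is foldable only if all of its children are foldable.
final case class Mul(left: Expr, right: Expr) extends Expr {
  def foldable: Boolean = left.foldable && right.foldable
}

object FoldableCheck {
  // Mirrors the shape of the reviewed check: fail if any parameter is not foldable.
  def checkParams(eps: Expr, confidence: Expr, seed: Expr): Either[String, Unit] =
    if (!eps.foldable || !confidence.foldable || !seed.foldable)
      Left("The eps, confidence or seed provided must be foldable")
    else Right(())

  def main(args: Array[String]): Unit = {
    println(checkParams(Lit(0.01), Lit(0.95), Lit(42)))              // Right(())
    println(checkParams(Mul(Lit(0.1), Lit(0.1)), Lit(0.95), Lit(1))) // Right(())
    println(checkParams(Col("eps"), Lit(0.95), Lit(1)))              // a Left with the error message
  }
}
```

The second call shows why "constant" or "literal" is too narrow a word for the error message: a product of two literals is not itself a literal, but it is still foldable.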





[GitHub] spark pull request #16093: [SPARK-18663][SQL] Simplify CountMinSketch aggreg...

2016-11-30 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/16093

[SPARK-18663][SQL] Simplify CountMinSketch aggregate implementation

## What changes were proposed in this pull request?
SPARK-18429 introduced a count-min sketch aggregate function for SQL, but the 
implementation and testing are more complicated than needed. This simplifies the 
test cases and removes support for data types that don't have clear equality 
semantics:

1. Removed support for floating point and decimal types.

2. Removed the heavy randomized tests. The underlying CountMinSketch 
implementation already had pretty good test coverage through randomized tests, 
and the SPARK-18429 implementation just adds an aggregate function wrapper 
around CountMinSketch. There is no need for randomized tests at three different 
levels of the implementation.
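The description does not say which equality problems it has in mind. As a hedged illustration, the sketch below shows the usual IEEE 754 and decimal-scale pitfalls in Scala. `sameBits` is a hypothetical helper standing in for a hash over raw bit patterns. It is not the hashing that Spark's CountMinSketch actually uses.

```scala
object FloatEqualityDemo {
  // Hypothetical helper: bitwise equality, as a hash over raw bits would see it.
  def sameBits(x: Double, y: Double): Boolean =
    java.lang.Double.doubleToLongBits(x) == java.lang.Double.doubleToLongBits(y)

  def main(args: Array[String]): Unit = {
    // 0.0 and -0.0 are numerically equal but have different bit patterns.
    println(0.0 == -0.0)           // true
    println(sameBits(0.0, -0.0))   // false

    // NaN is not equal to itself under IEEE 754 comparison, yet
    // doubleToLongBits maps every NaN to one canonical bit pattern.
    val nan = Double.NaN
    println(nan == nan)            // false
    println(sameBits(nan, nan))    // true

    // java.math.BigDecimal: same numeric value, different scale.
    val a = new java.math.BigDecimal("1.0")
    val b = new java.math.BigDecimal("1.00")
    println(a.equals(b))           // false: equals also compares scale
    println(a.compareTo(b) == 0)   // true: compareTo ignores scale
  }
}
```

Values a user would consider equal can therefore be counted as different items, or vice versa, depending on how they are encoded. This is plausibly the kind of ambiguity that dropping these types avoids.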

## How was this patch tested?
A lot of the change is to simplify test cases.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-18663

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16093.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16093


commit e21479eb235a0048631c31c3e7391258e9d8d83d
Author: Reynold Xin 
Date:   2016-12-01T02:27:44Z

[SPARK-18663][SQL] Simplify CountMinSketch aggregate implementation



