[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44267784 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -146,148 +146,105 @@ private[sql] abstract class

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44268074 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -146,148 +146,105 @@ private[sql] abstract class

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44268156 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala --- @@ -21,22 +21,22 @@ import

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44268135 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortBasedAggregate.scala --- @@ -27,15 +27,15 @@ import

[GitHub] spark pull request: [SPARK-11450][SQL] Add Unsafe Row processing t...

2015-11-02 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/9414 [SPARK-11450][SQL] Add Unsafe Row processing to Expand This PR enables the Expand operator to process and produce Unsafe Rows. You can merge this pull request into a Git repository by running

[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...

2015-11-02 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/9417 [SPARK-11449][Core] PortableDataStream should be a factory ```PortableDataStream``` maintains some internal state. This makes it tricky to reuse a stream (one needs to call ```close``` on both
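The factory idea can be illustrated in plain Scala. `StreamFactory` and its `open`/`readAll` methods are hypothetical names, not the API the PR introduces for `PortableDataStream`. The factory holds only immutable configuration, here a byte array standing in for a file. Every `open()` returns a fresh `InputStream` that the caller owns and closes, so nothing has to be reset between reads.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets

// Hypothetical factory: it holds only immutable configuration (the bytes that
// stand in for a file), never an open stream, so it is safe to reuse.
final class StreamFactory(data: Array[Byte]) {
  // Every call returns a brand-new stream that the caller owns and must close.
  def open(): InputStream = new ByteArrayInputStream(data)

  // Convenience helper: open a fresh stream, read it fully, and always close it.
  def readAll(): Array[Byte] = {
    val in = open()
    try Iterator.continually(in.read()).takeWhile(_ != -1).map(_.toByte).toArray
    finally in.close()
  }
}

val factory = new StreamFactory("hello".getBytes(StandardCharsets.UTF_8))
// Reading twice works because no state is shared between reads.
val first = new String(factory.readAll(), StandardCharsets.UTF_8)
val second = new String(factory.readAll(), StandardCharsets.UTF_8)
```

The design choice is that ownership of the stream, including the obligation to close it, moves to whoever calls `open()`. The factory itself never needs closing.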

[GitHub] spark pull request: [SPARK-11275][SQL] Reimplement Expand as a Gen...

2015-11-03 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9429#discussion_r43724216 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -205,45 +205,30 @@ class Analyzer

[GitHub] spark pull request: [SPARK-11275][SQL][WIP] Rollup and Cube Genera...

2015-11-02 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9419#issuecomment-153147431 It would also help a lot to have unit tests covering this problem. --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark pull request: [SPARK-11275][SQL][WIP] Rollup and Cube Genera...

2015-11-02 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9419#issuecomment-153146768 If I understand the problem correctly, the logical Expand operator makes items which are not in the grouping set ```null```. This means that if a column is both used
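To make the null-ing behaviour concrete, here is a plain-Scala sketch of what an Expand step does for `ROLLUP(a, b)`. It uses no Spark classes. Each input row is emitted once per grouping set, columns outside the set become `None` (SQL `NULL`), and a grouping id records which set produced the row. The `Row`/`Expanded` types, the `expand` helper and the bitmask encoding of the grouping id are assumptions for illustration. The last line shows what happens when an aggregate reads a column that Expand has nulled out. One way around this is to carry an un-nulled copy of the column through the Expand for the aggregate to read.

```scala
// Minimal stand-in for a row with two grouping columns and one value column.
final case class Row(a: Option[String], b: Option[Int], v: Int)

// Expanded row: the (possibly nulled) grouping columns, the value, and a grouping id.
final case class Expanded(a: Option[String], b: Option[Int], v: Int, gid: Int)

// Grouping sets for ROLLUP(a, b): (a, b), (a), (). Encoded as (keepA, keepB).
val rollupSets: Seq[(Boolean, Boolean)] = Seq((true, true), (true, false), (false, false))

// Emit one output row per grouping set; columns outside the set become None (NULL).
// Assumed gid encoding: bit 1 set when `a` is nulled, bit 0 set when `b` is nulled.
def expand(rows: Seq[Row], sets: Seq[(Boolean, Boolean)]): Seq[Expanded] =
  rows.flatMap { r =>
    sets.map { case (keepA, keepB) =>
      Expanded(
        if (keepA) r.a else None,
        if (keepB) r.b else None,
        r.v,
        (if (keepA) 0 else 2) | (if (keepB) 0 else 1))
    }
  }

val input = Seq(Row(Some("x"), Some(1), 10), Row(Some("x"), Some(2), 20), Row(Some("y"), Some(1), 5))
val expanded = expand(input, rollupSets)

// sum(v) per grouping set: GROUP BY a, b, gid over the expanded rows.
val sums: Map[(Option[String], Option[Int], Int), Int] =
  expanded.groupBy(e => (e.a, e.b, e.gid)).map { case (k, es) => k -> es.map(_.v).sum }

// sum(b) computed from the nulled grouping copy of b in the grand-total set (gid 3):
// every b is None there, so the sum is 0 instead of 1 + 2 + 1 = 4.
val sumBFromExpanded = expanded.filter(_.gid == 3).flatMap(_.b).sum
```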

[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...

2015-11-06 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9406#discussion_r44203475 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Utils.scala --- @@ -213,3 +216,178 @@ object Utils

[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...

2015-11-06 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9409#discussion_r44203342 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -419,3 +419,30 @@ case class Greatest

[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...

2015-11-06 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9409#discussion_r44203267 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -419,3 +419,30 @@ case class Greatest

[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...

2015-11-06 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9406#discussion_r44203629 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Utils.scala --- @@ -213,3 +216,178 @@ object Utils

[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...

2015-11-06 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9406#discussion_r44203639 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Utils.scala --- @@ -213,3 +216,178 @@ object Utils

[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...

2015-11-07 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/9541 [SPARK-9241][SQL] Supporting multiple DISTINCT columns - follow-up This PR is a follow up for PR https://github.com/apache/spark/pull/9406. It adds more documentation to the rewriting rule

[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...

2015-11-07 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9409#issuecomment-154695463 test this please

[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...

2015-11-07 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9409#discussion_r44210935 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -419,3 +419,30 @@ case class Greatest

[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...

2015-11-07 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9409#discussion_r44216923 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Utils.scala --- @@ -54,10 +54,14 @@ object Utils

[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...

2015-11-07 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9409#issuecomment-154752963 This one is currently running: https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2005/consoleFull

[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...

2015-11-07 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9409#issuecomment-154750495 test this please

[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...

2015-11-07 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9409#discussion_r44214255 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Utils.scala --- @@ -54,10 +54,14 @@ object Utils

[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...

2015-11-07 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9541#issuecomment-154723430 retest this please

[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...

2015-11-07 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9541#issuecomment-154723404 Funny build failure: Build was aborted Aborted by anonymous ERROR: Step "Archive the artifacts" failed: no workspace

[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...

2015-11-07 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9409#discussion_r44213709 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Utils.scala --- @@ -54,10 +54,14 @@ object Utils

[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...

2015-11-07 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9409#issuecomment-154758153 Seems like I have broken something. I'll need to rebase anyway.

[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...

2015-11-04 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9417#discussion_r43870851 --- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala --- @@ -177,39 +170,24 @@ class PortableDataStream

[GitHub] spark pull request: [SPARK-11495] Fix potential socket / file hand...

2015-11-04 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9455#discussion_r43854421 --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java --- @@ -325,6 +327,11 @@ public Location next() { try

[GitHub] spark pull request: [SPARK-11474]Options to jdbc load are lower ca...

2015-11-04 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9461#discussion_r43852227 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala --- @@ -487,4 +488,9 @@ private[sql] class JDBCRDD

[GitHub] spark pull request: Update LDAOptimizer.scala

2015-11-04 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9456#issuecomment-153630052 Could you add the number of the JIRA ticket this relates to? See other PRs for an example.

[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...

2015-11-02 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/9406 [SPARK-9241][SQL] Supporting multiple DISTINCT columns (2) - Rewriting Rule The second PR for SPARK-9241, this adds support for multiple distinct columns to the new aggregation code path
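The general idea of rewriting several DISTINCT aggregates into a plan without DISTINCT can be sketched in plain Scala. Take `SELECT k, COUNT(DISTINCT a), COUNT(DISTINCT b) FROM t GROUP BY k`. First, expand every row once per distinct aggregate, tagging it with a group id and keeping only that aggregate's column. Next, deduplicate on (k, gid, value). Finally, count the surviving non-null values per gid. This illustrates the technique under simplifying assumptions; it is not the rule's actual implementation. The names `Rec` and `countDistinctRewrite` are hypothetical.

```scala
// Input record: grouping key k and two columns we want distinct counts of.
final case class Rec(k: String, a: Option[Int], b: Option[Int])

// Computes COUNT(DISTINCT a), COUNT(DISTINCT b) GROUP BY k without a DISTINCT aggregate:
// 1) expand each row into one row per distinct aggregate, tagged with a gid,
// 2) deduplicate on (k, gid, value): the first aggregation,
// 3) count non-null values per (k, gid): the second aggregation.
def countDistinctRewrite(rows: Seq[Rec]): Map[String, (Long, Long)] = {
  // Step 1: gid 1 carries column a, gid 2 carries column b.
  val expanded: Seq[(String, Int, Option[Int])] =
    rows.flatMap(r => Seq((r.k, 1, r.a), (r.k, 2, r.b)))
  // Step 2: grouping on all expanded columns removes duplicates.
  val deduped = expanded.distinct
  // Step 3: COUNT ignores NULLs, so only defined values are counted.
  val counts: Map[(String, Int), Long] =
    deduped.filter(_._3.isDefined)
      .groupBy(t => (t._1, t._2))
      .map { case (key, ts) => key -> ts.size.toLong }
  rows.map(_.k).distinct.map { k =>
    k -> (counts.getOrElse((k, 1), 0L), counts.getOrElse((k, 2), 0L))
  }.toMap
}
```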

[GitHub] spark pull request: [SPARK-10100] [SQL] Perfomance improvements to...

2015-11-02 Thread hvanhovell
Github user hvanhovell closed the pull request at: https://github.com/apache/spark/pull/8298

[GitHub] spark pull request: [SPARK-11451][SQL] Support single distinct cou...

2015-11-02 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/9409 [SPARK-11451][SQL] Support single distinct count on multiple columns. This PR adds support for multiple columns in a single count distinct aggregate to the new aggregation path. cc
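In SQL, `COUNT(DISTINCT a, b)` counts distinct (a, b) combinations and skips every row in which at least one of the columns is NULL. A minimal plain-Scala sketch of those semantics follows; the helper name `countDistinctTuples` is hypothetical and rows are modelled as sequences of `Option` values.

```scala
// COUNT(DISTINCT c1, ..., cn): count distinct tuples, skipping any tuple that contains a NULL (None).
def countDistinctTuples(rows: Seq[Seq[Option[Any]]]): Long =
  rows
    .filter(_.forall(_.isDefined)) // drop rows with at least one NULL column
    .map(_.flatten)                // unwrap the Options into plain values
    .distinct                      // Seq equality is structural, so duplicates collapse
    .size
    .toLong
```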

[GitHub] spark pull request: [SPARK-11450][SQL] Add Unsafe Row processing t...

2015-11-06 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9414#discussion_r44112686 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Expand.scala --- @@ -41,14 +41,34 @@ case class Expand( // as UNKNOWN partitioning

[GitHub] spark pull request: [SPARK-11450][SQL] Add Unsafe Row processing t...

2015-11-06 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9414#discussion_r44112770 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/ExpandSuite.scala --- @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-9241] [SQL] [WIP] Supporting multiple D...

2015-11-06 Thread hvanhovell
Github user hvanhovell closed the pull request at: https://github.com/apache/spark/pull/9280

[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...

2015-11-06 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9406#issuecomment-154368700 H... this is a bit of a strange error.

[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...

2015-11-06 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9406#issuecomment-154368715 Jenkins retest this please

[GitHub] spark pull request: [SPARK-9241][SQL] Supporting multiple DISTINCT...

2015-11-06 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9406#issuecomment-154369474 Jenkins is not retesting... @marmbrus could you add me to the whitelist?

[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...

2015-10-15 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/8362#issuecomment-148305817 Another thought on hashing. The ClearSpring hash is a generic hash function. We could use very specialized (hopefully fast) hashing functions, because we know

[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...

2015-10-14 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/8362#issuecomment-148209375 @yhuai It doesn't. A 64-bit hashcode is recommended though, especially when you want to approximate a billion or more unique values. I have used the ClearSpring
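A back-of-the-envelope calculation shows why hash width matters at that scale. Assume an ideal, uniform hash and hash n distinct values into m = 2^bits slots. The expected number of distinct hash values is m(1 - (1 - 1/m)^n) ≈ m(1 - e^(-n/m)). Any estimator built on those hashes cannot tell the colliding values apart, so this is roughly the most it can "see". The snippet evaluates the formula for a billion values.

```scala
// Expected number of distinct hash values when n distinct items are hashed
// uniformly into 2^bits slots: m * (1 - exp(-n / m)), with m = 2^bits.
def expectedDistinctHashes(n: Double, bits: Int): Double = {
  val m = math.pow(2, bits)
  // -expm1(-x) == 1 - exp(-x), computed without cancellation when x is tiny.
  m * -math.expm1(-n / m)
}

val n = 1e9
val with32 = expectedDistinctHashes(n, 32) // about 8.9e8: roughly 11% of the values collide away
val with64 = expectedDistinctHashes(n, 64) // practically 1e9
```

With 32 bits, n/m ≈ 0.23 is far from negligible, so collisions bias any cardinality estimate downwards. With 64 bits, n/m ≈ 5.4e-11 and the effect disappears.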

[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...

2015-10-14 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/8362#issuecomment-148209543 A good article on HLL++ and the hashcode: http://research.neustar.biz/2013/01/24/hyperloglog-googles-take-on-engineering-hll

[GitHub] spark pull request: SPARK-11179: Push filters through aggregate if...

2015-10-20 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9167#issuecomment-149464143 We could do a similar thing for window functions.
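The reasoning behind pushing a filter below an aggregate plausibly carries over to window functions when the predicate only references the window's partitioning columns. In that case every window partition is kept or dropped as a whole, so the window values of the surviving rows do not change. Below is a plain-Scala check of that claim for a running sum; the `Sale` type and the `runningSum` helper are hypothetical.

```scala
// A row with a partition key, an ordering column and a value.
final case class Sale(region: String, day: Int, amount: Int)

// Running SUM(amount) OVER (PARTITION BY region ORDER BY day), paired with each row.
def runningSum(rows: Seq[Sale]): Seq[(Sale, Int)] =
  rows.groupBy(_.region).values.toSeq.flatMap { part =>
    val sorted = part.sortBy(_.day)
    // scanLeft yields the prefix sums; tail drops the initial 0.
    sorted.zip(sorted.scanLeft(0)(_ + _.amount).tail)
  }

val sales = Seq(Sale("eu", 1, 10), Sale("eu", 2, 5), Sale("us", 1, 7), Sale("us", 3, 1), Sale("eu", 3, 2))

// Predicate on the partition column: filtering before or after the window agrees.
val onPartition: Sale => Boolean = _.region == "eu"
val partitionAfter = runningSum(sales).filter { case (s, _) => onPartition(s) }.toSet
val partitionBefore = runningSum(sales.filter(onPartition)).toSet

// Predicate on a non-partition column: pushing it down changes the running sums.
val onDay: Sale => Boolean = _.day >= 2
val dayAfter = runningSum(sales).filter { case (s, _) => onDay(s) }.toSet
val dayBefore = runningSum(sales.filter(onDay)).toSet
```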

[GitHub] spark pull request: [SPARK-11009] [SQL] fix wrong result of Window...

2015-10-10 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9050#issuecomment-147077121 Good catch! Shouldn't we also backport this one into the 1.5 branch? @davies @yhuai could one of you guys explain to me why/where this is causing problems? I

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r34182012 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -37,443 +67,615 @@ case class Window( child: SparkPlan

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r34182441 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -37,443 +67,615 @@ case class Window( child: SparkPlan

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r34169654 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -37,443 +59,622 @@ case class Window( child: SparkPlan

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r34175570 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -37,443 +67,615 @@ case class Window( child: SparkPlan

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r34180206 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -37,443 +67,615 @@ case class Window( child: SparkPlan

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r34180117 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -37,443 +67,615 @@ case class Window( child: SparkPlan

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/7057#discussion_r34205943 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveDataFrameWindowSuite.scala --- @@ -189,7 +189,7 @@ class HiveDataFrameWindowSuite extends

[GitHub] spark pull request: [SPARK-8638] [SQL] Window Function Performance...

2015-07-08 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/7057#issuecomment-119752352 @yhuai I have updated the PR. As for the documentation, I will add another section to the general class documentation, which explains the inner workings

[GitHub] spark pull request: [SPARK-11553] [SQL] Primitive Row accessors sh...

2015-11-14 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9642#issuecomment-156721678 So I have been making a lot of fuss about internal classes, which you are not touching. Sorry about that. This change is much more benign, but I still wonder

[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...

2015-08-27 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/8362#issuecomment-135619245 Implemented initial non-sparse HLL++. I am going to take a look at the sparse version next week. The results are still equal to the Clearspring HLL+ implementation
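For readers unfamiliar with the dense representation, here is a minimal plain-Scala sketch of a dense HyperLogLog with 2^p one-byte registers and a 64-bit hash. It includes linear counting for small cardinalities. It deliberately omits the HLL++ empirical bias correction and the sparse mode. The SplitMix64 finalizer used as the 64-bit hash and the class name `DenseHll` are assumptions for illustration, not what the PR uses.

```scala
// Minimal dense HyperLogLog: 2^p one-byte registers, 64-bit hashing,
// linear counting for small cardinalities. No HLL++ bias correction, no sparse mode.
final class DenseHll(p: Int) {
  require(p >= 7 && p <= 18, "p must be in [7, 18] for the alpha constant used below")
  private val m = 1 << p
  private val registers = new Array[Byte](m)

  // SplitMix64 finalizer: a cheap, well-mixed 64-bit hash for Long inputs.
  private def hash64(x: Long): Long = {
    var z = x + 0x9E3779B97F4A7C15L
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL
    z ^ (z >>> 31)
  }

  def add(value: Long): Unit = {
    val h = hash64(value)
    // The top p bits select the register.
    val idx = (h >>> (64 - p)).toInt
    // Rank = position of the first 1-bit in the remaining 64 - p bits.
    // Setting bit (p - 1) caps the rank at 64 - p + 1 when those bits are all zero.
    val w = (h << p) | (1L << (p - 1))
    val rank = (java.lang.Long.numberOfLeadingZeros(w) + 1).toByte
    if (rank > registers(idx)) registers(idx) = rank
  }

  def estimate(): Double = {
    val alpha = 0.7213 / (1.0 + 1.079 / m)
    var sum = 0.0
    var zeros = 0
    for (r <- registers) {
      sum += math.pow(2.0, -r)
      if (r == 0) zeros += 1
    }
    // Raw HyperLogLog estimate: alpha * m^2 / sum(2^-register).
    val raw = alpha * m.toDouble * m / sum
    // Small-range correction: linear counting while empty registers remain.
    if (raw <= 2.5 * m && zeros > 0) m * math.log(m.toDouble / zeros) else raw
  }
}
```

Re-adding values that were already seen cannot raise any register, which is what makes the structure a distinct-count sketch rather than a counter.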

[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...

2015-09-08 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/8362#issuecomment-138493630 @rxin the dense version of HLL++ is ready. We could also add this, and add the sparse logic in a follow-up PR. Let me know what you think. I'll close if you'd rather

[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...

2015-09-30 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/8362#discussion_r40782166 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlusSuite.scala --- @@ -0,0 +1,125

[GitHub] spark pull request: [SPARK-6763][SQL] Add CountMinSketch to DataFr...

2015-09-22 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/6416#issuecomment-142309139 @MLnick I guess it depends. The other ```dataframe.stat``` functions have not been implemented as UDAFs, so this is not necessary. However, I do think that CMS
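A Count-Min Sketch is a depth × width table of counters with one hash function per row. Adding an item increments one counter in every row, and a lookup takes the minimum over the rows. Collisions can only inflate counters, so estimates may overcount but never undercount. The class below is a hypothetical plain-Scala illustration, not the API proposed in the PR. It derives its row hashes from `scala.util.hashing.MurmurHash3.stringHash`, with the row number as the seed.

```scala
import scala.util.hashing.MurmurHash3

// Minimal Count-Min Sketch over strings: `depth` rows of `width` counters.
final class CountMinSketch(depth: Int, width: Int) {
  private val table = Array.ofDim[Long](depth, width)

  // Column for `item` in row `row`; each row uses a different hash seed.
  private def bucket(item: String, row: Int): Int =
    java.lang.Math.floorMod(MurmurHash3.stringHash(item, row), width)

  def add(item: String, count: Long = 1L): Unit =
    for (row <- 0 until depth) table(row)(bucket(item, row)) += count

  // Collisions can only inflate counters, so the minimum is the tightest estimate.
  def estimate(item: String): Long =
    (0 until depth).map(row => table(row)(bucket(item, row))).min
}
```

Usage: after `add("spark")` five times and `add("hive", 3)`, `estimate("spark")` is at least 5 and at most the total count of 8.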

[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...

2015-09-22 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/8362#issuecomment-142323666 @MLnick I am in the process of moving house, so I am a bit slow/late with my response :(... I think it is very useful to be able to return the HLL

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-11-29 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9819#issuecomment-160478413 Yes. You can use any Spark aggregate function as a window function. Most Hive UDAFs should also work except for the pivoted ones...
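Using an ordinary aggregate as a window function means evaluating it over each row's frame instead of over the whole group. As a rough plain-Scala model, not Spark's implementation, the helper below evaluates an arbitrary aggregate, passed in as a function, over a `ROWS BETWEEN preceding PRECEDING AND CURRENT ROW` frame within one already-ordered partition. The helper name and signature are hypothetical.

```scala
// Evaluates `agg` over the ROWS frame [i - preceding, i] for each row i
// of a single partition that is already sorted by the ORDER BY key.
def slidingFrame[A, B](partition: IndexedSeq[A], preceding: Int)(agg: Seq[A] => B): IndexedSeq[B] =
  partition.indices.map { i =>
    val start = math.max(0, i - preceding)
    agg(partition.slice(start, i + 1))
  }

val values = IndexedSeq(3, 1, 4, 1, 5)
// SUM(x) OVER (ORDER BY ... ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
val sums = slidingFrame(values, 1)(_.sum)   // 3, 4, 5, 5, 6
// MAX(x) over the same frame: any aggregate plugs in, not just SUM.
val maxes = slidingFrame(values, 1)(_.max)  // 3, 3, 4, 4, 5
```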

[GitHub] spark pull request: [SPARK-11949][SQL] Check bitmasks to set nulla...

2015-12-01 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/10067#discussion_r46353257 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -223,10 +223,13 @@ class Analyzer

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-03 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9819#issuecomment-161653994 @zzcclp Just to be absolutely (painfully) clear: You can use the Hive based window functions without a Hive installation, you just need to have a version of Spark

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47071980 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -328,3 +281,222 @@ object

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47073846 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -328,3 +281,222 @@ object

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47073901 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -328,3 +281,222 @@ object

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47072914 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -592,11 +594,17 @@ class Analyzer

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47035205 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -70,15 +70,32 @@ trait CheckAnalysis

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47034537 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -870,26 +878,37 @@ class Analyzer

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47034735 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -156,36 +165,90 @@ case class Window( * @param frame

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47034276 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -567,6 +567,8 @@ class Analyzer( case u

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47105986 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -870,26 +878,37 @@ class Analyzer

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-02 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9819#issuecomment-161247041 @zzcclp this PR is slated for review in the next week or so. It should be in good shape, but I'll leave the verdict to the reviewers. Spark 1.6 is currently

[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...

2015-12-10 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-163764840 @davies don't get me wrong. I think this PR is an improvement over the current situation (it never crossed my mind to change partitioning when I was working

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47408969 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -246,85 +260,244 @@ object

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47407972 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -246,85 +260,244 @@ object

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47410151 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -736,15 +691,156 @@ private[execution] final class

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47407165 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -70,15 +70,32 @@ trait CheckAnalysis

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47408697 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -870,26 +878,37 @@ class Analyzer

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47411273 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -736,15 +691,156 @@ private[execution] final class

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47408553 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -187,7 +184,7 @@ sealed abstract class

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47407885 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -120,6 +121,19 @@ sealed trait

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47410438 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Window.scala --- @@ -149,43 +152,102 @@ case class Window

[GitHub] spark pull request: [Spark-12374][SPARK-12150][SQL] Adding logical...

2015-12-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/10335#discussion_r47834244 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -210,6 +210,37 @@ case class Sort

[GitHub] spark pull request: [Spark-12374][SPARK-12150][SQL] Adding logical...

2015-12-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/10335#discussion_r47828090 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala --- @@ -126,6 +127,69 @@ case class Sample

[GitHub] spark pull request: [Spark-12374][SPARK-12150][SQL] Adding logical...

2015-12-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/10335#discussion_r47828506 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -210,6 +210,37 @@ case class Sort

[GitHub] spark pull request: [Spark-12374][SPARK-12150][SQL] Adding logical...

2015-12-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/10335#discussion_r47827840 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala --- @@ -126,6 +127,69 @@ case class Sample

[GitHub] spark pull request: [Spark-12374][SPARK-12150][SQL] Adding logical...

2015-12-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/10335#discussion_r47834560 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala --- @@ -126,6 +127,69 @@ case class Sample

[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...

2015-12-13 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-164251993 @yhuai I think having the two clearly separated paths (this PR) is an improvement over the current situation. I also admit that I am responsible for introducing

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-15 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47620489 --- Diff: sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveWindowFunctionQuerySuite.scala --- @@ -472,7 +475,7 @@ class

[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...

2015-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/10228#discussion_r47444204 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala --- @@ -165,237 +137,100 @@ abstract class

[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...

2015-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/10228#discussion_r47444202 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala --- @@ -165,237 +137,100 @@ abstract class

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47450253 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -246,85 +260,238 @@ object

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47450230 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -246,85 +260,238 @@ object

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-13 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47451483 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -246,85 +260,238 @@ object

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-12 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47432910 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -870,26 +878,37 @@ class Analyzer

[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

2015-12-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/10374#discussion_r48015896 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala --- @@ -201,7 +201,7 @@ class GenericRow(protected[sql] val

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-17 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47973756 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -246,85 +260,281 @@ object

[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

2015-12-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/10374#discussion_r48020875 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala --- @@ -201,7 +201,7 @@ class GenericRow(protected[sql] val

[GitHub] spark pull request: [SPARK-12258] [SQL] Hive Timestamp UDF is bind...

2015-12-10 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10249#issuecomment-163685844 The timestamp is bound to this specific number because CodeGen uses -1L as its default (null) value for Timestamp (assuming that your timezone is GMT-8
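The comment attributes the fixed timestamp to generated code writing -1L as the null placeholder for a Timestamp. The sketch below is a hedged illustration of that reasoning. It assumes the Long counts microseconds since 1970-01-01T00:00:00Z, and `NullTimestampDemo`, `NullPlaceholderMicros` and `render` are hypothetical names, not Spark APIs. Under that assumption, -1L decodes to one microsecond before the epoch, and a GMT-8 session displays that instant as the afternoon of 1969-12-31.

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}
import java.time.temporal.ChronoUnit

// Hypothetical helper, not part of Spark: decodes a Long timestamp value,
// assumed to count microseconds since the Unix epoch (UTC), into wall-clock time.
object NullTimestampDemo {
  // Assumed placeholder written by generated code for a null Timestamp.
  val NullPlaceholderMicros: Long = -1L

  def render(micros: Long, offsetHours: Int): LocalDateTime =
    Instant.EPOCH
      .plus(micros, ChronoUnit.MICROS)           // -1L => 1969-12-31T23:59:59.999999Z
      .atOffset(ZoneOffset.ofHours(offsetHours)) // shift to the session's UTC offset
      .toLocalDateTime

  def main(args: Array[String]): Unit = {
    println(render(NullPlaceholderMicros, 0))  // 1969-12-31T23:59:59.999999
    println(render(NullPlaceholderMicros, -8)) // 1969-12-31T15:59:59.999999
  }
}
```

With a different tick size, the placeholder still lands one tick before the epoch. The GMT-8 reading therefore stays just before 16:00 on 1969-12-31, which would explain why the observed value depends on the reporter's timezone.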

[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...

2015-12-10 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-163699118 We could move the planning of distinct queries entirely to the DistinctAggregateRewriter. This would require us to merge the non-distinct aggregate paths

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-14 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9819#discussion_r47565984 --- Diff: sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveWindowFunctionQuerySuite.scala --- @@ -472,7 +475,7 @@ class

[GitHub] spark pull request: [SPARK-8641][SQL] Native Spark Window function...

2015-12-14 Thread hvanhovell
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9819#issuecomment-164583021 Build failed due to an R versioning problem. I'll try again when this is sorted out.
