[GitHub] spark pull request #15412: [SPARK-17844] Simplify DataFrame API for defining...

2016-10-10 Thread asfgit
GitHub user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15412


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15412: [SPARK-17844] Simplify DataFrame API for defining...

2016-10-09 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/15412

[SPARK-17844] Simplify DataFrame API for defining frame boundaries in 
window functions

## What changes were proposed in this pull request?
When I was creating the example code for SPARK-10496, I realized it was
pretty convoluted to define the frame boundaries for window functions when
there is no partition column or ordering column. The reason is that we don't
provide a way to create a WindowSpec directly with the frame boundaries. We can
trivially improve this by adding rowsBetween and rangeBetween to the Window object.

As an example, to compute a cumulative sum, before this PR:
```
df.select('key,
  sum("value").over(Window.partitionBy(lit(1)).rowsBetween(Long.MinValue, 0)))
```

After this PR:
```
df.select('key, sum("value").over(Window.rowsBetween(Long.MinValue, 0)))
```
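
To make the frame boundaries concrete, here is a minimal plain-Python sketch of what a row frame such as `rowsBetween(Long.MinValue, 0)` selects for each row. It is a model of the semantics only, not Spark code: `sum_over_rows_between` and `UNBOUNDED_PRECEDING` are hypothetical names introduced for illustration, the input is assumed to be a single partition that is already in the desired order, and an empty frame is assumed to yield `None` (mirroring a SQL NULL).

```python
import sys

# Hypothetical stand-in for Long.MinValue, which the example above uses
# to mean "no lower bound" (unbounded preceding).
UNBOUNDED_PRECEDING = -sys.maxsize - 1


def sum_over_rows_between(values, start, end):
    """Model sum(...) over a row frame [current + start, current + end].

    `values` is one partition, already in the desired order. Offsets are
    relative to the current row: 0 is the current row, -1 the row before it.
    """
    n = len(values)
    result = []
    for i in range(n):
        # Clamp the frame to the bounds of the partition.
        lo = max(0, i + start)
        hi = min(n - 1, i + end)
        # Assumption: an empty frame yields None, mirroring a SQL NULL.
        result.append(sum(values[lo:hi + 1]) if lo <= hi else None)
    return result


# Cumulative sum: every row from the start of the partition up to the current row.
print(sum_over_rows_between([1, 2, 3, 4], UNBOUNDED_PRECEDING, 0))  # [1, 3, 6, 10]
```

With a frame of `(-1, 1)` the same function would produce a three-row moving sum instead, which is why exposing the boundaries directly on `Window` is convenient when no partition or ordering column is needed.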

## How was this patch tested?
Added test cases that compute a cumulative sum, in DataFrameWindowSuite for
Scala/Java and in tests.py for Python.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-17844

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15412.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15412


commit 98b77a7c660e0064353b1fa98e2e47bc2d971bea
Author: Reynold Xin 
Date:   2016-10-10T01:15:15Z

[SPARK-17844] Simplify DataFrame API for defining frame boundaries in 
window functions



