[jira] [Commented] (SPARK-17893) Window functions should also allow looking back in time

2016-10-13 Thread Raviteja Lokineni (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571728#comment-15571728
 ] 

Raviteja Lokineni commented on SPARK-17893:
---

I just hope the aggregates won't be polluted by the empty rows in between. I 
wonder how the avg aggregation behaves when it encounters null.
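
On the avg question: Spark SQL follows the usual SQL semantics, where avg skips null inputs rather than counting them as zero. Below is a minimal sketch that checks this on a made-up three-day sample. It assumes a local SparkSession; the column name col1 comes from the issue description, and the helper name avgWithGap is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, count}

object AvgSkipsNulls {
  // Returns (average, number of non-null values) for a made-up sample
  // in which the middle day has no value, as a filled-in missing day would.
  def avgWithGap(spark: SparkSession): (Double, Long) = {
    import spark.implicits._
    val df = Seq[(String, Option[Double])](
      ("2013-01-01", Some(10.0)),
      ("2013-01-02", None),
      ("2013-01-03", Some(20.0))
    ).toDF("date", "col1")

    // avg(col1) and count(col1) both ignore the null row.
    val row = df.agg(avg("col1"), count("col1")).head()
    (row.getDouble(0), row.getLong(1))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("avg-skips-nulls").getOrCreate()
    println(avgWithGap(spark)) // (15.0,2): the mean is taken over 2 rows, not 3
    spark.stop()
  }
}
```

min, max and sum skip nulls in the same way, so null rows introduced for missing days should not distort those aggregates either.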

> Window functions should also allow looking back in time
> ---
>
> Key: SPARK-17893
> URL: https://issues.apache.org/jira/browse/SPARK-17893
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 2.0.1
> Reporter: Raviteja Lokineni
>
> This function should allow looking back. The current window(timestamp, 
> duration) seems to be for looking forward in time.
> Example:
> {code}dataFrame.groupBy(window("date", "7 days ago")).agg(min("col1"), 
> max("col1")){code}
> For example, if date: 2013-01-07 then the window should be 2013-01-01 - 
> 2013-01-07



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17893) Window functions should also allow looking back in time

2016-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571623#comment-15571623
 ] 

Sean Owen commented on SPARK-17893:
---

Maybe:
- Aggregate by day
- Generate a DataFrame containing all days from the start to end of your data
- Outer join with that, to fill in a row for missing dates
- Use 7-row lagging window to aggregate over sliding 7-day intervals
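
The four steps above could be sketched roughly as follows. This is one possible reading of the suggestion, not code from the thread: it assumes the input has a DateType column "date" and a numeric column "col1" (names taken from the issue description), that the data is non-empty, and that the daily table is small enough for an unpartitioned window. The names lookback, day_min, day_max, min_7d and max_7d are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, max, min}

object SevenDayLookback {
  def lookback(spark: SparkSession, data: DataFrame): DataFrame = {
    import spark.implicits._

    // 1. Aggregate by day.
    val daily = data.groupBy("date")
      .agg(min("col1").as("day_min"), max("col1").as("day_max"))

    // 2. Generate every calendar day from the first to the last date in the data.
    val bounds = daily.agg(min("date"), max("date")).head()
    val first = bounds.getDate(0).toLocalDate
    val last = bounds.getDate(1).toLocalDate
    val allDays = Iterator.iterate(first)(_.plusDays(1))
      .takeWhile(d => !d.isAfter(last))
      .map(_.toString) // ISO format, e.g. 2013-01-07
      .toList
      .toDF("date")
      .withColumn("date", col("date").cast("date"))

    // 3. Outer join, so that missing days get a row with null aggregates.
    val filled = allDays.join(daily, Seq("date"), "left_outer")

    // 4. Lagging window: the current row plus the 6 rows before it, i.e. 7 days.
    //    No partitionBy, so Spark processes this window in a single partition.
    val w = Window.orderBy("date").rowsBetween(-6, 0)
    filled.select(
      col("date"),
      min("day_min").over(w).as("min_7d"),
      max("day_max").over(w).as("max_7d"))
  }
}
```

Because every calendar day now has exactly one row, a frame of 7 rows spans exactly 7 calendar days; for 2013-01-07 it covers 2013-01-01 through 2013-01-07. The output also contains rows for the filled-in days, which can be dropped again with an inner join against the original dates if they are not wanted.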




[jira] [Commented] (SPARK-17893) Window functions should also allow looking back in time

2016-10-13 Thread Raviteja Lokineni (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571608#comment-15571608
 ] 

Raviteja Lokineni commented on SPARK-17893:
---

The tricky part is that we don't have a record for every day, and yes, I am 
looking at calendar days since time wasn't recorded in our data. So is there 
something that Spark can do to help in my case? I am running in circles trying 
to figure out a solution for this scenario.




[jira] [Commented] (SPARK-17893) Window functions should also allow looking back in time

2016-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571599#comment-15571599
 ] 

Sean Owen commented on SPARK-17893:
---

OK, well windows here aren't inherently including "past" or "future" events; 
they are just a set of events spanning a certain amount of time, and aren't 
relative to the time of a particular event in the window.  You can compute 
aggregates over the window as with other grouping functions. It sounds like you 
just want to use "7 days" as the window size and "1 day" as the slide duration. 
Each row of the resulting aggregation represents the 7 days leading up to a 
different unique day as desired.
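
A minimal sketch of that sliding-window aggregation, using the three-argument window(timeColumn, windowDuration, slideDuration) function. It assumes a column "date" that can be cast to a timestamp and a numeric column "col1"; the names slidingAgg, min_col1 and max_col1 are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, max, min, window}

object SlidingSevenDays {
  def slidingAgg(data: DataFrame): DataFrame =
    data
      // Windows are 7 days long and start 1 day apart, so each input row
      // is assigned to 7 overlapping windows.
      .groupBy(window(col("date").cast("timestamp"), "7 days", "1 day"))
      .agg(min("col1").as("min_col1"), max("col1").as("max_col1"))
      // The grouping column is a struct holding the window's start and end timestamps.
      .select(col("window.start"), col("window.end"), col("min_col1"), col("max_col1"))
}
```

Each window covers the half-open interval from start up to, but not including, end. Windows that contain no rows do not appear in the result.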

As a side note, "1 day" here means 86,400,000 ms and not a calendar day. If you 
really need calendar days, this is probably trickier to get exactly right. You 
would probably have to first aggregate by day, and then window over 7 rows 
preceding to truly span 7 days each time. That assumes that there is some data 
for every day though.




[jira] [Commented] (SPARK-17893) Window functions should also allow looking back in time

2016-10-13 Thread Raviteja Lokineni (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571582#comment-15571582
 ] 

Raviteja Lokineni commented on SPARK-17893:
---

[~srowen] No, it's not actually streaming.

Let me explain my use case:
* We get data in batches
* Once the data is available it has a Date column and a few other numeric 
columns
* Now for every unique date that is available I have to look back 7 days and 
aggregate all the numeric columns

I am looking at past records, not future ones.




[jira] [Commented] (SPARK-17893) Window functions should also allow looking back in time

2016-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571570#comment-15571570
 ] 

Sean Owen commented on SPARK-17893:
---

OK, is this about streaming? The syntax lets you specify the window size, which 
inherently ends "now". It is of course not possible to define streaming 
windows that include future times.




[jira] [Commented] (SPARK-17893) Window functions should also allow looking back in time

2016-10-13 Thread Raviteja Lokineni (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571556#comment-15571556
 ] 

Raviteja Lokineni commented on SPARK-17893:
---

[~srowen] I can't find anything in the documentation that offsets by timestamp 
or date.
* http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.expressions.Window
** Offsets by number of rows ahead or behind
** In my case I do not have a fixed number of rows between dates
* http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
** I looked at all four definitions of lead/lag in the docs, but they offset only 
by number of rows, which does not work in my case

FYI, the function that I was referring to was: {noformat}def
window(timeColumn: Column, windowDuration: String): Column{noformat}
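
For reference, a window spec can also be framed by a range of ordering values instead of a row count: WindowSpec.rangeBetween takes offsets in the units of the ordering expression. The sketch below is not something proposed in this thread. It assumes a DateType column "date" and a numeric column "col1"; the helper column day_no and the names lookback, min_7d and max_7d are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, datediff, lit, max, min}

object RangeLookback {
  def lookback(data: DataFrame): DataFrame = {
    // Hypothetical helper column: whole days since an arbitrary fixed date,
    // so that the range frame below is measured in calendar days.
    val epoch = lit("1970-01-01").cast("date")
    val withDayNo = data.withColumn("day_no", datediff(col("date"), epoch))

    // Frame = all rows whose day_no lies in [current - 6, current],
    // however many rows that happens to be.
    val w = Window.orderBy("day_no").rangeBetween(-6, 0)

    withDayNo
      .select(
        col("date"),
        min("col1").over(w).as("min_7d"),
        max("col1").over(w).as("max_7d"))
      // Rows sharing a date share a frame, so this leaves one row per unique date.
      .distinct()
  }
}
```

With this framing, missing days need no filler rows: for date 2013-01-07 the frame covers whatever records exist between 2013-01-01 and 2013-01-07. As with any window without partitionBy, all rows are processed in a single partition.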
