GitHub user yhuai opened a pull request:
https://github.com/apache/spark/pull/14284
[SPARK-16633] [SPARK-16642] Fixes three issues related to window functions
## What changes were proposed in this pull request?
This PR contains three changes.
First, this PR restores the Spark 1.6 behavior of lead/lag, which is
described below:
1. lead/lag respect null input values, which means that if the offset row
exists and the input value is null, the result will be null instead of the
default value.
2. If the offset row does not exist, the default value will be used.
3. OffsetWindowFunction's nullable setting also considers the nullability
of its input (because of the first change).
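The three rules above can be modeled without Spark. The sketch below is a hypothetical pure-Python model of lead/lag over one already-ordered partition, assuming Spark 1.6 semantics as listed. `lead`, `lag` and `offset_fn_nullable` are illustrative names, not Spark APIs. The nullability helper assumes an omitted default behaves like a null literal.

```python
from typing import Any, List


def lead(values: List[Any], offset: int = 1, default: Any = None) -> List[Any]:
    """Hypothetical model of lead() over one ordered partition (not Spark code).

    - If the row `offset` positions away exists, return its value as-is,
      even when that value is None (null input values are respected).
    - If that row does not exist, fall back to `default`.
    """
    n = len(values)
    result = []
    for i in range(n):
        j = i + offset
        if 0 <= j < n:
            result.append(values[j])  # offset row exists: keep its value, null included
        else:
            result.append(default)    # offset row missing: use the default value
    return result


def lag(values: List[Any], offset: int = 1, default: Any = None) -> List[Any]:
    """lag() looks backwards, so it is lead() with a negated offset."""
    return lead(values, -offset, default)


def offset_fn_nullable(input_nullable: bool, default_nullable: bool) -> bool:
    """Hypothetical nullability rule: the result can be null if the input can be
    null (nulls are now passed through) or if the default can be null
    (an omitted default is modeled as a null literal, hence nullable)."""
    return input_nullable or default_nullable


print(lead([1, None, 3], 1, 0))  # the null at index 1 is returned, not replaced by 0
print(lag([1, None, 3], 1, 0))
```

With default `0`, the first row of `lead` returns the null it finds at the offset row, and only the last row (which has no offset row) receives `0`.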
Second, this PR fixes the evaluation of lead/lag when the input expression
is a literal. This fix follows from the first change. In the current master
branch, if a literal is used as the input expression of a lead or lag function,
the result is that literal even if the offset row does not exist.
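To make the literal-input bug concrete, the hypothetical Python model below contrasts the behavior described for the current master with the fixed behavior. The function names are illustrative only; they model the described semantics and are not Spark internals. A positive `offset` models lead and a negative one models lag.

```python
from typing import Any, List


def lead_literal_master(n_rows: int, literal: Any, offset: int = 1,
                        default: Any = None) -> List[Any]:
    """Models the described master behavior: the literal is returned for
    every row, even where the offset row does not exist."""
    return [literal] * n_rows


def lead_literal_fixed(n_rows: int, literal: Any, offset: int = 1,
                       default: Any = None) -> List[Any]:
    """Models the fixed behavior: a literal is treated like any other input,
    so rows without an offset row receive the default instead."""
    return [literal if 0 <= i + offset < n_rows else default
            for i in range(n_rows)]


print(lead_literal_master(3, "x"))      # literal leaks into the last row
print(lead_literal_fixed(3, "x"))       # last row has no offset row -> default
print(lead_literal_fixed(3, "x", -1))   # lag-style: first row -> default
```

The only difference between the two models is the bounds check on the offset row, which is the check the first change makes apply to every input expression.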
Third, this PR prevents the ResolveWindowFrame rule from firing when a window
function has not yet been resolved.
## How was this patch tested?
New tests in SQLWindowFunctionSuite
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yhuai/spark lead-lag
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14284.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14284
----
commit 78e69018ecaffb9598f4ea2b51900850ee3fb988
Author: Yin Huai
Date: 2016-07-20T06:56:50Z
Add regression tests
commit da5f36f5daa16c4aba605cb939b313c92274b24e
Author: Yin Huai
Date: 2016-07-20T07:22:17Z
Fix SPARK-16642
commit 02ee1915ab2519c876f60162ff00aaa155142eec
Author: Yin Huai
Date: 2016-07-20T08:43:04Z
OffsetWindowFunction's nullable should also check its input's nullable
field.
commit 506393b3eec45f7b62615adfe317a230e8de4128
Author: Yin Huai
Date: 2016-07-20T08:43:28Z
Change the behavior of lead/lag back to Spark 1.6's behavior, which is
explained below:
* When the offset row does not exist, default values will be used.
* lead/lag always respect null input values.
----