[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] Fixes three issues re...
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14284#discussion_r72144974

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WindowExec.scala ---
@@ -625,10 +643,12 @@ private[execution] final class OffsetWindowFunctionFrame(
       if (inputIndex >= 0 && inputIndex < input.size) {
--- End diff --

Ok, let's improve this in a follow-up PR :).

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] Fixes three issues re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14284#discussion_r72083334

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WindowExec.scala ---
@@ -625,10 +643,12 @@ private[execution] final class OffsetWindowFunctionFrame(
       if (inputIndex >= 0 && inputIndex < input.size) {
--- End diff --

Yea, we can improve this part in master.
[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] Fixes three issues re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14284#discussion_r72083182

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala ---
@@ -357,14 +356,59 @@ class SQLWindowFunctionSuite extends QueryTest with SQLTestUtils with TestHiveSi
   }
 
   test("SPARK-7595: Window will cause resolve failed with self join") {
-    sql("SELECT * FROM src") // Force loading of src table.
--- End diff --

This suite is not in Hive, so there is no table called `src`.
[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] Fixes three issues re...
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14284#discussion_r71993975

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WindowExec.scala ---
@@ -625,10 +643,12 @@ private[execution] final class OffsetWindowFunctionFrame(
       if (inputIndex >= 0 && inputIndex < input.size) {
--- End diff --

This is a more general comment, which does not necessarily apply to this line. Since we are breaking the code up into two separate code paths (with row/without row), we might as well get rid of the joined row and the logic needed to set this up (like: `Seq.fill(ordinal)(NoOp)`).
[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] Fixes three issues re...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14284#discussion_r71990056

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala ---
@@ -357,14 +356,59 @@ class SQLWindowFunctionSuite extends QueryTest with SQLTestUtils with TestHiveSi
   }
 
   test("SPARK-7595: Window will cause resolve failed with self join") {
-    sql("SELECT * FROM src") // Force loading of src table.
--- End diff --

But why do we remove it?
[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] Fixes three issues re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14284#discussion_r71782889

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala ---
@@ -357,14 +356,59 @@ class SQLWindowFunctionSuite extends QueryTest with SQLTestUtils with TestHiveSi
   }
 
   test("SPARK-7595: Window will cause resolve failed with self join") {
-    sql("SELECT * FROM src") // Force loading of src table.
-
     checkAnswer(sql(
       """
         |with
-        | v1 as (select key, count(value) over (partition by key) cnt_val from src),
+        | v0 as (select 0 as key, 1 as value),
+        | v1 as (select key, count(value) over (partition by key) cnt_val from v0),
         | v2 as (select v1.key, v1_lag.cnt_val from v1, v1 v1_lag where v1.key = v1_lag.key)
-        | select * from v2 order by key limit 1
-      """.stripMargin), Row(0, 3))
+        | select key, cnt_val from v2 order by key limit 1
+      """.stripMargin), Row(0, 1))
+  }
+
+  test("lead/lag should return the default value if the offset row does not exist") {
+    checkAnswer(sql(
+      """
+        |SELECT
+        |  lag(123, 100, 321) OVER (ORDER BY id) as lag,
+        |  lead(123, 100, 321) OVER (ORDER BY id) as lead
+        |FROM (SELECT 1 as id) tmp
+      """.stripMargin),
+      Row(321, 321))
+
+    checkAnswer(sql(
+      """
+        |SELECT
+        |  lag(123, 100, a) OVER (ORDER BY id) as lag,
+        |  lead(123, 100, a) OVER (ORDER BY id) as lead
+        |FROM (SELECT 1 as id, 2 as a) tmp
+      """.stripMargin),
+      Row(2, 2))
+  }
+
+  test("lead/lag should be able to handle null input value correctly") {
--- End diff --

OK, I see what @jaceklaskowski meant. I thought he was questioning the behavior of lead/lag.
[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] Fixes three issues re...
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14284#discussion_r71781735

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala ---
@@ -357,14 +356,59 @@ class SQLWindowFunctionSuite extends QueryTest with SQLTestUtils with TestHiveSi
   }
 
   test("SPARK-7595: Window will cause resolve failed with self join") {
-    sql("SELECT * FROM src") // Force loading of src table.
-
     checkAnswer(sql(
       """
         |with
-        | v1 as (select key, count(value) over (partition by key) cnt_val from src),
+        | v0 as (select 0 as key, 1 as value),
+        | v1 as (select key, count(value) over (partition by key) cnt_val from v0),
         | v2 as (select v1.key, v1_lag.cnt_val from v1, v1 v1_lag where v1.key = v1_lag.key)
-        | select * from v2 order by key limit 1
-      """.stripMargin), Row(0, 3))
+        | select key, cnt_val from v2 order by key limit 1
+      """.stripMargin), Row(0, 1))
+  }
+
+  test("lead/lag should return the default value if the offset row does not exist") {
+    checkAnswer(sql(
+      """
+        |SELECT
+        |  lag(123, 100, 321) OVER (ORDER BY id) as lag,
+        |  lead(123, 100, 321) OVER (ORDER BY id) as lead
+        |FROM (SELECT 1 as id) tmp
+      """.stripMargin),
+      Row(321, 321))
+
+    checkAnswer(sql(
+      """
+        |SELECT
+        |  lag(123, 100, a) OVER (ORDER BY id) as lag,
+        |  lead(123, 100, a) OVER (ORDER BY id) as lead
+        |FROM (SELECT 1 as id, 2 as a) tmp
+      """.stripMargin),
+      Row(2, 2))
+  }
+
+  test("lead/lag should be able to handle null input value correctly") {
--- End diff --

It's best if the test case name specifies the behavior rather than saying "correctly", since obviously we don't want anything to work "incorrectly".
[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] Fixes three issues re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14284#discussion_r71781548

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala ---
@@ -357,14 +356,59 @@ class SQLWindowFunctionSuite extends QueryTest with SQLTestUtils with TestHiveSi
   }
 
   test("SPARK-7595: Window will cause resolve failed with self join") {
-    sql("SELECT * FROM src") // Force loading of src table.
-
     checkAnswer(sql(
       """
         |with
-        | v1 as (select key, count(value) over (partition by key) cnt_val from src),
+        | v0 as (select 0 as key, 1 as value),
+        | v1 as (select key, count(value) over (partition by key) cnt_val from v0),
         | v2 as (select v1.key, v1_lag.cnt_val from v1, v1 v1_lag where v1.key = v1_lag.key)
-        | select * from v2 order by key limit 1
-      """.stripMargin), Row(0, 3))
+        | select key, cnt_val from v2 order by key limit 1
+      """.stripMargin), Row(0, 1))
+  }
+
+  test("lead/lag should return the default value if the offset row does not exist") {
+    checkAnswer(sql(
+      """
+        |SELECT
+        |  lag(123, 100, 321) OVER (ORDER BY id) as lag,
+        |  lead(123, 100, 321) OVER (ORDER BY id) as lead
+        |FROM (SELECT 1 as id) tmp
+      """.stripMargin),
+      Row(321, 321))
+
+    checkAnswer(sql(
+      """
+        |SELECT
+        |  lag(123, 100, a) OVER (ORDER BY id) as lag,
+        |  lead(123, 100, a) OVER (ORDER BY id) as lead
+        |FROM (SELECT 1 as id, 2 as a) tmp
+      """.stripMargin),
+      Row(2, 2))
+  }
+
+  test("lead/lag should be able to handle null input value correctly") {
--- End diff --

Can you explain?
[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] Fixes three issues re...
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14284#discussion_r71781038

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala ---
@@ -357,14 +356,59 @@ class SQLWindowFunctionSuite extends QueryTest with SQLTestUtils with TestHiveSi
   }
 
   test("SPARK-7595: Window will cause resolve failed with self join") {
-    sql("SELECT * FROM src") // Force loading of src table.
-
     checkAnswer(sql(
       """
         |with
-        | v1 as (select key, count(value) over (partition by key) cnt_val from src),
+        | v0 as (select 0 as key, 1 as value),
+        | v1 as (select key, count(value) over (partition by key) cnt_val from v0),
         | v2 as (select v1.key, v1_lag.cnt_val from v1, v1 v1_lag where v1.key = v1_lag.key)
-        | select * from v2 order by key limit 1
-      """.stripMargin), Row(0, 3))
+        | select key, cnt_val from v2 order by key limit 1
+      """.stripMargin), Row(0, 1))
+  }
+
+  test("lead/lag should return the default value if the offset row does not exist") {
+    checkAnswer(sql(
+      """
+        |SELECT
+        |  lag(123, 100, 321) OVER (ORDER BY id) as lag,
+        |  lead(123, 100, 321) OVER (ORDER BY id) as lead
+        |FROM (SELECT 1 as id) tmp
+      """.stripMargin),
+      Row(321, 321))
+
+    checkAnswer(sql(
+      """
+        |SELECT
+        |  lag(123, 100, a) OVER (ORDER BY id) as lag,
+        |  lead(123, 100, a) OVER (ORDER BY id) as lead
+        |FROM (SELECT 1 as id, 2 as a) tmp
+      """.stripMargin),
+      Row(2, 2))
+  }
+
+  test("lead/lag should be able to handle null input value correctly") {
--- End diff --

I don't think "correctly" is needed here.
[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] Fixes three issues re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14284#discussion_r71598723

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLWindowFunctionSuite.scala ---
@@ -367,4 +367,50 @@ class SQLWindowFunctionSuite extends QueryTest with SQLTestUtils with TestHiveSi
         | select * from v2 order by key limit 1
       """.stripMargin), Row(0, 3))
   }
+
+  test("lead/lag should return the default value if the offset row does not exist") {
--- End diff --

Done
[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] Fixes three issues re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14284#discussion_r71598678

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala ---
@@ -357,14 +356,59 @@ class SQLWindowFunctionSuite extends QueryTest with SQLTestUtils with TestHiveSi
   }
 
   test("SPARK-7595: Window will cause resolve failed with self join") {
-    sql("SELECT * FROM src") // Force loading of src table.
--- End diff --

For this test, I disabled the fix (https://github.com/apache/spark/pull/6114/files) and checked that it does fail the analysis because the analyzer fails to resolve conflicting references in Join. So, this test is still valid after my change.
[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] Fixes three issues re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14284#discussion_r71588935

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLWindowFunctionSuite.scala ---
@@ -367,4 +367,50 @@ class SQLWindowFunctionSuite extends QueryTest with SQLTestUtils with TestHiveSi
         | select * from v2 order by key limit 1
       """.stripMargin), Row(0, 3))
   }
+
+  test("lead/lag should return the default value if the offset row does not exist") {
--- End diff --

Oh, originally we used window functions from Hive. That should be the reason we put this file here. Let me move it.
[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] Fixes three issues re...
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14284#discussion_r71584966

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLWindowFunctionSuite.scala ---
@@ -367,4 +367,50 @@ class SQLWindowFunctionSuite extends QueryTest with SQLTestUtils with TestHiveSi
         | select * from v2 order by key limit 1
       """.stripMargin), Row(0, 3))
   }
+
+  test("lead/lag should return the default value if the offset row does not exist") {
--- End diff --

Hm, why is this file in hive? Can you move it in a separate PR?
[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] Fixes three issues re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14284#discussion_r71489063

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WindowExec.scala ---
@@ -582,25 +582,43 @@ private[execution] final class OffsetWindowFunctionFrame(
   /** Row used to combine the offset and the current row. */
   private[this] val join = new JoinedRow
 
-  /** Create the projection. */
+  /**
+   * Create the projection used when the offset row exists.
+   * Please note that this project always respect null input values (like PostgreSQL).
+   */
   private[this] val projection = {
--- End diff --

If we want to keep the behavioral change, we can make this the same as the original `project` and revert https://github.com/apache/spark/pull/14284/files#diff-4a8f00ca33a80744965463dcc6662c75R351.
[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] Fixes three issues re...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14284#discussion_r71488537

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -382,7 +382,7 @@ abstract class OffsetWindowFunction
  *
  * @param input expression to evaluate 'offset' rows after the current row.
  * @param offset rows to jump ahead in the partition.
- * @param default to use when the input value is null or when the offset is larger than the window.
+ * @param default to use when the offset is larger than the window.
--- End diff --

@hvanhovell what was the reason that we changed the behavior of lead and lag regarding whether they respect null values?
[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] Fixes three issues re...
GitHub user yhuai opened a pull request:

    https://github.com/apache/spark/pull/14284

    [SPARK-16633] [SPARK-16642] Fixes three issues related to window functions

    ## What changes were proposed in this pull request?
    This PR contains three changes.

    First, this PR changes the behavior of lead/lag back to Spark 1.6's behavior, which is described below:
    1. lead/lag respect null input values, which means that if the offset row exists and the input value is null, the result will be null instead of the default value.
    2. If the offset row does not exist, the default value will be used.
    3. OffsetWindowFunction's nullable setting also considers the nullability of its input (because of the first change).

    Second, this PR fixes the evaluation of lead/lag when the input expression is a literal. This fix is a result of the first change. In current master, if a literal is used as the input expression of a lead or lag function, the result will be this literal even if the offset row does not exist.

    Third, this PR makes ResolveWindowFrame not fire if a window function is not resolved.

    ## How was this patch tested?
    New tests in SQLWindowFunctionSuite

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yhuai/spark lead-lag

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14284.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14284

commit 78e69018ecaffb9598f4ea2b51900850ee3fb988
Author: Yin Huai
Date:   2016-07-20T06:56:50Z

    Add regression tests

commit da5f36f5daa16c4aba605cb939b313c92274b24e
Author: Yin Huai
Date:   2016-07-20T07:22:17Z

    Fix SPARK-16642

commit 02ee1915ab2519c876f60162ff00aaa155142eec
Author: Yin Huai
Date:   2016-07-20T08:43:04Z

    OffsetWindowFunction's nullable should also check its input's nullable field.

commit 506393b3eec45f7b62615adfe317a230e8de4128
Author: Yin Huai
Date:   2016-07-20T08:43:28Z

    Change the behavior of lead/lag back to Spark 1.6's behavior, which is explained below:
    * When the offset row does not exist, default values will be used.
    * lead/lag always respect null input values.
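The lead/lag semantics listed in the pull request description can be modeled in a few lines of plain Scala. This is only a sketch of the described behavior, not Spark's implementation: the object `OffsetSemantics` and the functions `lag`, `lead`, and `at` are hypothetical names, a partition is modeled as an `IndexedSeq[Option[A]]`, and `None` stands in for a SQL null.

```scala
// Hypothetical model of the lead/lag behavior described in the PR text.
// It does not use or mirror any Spark API; None plays the role of SQL NULL.
object OffsetSemantics {

  /** Value `offset` rows before row `current`, or `default` if that row does not exist. */
  def lag[A](partition: IndexedSeq[Option[A]], current: Int, offset: Int,
             default: Option[A]): Option[A] =
    at(partition, current - offset, default)

  /** Value `offset` rows after row `current`, or `default` if that row does not exist. */
  def lead[A](partition: IndexedSeq[Option[A]], current: Int, offset: Int,
              default: Option[A]): Option[A] =
    at(partition, current + offset, default)

  private def at[A](partition: IndexedSeq[Option[A]], index: Int,
                    default: Option[A]): Option[A] =
    if (index >= 0 && index < partition.size) {
      // The offset row exists: return its value as-is, even when it is null (None).
      partition(index)
    } else {
      // The offset row does not exist: only now does the default apply.
      default
    }
}
```

Under this model, a constant input of 123 over a single-row partition with offset 100 yields the default, as in the new regression test: `OffsetSemantics.lag(Vector(Some(123)), 0, 100, Some(321))` evaluates to `Some(321)`. A null input in an existing offset row is returned as null rather than replaced by the default: `OffsetSemantics.lag(Vector(None, Some(2)), 1, 1, Some(99))` evaluates to `None`.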