[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170756485 Sure, let me close it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/10689
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170755933 @gatorsmile I think we'd need more proper design for limits. Let's close this as later.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170470751 **[Test build #49117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49117/consoleFull)** for PR 10689 at commit [`6244975`](https://github.com/apache/spark/commit/6244975948c016f5adc7dedef825d472f01c8846).
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170473951 @gatorsmile the fix looks good. @rxin / @marmbrus / @gatorsmile I am not sure if we should support this at all. Using a limit in SELECTs connected by a UNION ALL is fine, but things tend to get really strange once you start using this in combination with other SET or JOIN operations; it'll get very hard to reason about the result. Most RDBMSes do not support this. I'd rather have an optimizer rule which pushes down limit clauses whenever this is possible.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170507628 **[Test build #49117 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49117/consoleFull)** for PR 10689 at commit [`6244975`](https://github.com/apache/spark/commit/6244975948c016f5adc7dedef825d472f01c8846).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170507968 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170507969 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49117/ Test PASSed.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170469510 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170691919 Yeah! I just read the implementation of `Limit`. As you said, the current one is not highly efficient, especially when the number of limits is not small.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170619409

Given two tables `tbl_a` and `tbl_b`, `tbl_a` has **billions** of rows but `tbl_b` has **thousands** of rows. `tbl_a` has one column `col_frkey_tbl_a` whose values should come from `tbl_b`'s column `col_key_tbl_b`. However, one user wants to do a quick check to confirm it. The query he can try is
```
select col_frkey_tbl_a from db.tbl_a limit 1
intersect
select col_key_tbl_b from db.tbl_b
```
The above query can avoid fetching billions of rows from `tbl_a`. Hopefully, it can answer your question. @hvanhovell
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170597130

@gatorsmile I do see the performance benefits of `limit` while processing. The reservation I have is about reasoning about non-toplevel `limit` statements. A set-operator example:

    select a from db.tbl_a
    intersect
    select b from db.tbl_b

The result should be all distinct rows in `a` for which we can find an equal tuple in `b`. Let's add a limit to this:

    select a from db.tbl_a limit 10
    intersect
    select b from db.tbl_b limit 10

The result would now be the first (distinct?) 10 rows from `a`, filtered by checking whether they exist in the first 10 rows of `b` (I think). I am not sure this is what a user expects. Furthermore:
- You will probably end up with fewer than 10 rows here.
- The results will probably be non-deterministic (unless you would also allow some kind of ordering in a subquery).

Do you have a concrete real-world example where you need this? I don't really mind if we put this back in the parser (the engine supports it anyway). But I don't think we should just do something like this without some consideration.
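The concern in the comment above can be made concrete with a minimal sketch in plain Scala collections rather than Spark. It assumes each `limit 10` binds to its own `select` (the comment itself is unsure of this), and it models "first n rows" as whatever order the input happens to arrive in. The object name `LimitInsideIntersect`, the helper `limited`, and the sample data are hypothetical and do not come from the thread.

```scala
object LimitInsideIntersect {
  // Hypothetical stand-ins for db.tbl_a.a and db.tbl_b.b (assumed data).
  val a: Seq[Int] = 1 to 100
  val b: Seq[Int] = (1 to 100).reverse

  // select a from tbl_a intersect select b from tbl_b
  val plain: Set[Int] = a.toSet intersect b.toSet

  // select a ... limit n intersect select b ... limit n,
  // assuming the limit applies to each side before the intersect.
  def limited(left: Seq[Int], right: Seq[Int], n: Int): Set[Int] =
    left.take(n).toSet intersect right.take(n).toSet

  def main(args: Array[String]): Unit = {
    println(plain.size)                      // 100
    println(limited(a, b, 10))               // Set()
    println(limited(a, b.reverse, 10).size)  // 10
  }
}
```

The two limited calls see the same rows on both sides and differ only in arrival order of the right-hand input, yet one returns nothing and the other returns ten rows. That is the "fewer than 10 rows" and "non-deterministic" behaviour described above, under the stated assumption.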
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170589671

@hvanhovell Let me share my two cents:
- We have another PR to push down `Limit` through `Union ALL`. However, it is impossible to push `Limit` through `Union Distinct`: https://github.com/apache/spark/pull/10451
- If we want to convert a logical plan back to SQL (in https://github.com/apache/spark/pull/10541), we need to support it, I think. @liancheng Please correct me if my understanding is wrong.
- `Limit` is super critical when the scale is huge. Our `DataFrame` API can add this almost everywhere. In the long term, we should provide the same functions for all the different interfaces, I think.
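The first bullet above can be illustrated with a small sketch, again using plain Scala sequences instead of Catalyst plans. It is not the rule from the linked PR; `LimitPushdownSketch`, its helpers, and the sample rows are hypothetical names and data chosen only to show why copying a limit below a `UNION ALL` is safe when the outer limit is kept, while the same rewrite under a `UNION DISTINCT` can lose rows.

```scala
object LimitPushdownSketch {
  def unionAll[A](l: Seq[A], r: Seq[A]): Seq[A] = l ++ r
  def unionDistinct[A](l: Seq[A], r: Seq[A]): Seq[A] = (l ++ r).distinct

  // Hypothetical inputs (assumed data, not from the thread).
  val left: Seq[Int]  = Seq(1, 1, 1, 2)
  val right: Seq[Int] = Seq(1, 1, 3)
  val n = 2

  // UNION ALL: limiting each child first still leaves n rows for the outer limit.
  val allDirect: Seq[Int] = unionAll(left, right).take(n)
  val allPushed: Seq[Int] = unionAll(left.take(n), right.take(n)).take(n)

  // UNION DISTINCT: the limited children may contain only duplicates,
  // so the outer limit can no longer find n distinct rows.
  val distinctDirect: Seq[Int] = unionDistinct(left, right).take(n)
  val distinctPushed: Seq[Int] = unionDistinct(left.take(n), right.take(n)).take(n)

  def main(args: Array[String]): Unit = {
    println(allDirect.size == allPushed.size)  // true
    println(distinctDirect)                    // List(1, 2)
    println(distinctPushed)                    // List(1)
  }
}
```

With these inputs the distinct union has three rows, so a limit of 2 should return two of them, but the pushed-down variant returns only one.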
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170634356 **[Test build #49158 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49158/consoleFull)** for PR 10689 at commit [`94386aa`](https://github.com/apache/spark/commit/94386aa0fb392c51aa0862bc208cff63614b3c62).
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170634811 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49158/ Test FAILed.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170634799 **[Test build #49158 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49158/consoleFull)** for PR 10689 at commit [`94386aa`](https://github.com/apache/spark/commit/94386aa0fb392c51aa0862bc208cff63614b3c62).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170634808 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170638417 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49159/ Test FAILed.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170638416 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170638728 **[Test build #49160 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49160/consoleFull)** for PR 10689 at commit [`b9ba021`](https://github.com/apache/spark/commit/b9ba021cd276a3c53dbc83f7af3046dab5f09706).
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170670978 **[Test build #49160 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49160/consoleFull)** for PR 10689 at commit [`b9ba021`](https://github.com/apache/spark/commit/b9ba021cd276a3c53dbc83f7af3046dab5f09706).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170671485 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49160/ Test PASSed.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170671482 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170673019 That example seems kind of artificial to me. Additionally large non-terminal limits are not planned very well today so I think users are going to be surprised.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10689#discussion_r49290681

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/CatalystQlSuite.scala ---
    @@ -49,4 +49,11 @@ class CatalystQlSuite extends PlanTest {
         parser.createPlan("select sum(product + 1) over (partition by (product + (1)) order by 2) " +
           "from windowData")
       }
    +
    +  test("limit clause: a support in set operation") {
    +    parser.createPlan("select key from (select * from t1) x limit 1")
    +    parser.createPlan("select key from (select * from t1 limit 2) x limit 1")
    +    parser.createPlan("select key from ((select * from testData limit 1) " +
    --- End diff --

Sure, will add such a test case.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170422086 @hvanhovell @rxin Could you take a look? Thank you!
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/10689#discussion_r49288754

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/CatalystQlSuite.scala ---
    @@ -49,4 +49,11 @@ class CatalystQlSuite extends PlanTest {
         parser.createPlan("select sum(product + 1) over (partition by (product + (1)) order by 2) " +
           "from windowData")
       }
    +
    +  test("limit clause: a support in set operation") {
    +    parser.createPlan("select key from (select * from t1) x limit 1")
    +    parser.createPlan("select key from (select * from t1 limit 2) x limit 1")
    +    parser.createPlan("select key from ((select * from testData limit 1) " +
    --- End diff --

should we test there is a limit being injected? otherwise the parser could've just ignored the clause.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170421465 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49080/ Test PASSed.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170421463 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170421391 **[Test build #49080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49080/consoleFull)** for PR 10689 at commit [`310cb32`](https://github.com/apache/spark/commit/310cb323ae969792dee32cbe320448a06c9c1cca).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/10689#discussion_r49292062

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/CatalystQlSuite.scala ---
    @@ -49,4 +50,16 @@ class CatalystQlSuite extends PlanTest {
         parser.createPlan("select sum(product + 1) over (partition by (product + (1)) order by 2) " +
           "from windowData")
       }
    +
    +  test("limit clause: a support in set operation") {
    +    val plan1 = parser.createPlan("select key from (select * from t1) x limit 1")
    +    assert(plan1.collect{ case w: Limit => w }.size === 1)
    +
    +    val plan2 = parser.createPlan("select key from (select * from t1 limit 2) x limit 1")
    +    assert(plan2.collect{ case w: Limit => w }.size === 2)
    +
    +    val plan3 = parser.createPlan("select key from ((select * from testData limit 1) " +
    +      "union all (select * from testData limit 1)) x limit 1")
    +    assert(plan3.collect{ case w: Limit => w }.size === 3)
    --- End diff --

can we do this similar to how we compare plans in various optimizer suites, e.g. https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeInSuite.scala#L84
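One way to see why comparing whole plans is a stronger check than counting `Limit` nodes is the toy sketch below. It does not use Catalyst: `Plan`, `Relation`, `Limit`, `Union`, and the `collect` helper are hypothetical stand-ins written only for this illustration, and the `misplaced` plan is an invented example of a wrong parse.

```scala
object PlanComparisonSketch {
  // Toy logical plan tree (assumed types, not Catalyst's classes).
  sealed trait Plan { def children: Seq[Plan] }
  case class Relation(name: String) extends Plan {
    def children: Seq[Plan] = Nil
  }
  case class Limit(n: Int, child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }
  case class Union(left: Plan, right: Plan) extends Plan {
    def children: Seq[Plan] = Seq(left, right)
  }

  // Pre-order traversal that gathers every node the partial function matches.
  def collect[B](plan: Plan)(pf: PartialFunction[Plan, B]): Seq[B] =
    pf.lift(plan).toList ++ plan.children.flatMap(c => collect(c)(pf))

  def countLimits(p: Plan): Int = collect(p) { case l: Limit => l }.size

  // Shape one would expect for the union-all query in the test above.
  val expected: Plan =
    Limit(1, Union(Limit(1, Relation("testData")), Limit(1, Relation("testData"))))

  // Same number of Limit nodes, but all stacked above the union.
  val misplaced: Plan =
    Limit(1, Limit(1, Limit(1, Union(Relation("testData"), Relation("testData")))))

  def main(args: Array[String]): Unit = {
    println(countLimits(expected) == countLimits(misplaced))  // true
    println(expected == misplaced)                            // false
  }
}
```

A count-based assertion accepts both trees, while structural equality on the case classes rejects the misplaced one.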
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10689#discussion_r49292180

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/CatalystQlSuite.scala ---
    @@ -49,4 +50,16 @@ class CatalystQlSuite extends PlanTest {
         parser.createPlan("select sum(product + 1) over (partition by (product + (1)) order by 2) " +
           "from windowData")
       }
    +
    +  test("limit clause: a support in set operation") {
    +    val plan1 = parser.createPlan("select key from (select * from t1) x limit 1")
    +    assert(plan1.collect{ case w: Limit => w }.size === 1)
    +
    +    val plan2 = parser.createPlan("select key from (select * from t1 limit 2) x limit 1")
    +    assert(plan2.collect{ case w: Limit => w }.size === 2)
    +
    +    val plan3 = parser.createPlan("select key from ((select * from testData limit 1) " +
    +      "union all (select * from testData limit 1)) x limit 1")
    +    assert(plan3.collect{ case w: Limit => w }.size === 3)
    --- End diff --

Sure, it sounds better. Will do.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170442349 **[Test build #49094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49094/consoleFull)** for PR 10689 at commit [`6244975`](https://github.com/apache/spark/commit/6244975948c016f5adc7dedef825d472f01c8846).
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170449790 **[Test build #49094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49094/consoleFull)** for PR 10689 at commit [`6244975`](https://github.com/apache/spark/commit/6244975948c016f5adc7dedef825d472f01c8846).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170449852 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170449853 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49094/ Test FAILed.
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/10689

[SPARK-12745] [SQL] Hive Parser: Limit is not supported inside Set Operation

The current SQLContext allows the following query, which is copied from a test case in SQLQuerySuite:
```
checkAnswer(sql(
  """
    |select key from ((select * from testData limit 1)
    |  union all (select * from testData limit 1)) x limit 1
  """.stripMargin),
  Row(1)
)
```
However, it is rejected in the Hive parser. This PR is to make Hive parser support the Limit Clause inside Set Operator.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark limitInUnion

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10689.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10689

commit 428160fd824309f83127ad68efabd0595d614abd
Author: gatorsmile
Date: 2016-01-11T01:12:48Z

    The Limit Clause can be applied inside the set operation

commit 310cb323ae969792dee32cbe320448a06c9c1cca
Author: gatorsmile
Date: 2016-01-11T01:14:13Z

    Merge remote-tracking branch 'upstream/master' into limitInUnion
[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10689#issuecomment-170413432 **[Test build #49080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49080/consoleFull)** for PR 10689 at commit [`310cb32`](https://github.com/apache/spark/commit/310cb323ae969792dee32cbe320448a06c9c1cca).