[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170756485
  
Sure, let me close it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/10689





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170755933
  
@gatorsmile I think we need a more thorough design for limits. Let's close 
this as "later" for now.






[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170470751
  
**[Test build #49117 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49117/consoleFull)**
 for PR 10689 at commit 
[`6244975`](https://github.com/apache/spark/commit/6244975948c016f5adc7dedef825d472f01c8846).





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread hvanhovell
Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170473951
  
@gatorsmile the fix looks good.

@rxin / @marmbrus / @gatorsmile I am not sure if we should support this at 
all. Using a limit in SELECTs connected by a UNION ALL is fine, but things 
tend to get really strange once you start using this in combination with other 
SET or JOIN operations; it becomes very hard to reason about the result. Most 
RDBMSes do not support this. I'd rather have an optimizer rule that pushes 
down limit clauses whenever this is possible.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170507628
  
**[Test build #49117 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49117/consoleFull)**
 for PR 10689 at commit 
[`6244975`](https://github.com/apache/spark/commit/6244975948c016f5adc7dedef825d472f01c8846).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170507968
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170507969
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49117/
Test PASSed.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170469510
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170691919
  
Yeah! I just read the implementation of `Limit`. As you said, the current 
one is not very efficient, especially when the limit value is not small. 





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170619409
  
Given two tables `tbl_a` and `tbl_b`: `tbl_a` has **billions** of rows but 
`tbl_b` has only **thousands** of rows. `tbl_a` has one column `col_frkey_tbl_a` 
whose values should all come from `tbl_b`'s column `col_key_tbl_b`. Suppose a 
user wants to do a quick check to confirm this. The query they can try is 
```
select col_frkey_tbl_a from db.tbl_a limit 1
intersect
select col_key_tbl_b from db.tbl_b 
```

The above query can avoid fetching billions of rows from `tbl_a`. 
Hopefully, it can answer your question. @hvanhovell 
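
As a rough illustration of the intent of that query, here is a sketch that models it with in-memory Scala collections rather than Spark. The object name, the method name, and the sample data are all made up for this example; an `Iterator` stands in for the large table so that only the limited number of rows is ever pulled.

```scala
object ForeignKeySpotCheck {
  // Models `select col_frkey_tbl_a from tbl_a limit <limit>
  //         intersect select col_key_tbl_b from tbl_b`.
  // Only `limit` rows are read from the big side before intersecting.
  def spotCheck(bigSide: Iterator[Int], keys: Set[Int], limit: Int): Set[Int] =
    bigSide.take(limit).toSet.intersect(keys)

  def main(args: Array[String]): Unit = {
    val keys = Set(1, 2, 3)
    // Hypothetical stand-in for billions of rows; never fully materialized.
    def bigTable: Iterator[Int] = Iterator.from(0).map(i => i % 3 + 1)
    // The sampled foreign key exists in the key set.
    println(spotCheck(bigTable, keys, 1))
    // The sampled foreign key (99) is missing, so the result is empty.
    println(spotCheck(Iterator(99, 1, 2), keys, 1))
  }
}
```

Note that this only inspects the sampled rows: an empty result exposes a bad value, but a non-empty result does not show that the whole column is valid.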





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread hvanhovell
Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170597130
  
@gatorsmile I do see the performance benefits of `limit` during 
processing. My reservation is about reasoning about non-top-level 
`limit` clauses. A set-operator example:

    select a from db.tbl_a
    intersect
    select b from db.tbl_b

The result should be all distinct rows in `a` for which we can find an 
equal tuple in `b`. Let's add a limit to this:

    select a from db.tbl_a limit 10
    intersect
    select b from db.tbl_b limit 10

The result will now be the first (distinct?) 10 rows from `a`, filtered 
by checking whether they exist in the first 10 rows of `b` (I think). I 
am not sure this is what a user expects. Furthermore:
- You will probably end up with fewer than 10 rows here.
- The results will probably be non-deterministic (unless you also 
allow some kind of ordering in a subquery).

Do you have a concrete real-world example where you need this?

I don't really mind if we would put this back in the parser (the engine 
supports it anyway). But I don't think we should just do something like this 
without some consideration.
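
The two concerns above (fewer rows than the limit, and order-dependent results) can be sketched with plain Scala collections. This is a toy model, not Spark: the object and method names are hypothetical, and a `Seq` simply represents rows in whatever order they happen to arrive.

```scala
object LimitedIntersect {
  // Models `(select a ... limit n) intersect (select b ... limit n)`:
  // take the first n rows of each side in arrival order, then keep the
  // distinct values that appear on both sides.
  def limitedIntersect(a: Seq[Int], b: Seq[Int], n: Int): Set[Int] =
    a.take(n).toSet.intersect(b.take(n).toSet)

  def main(args: Array[String]): Unit = {
    val a = Seq(1, 2, 3, 4, 5, 6)
    val b = Seq(4, 5, 6, 1, 2, 3)
    // Without limits, every value of a has a match in b.
    println(a.toSet.intersect(b.toSet).size)
    // With limit 3 on both sides, nothing matches for this row order.
    println(limitedIntersect(a, b, 3))
    // The same data arriving in a different order gives a different answer.
    println(limitedIntersect(a, b.reverse, 3))
  }
}
```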





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170589671
  
@hvanhovell Let me share my two cents:
  - We have another PR to push down `Limit` through `Union ALL`. However, 
it is impossible to push `Limit` through `Union Distinct`: 
https://github.com/apache/spark/pull/10451
  - If we want to convert a logical plan back to SQL (in 
https://github.com/apache/spark/pull/10541), we need to support it, I think. 
@liancheng Please correct me if my understanding is wrong. 
  - `Limit` is critical when the scale is huge. Our `DataFrame` API 
can add it almost everywhere. In the long term, we should provide the same 
functions for all the different interfaces, I think.
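
To illustrate the first point, here is a minimal sketch of why a limit can be pushed below `UNION ALL` but not below `UNION DISTINCT`. It models the operators on ordered Scala sequences and is not the optimizer rule from the linked PR; the object and method names are assumptions made for this example.

```scala
object LimitThroughUnion {
  // LIMIT n over UNION ALL, evaluated directly.
  def limitOverUnionAll(a: Seq[Int], b: Seq[Int], n: Int): Seq[Int] =
    (a ++ b).take(n)

  // The same query with the limit additionally pushed into both children.
  def pushedUnionAll(a: Seq[Int], b: Seq[Int], n: Int): Seq[Int] =
    (a.take(n) ++ b.take(n)).take(n)

  // LIMIT n over UNION DISTINCT, evaluated directly.
  def limitOverUnionDistinct(a: Seq[Int], b: Seq[Int], n: Int): Seq[Int] =
    (a ++ b).distinct.take(n)

  // The same push-down applied to UNION DISTINCT; this is NOT equivalent,
  // because duplicates can use up the per-child limit.
  def pushedUnionDistinct(a: Seq[Int], b: Seq[Int], n: Int): Seq[Int] =
    (a.take(n) ++ b.take(n)).distinct.take(n)

  def main(args: Array[String]): Unit = {
    val a = Seq(1, 1, 1, 2)
    val b = Seq(1, 1, 1, 3)
    // UNION ALL: both forms return the same rows.
    println(limitOverUnionAll(a, b, 2) == pushedUnionAll(a, b, 2))
    // UNION DISTINCT: the pushed form loses a row.
    println(limitOverUnionDistinct(a, b, 2))
    println(pushedUnionDistinct(a, b, 2))
  }
}
```

In this model the pushed `UNION DISTINCT` form returns one row where the direct form returns two, since the duplicates consume each child's limit before any distinct value beyond the first is seen.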





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170634356
  
**[Test build #49158 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49158/consoleFull)**
 for PR 10689 at commit 
[`94386aa`](https://github.com/apache/spark/commit/94386aa0fb392c51aa0862bc208cff63614b3c62).





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170634811
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49158/
Test FAILed.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170634799
  
**[Test build #49158 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49158/consoleFull)**
 for PR 10689 at commit 
[`94386aa`](https://github.com/apache/spark/commit/94386aa0fb392c51aa0862bc208cff63614b3c62).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170634808
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170638417
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49159/
Test FAILed.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170638416
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170638728
  
**[Test build #49160 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49160/consoleFull)**
 for PR 10689 at commit 
[`b9ba021`](https://github.com/apache/spark/commit/b9ba021cd276a3c53dbc83f7af3046dab5f09706).





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170670978
  
**[Test build #49160 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49160/consoleFull)**
 for PR 10689 at commit 
[`b9ba021`](https://github.com/apache/spark/commit/b9ba021cd276a3c53dbc83f7af3046dab5f09706).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170671485
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49160/
Test PASSed.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170671482
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-11 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170673019
  
That example seems kind of artificial to me. Additionally, large 
non-terminal limits are not planned very well today, so I think users are going 
to be surprised.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10689#discussion_r49290681
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/CatalystQlSuite.scala ---
@@ -49,4 +49,11 @@ class CatalystQlSuite extends PlanTest {
     parser.createPlan("select sum(product + 1) over (partition by (product + (1)) order by 2) " +
       "from windowData")
   }
+
+  test("limit clause: a support in set operation") {
+    parser.createPlan("select key from (select * from t1) x limit 1")
+    parser.createPlan("select key from (select * from t1 limit 2) x limit 1")
+    parser.createPlan("select key from ((select * from testData limit 1) " +
--- End diff --

Sure, will add such a test case.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-10 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170422086
  
@hvanhovell @rxin Could you take a look? Thank you!





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-10 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/10689#discussion_r49288754
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/CatalystQlSuite.scala ---
@@ -49,4 +49,11 @@ class CatalystQlSuite extends PlanTest {
     parser.createPlan("select sum(product + 1) over (partition by (product + (1)) order by 2) " +
       "from windowData")
   }
+
+  test("limit clause: a support in set operation") {
+    parser.createPlan("select key from (select * from t1) x limit 1")
+    parser.createPlan("select key from (select * from t1 limit 2) x limit 1")
+    parser.createPlan("select key from ((select * from testData limit 1) " +
--- End diff --

Should we test that a limit is actually being injected? Otherwise the parser 
could've just ignored the clause.






[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170421465
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49080/
Test PASSed.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170421463
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170421391
  
**[Test build #49080 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49080/consoleFull)**
 for PR 10689 at commit 
[`310cb32`](https://github.com/apache/spark/commit/310cb323ae969792dee32cbe320448a06c9c1cca).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-10 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/10689#discussion_r49292062
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/CatalystQlSuite.scala ---
@@ -49,4 +50,16 @@ class CatalystQlSuite extends PlanTest {
     parser.createPlan("select sum(product + 1) over (partition by (product + (1)) order by 2) " +
       "from windowData")
   }
+
+  test("limit clause: a support in set operation") {
+    val plan1 = parser.createPlan("select key from (select * from t1) x limit 1")
+    assert(plan1.collect{ case w: Limit => w }.size === 1)
+
+    val plan2 = parser.createPlan("select key from (select * from t1 limit 2) x limit 1")
+    assert(plan2.collect{ case w: Limit => w }.size === 2)
+
+    val plan3 = parser.createPlan("select key from ((select * from testData limit 1) " +
+      "union all (select * from testData limit 1)) x limit 1")
+    assert(plan3.collect{ case w: Limit => w }.size === 3)
--- End diff --

Can we do this similarly to how we compare plans in the various optimizer 
suites? For example: 
https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeInSuite.scala#L84
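
A self-contained sketch of the difference between counting `Limit` nodes and comparing whole plans might look like the following. The `Plan`, `Relation`, `Project`, and `Limit` types here are toy stand-ins invented for this example, not Spark's Catalyst classes, and `countLimits` is a hypothetical helper that mirrors the `collect { case w: Limit => w }.size` check in the diff above.

```scala
object PlanComparisonSketch {
  // Toy logical plan nodes; NOT Spark's Catalyst classes.
  sealed trait Plan { def children: Seq[Plan] }
  case class Relation(name: String) extends Plan {
    def children: Seq[Plan] = Nil
  }
  case class Project(column: String, child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }
  case class Limit(n: Int, child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }

  // Counting approach: number of Limit nodes anywhere in the tree.
  def countLimits(plan: Plan): Int = {
    val self = plan match {
      case _: Limit => 1
      case _ => 0
    }
    self + plan.children.map(countLimits).sum
  }

  def main(args: Array[String]): Unit = {
    // select key from (select * from t1 limit 2) x limit 1
    val expected = Limit(1, Project("key", Limit(2, Relation("t1"))))
    // A wrong plan in which the two limit values are swapped.
    val swapped = Limit(2, Project("key", Limit(1, Relation("t1"))))
    // The count cannot tell the two plans apart...
    println(countLimits(expected) == countLimits(swapped))
    // ...whereas structural comparison of the whole tree can.
    println(expected == swapped)
  }
}
```

Comparing against a fully specified expected plan checks both the position and the value of every limit, which the node count alone does not.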





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10689#discussion_r49292180
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/CatalystQlSuite.scala 
---
@@ -49,4 +50,16 @@ class CatalystQlSuite extends PlanTest {
 parser.createPlan("select sum(product + 1) over (partition by (product 
+ (1)) order by 2) " +
   "from windowData")
   }
+
+  test("limit clause: a support in set operation") {
+val plan1 = parser.createPlan("select key from (select * from t1) x 
limit 1")
+assert(plan1.collect{ case w: Limit => w }.size === 1)
+
+val plan2 = parser.createPlan("select key from (select * from t1 limit 
2) x limit 1")
+assert(plan2.collect{ case w: Limit => w }.size === 2)
+
+val plan3 = parser.createPlan("select key from ((select * from 
testData limit 1) " +
+  "union all (select * from testData limit 1)) x limit 1")
+assert(plan3.collect{ case w: Limit => w }.size === 3)
--- End diff --

Sure, it sounds better. Will do.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170442349
  
**[Test build #49094 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49094/consoleFull)**
 for PR 10689 at commit 
[`6244975`](https://github.com/apache/spark/commit/6244975948c016f5adc7dedef825d472f01c8846).





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170449790
  
**[Test build #49094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49094/consoleFull)** for PR 10689 at commit [`6244975`](https://github.com/apache/spark/commit/6244975948c016f5adc7dedef825d472f01c8846).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170449852
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170449853
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49094/
Test FAILed.





[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-10 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/10689

[SPARK-12745] [SQL] Hive Parser: Limit is not supported inside Set Operation

The current SQLContext allows the following query, which is copied from a 
test case in SQLQuerySuite:
```
checkAnswer(sql(
  """
    |select key from ((select * from testData limit 1)
    |  union all (select * from testData limit 1)) x limit 1
  """.stripMargin),
  Row(1)
)
```
However, the Hive parser rejects it.

This PR makes the Hive parser support the LIMIT clause inside set
operations.
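
To make the intended outcome concrete, here is a minimal sketch of the plan shape such a query could parse into, and of how a test might count the limit nodes in it. The `Plan`, `Relation`, `Project`, `Limit`, and `Union` classes below are hypothetical toy stand-ins written for illustration; they are not Spark's Catalyst classes, and the exact position of each node in the real logical plan is an assumption.

```scala
// Toy logical plan tree. These classes are hypothetical and are NOT
// Spark's Catalyst classes; they only mimic the tree-plus-collect idea.
sealed trait Plan {
  def children: Seq[Plan]

  // Pre-order traversal: keep every node the partial function is defined for.
  def collect[A](pf: PartialFunction[Plan, A]): Seq[A] =
    pf.lift(this).toSeq ++ children.flatMap(_.collect(pf))
}

case class Relation(name: String) extends Plan {
  val children: Seq[Plan] = Nil
}
case class Project(columns: Seq[String], child: Plan) extends Plan {
  val children: Seq[Plan] = Seq(child)
}
case class Limit(n: Int, child: Plan) extends Plan {
  val children: Seq[Plan] = Seq(child)
}
case class Union(left: Plan, right: Plan) extends Plan {
  val children: Seq[Plan] = Seq(left, right)
}

object LimitInUnionSketch {
  // Assumed shape for:
  //   select key from ((select * from testData limit 1)
  //     union all (select * from testData limit 1)) x limit 1
  val branch: Plan = Limit(1, Project(Seq("*"), Relation("testData")))
  val plan: Plan = Limit(1, Project(Seq("key"), Union(branch, branch)))

  // One Limit per union branch plus the outer Limit.
  val limitCount: Int = plan.collect { case l: Limit => l }.size

  def main(args: Array[String]): Unit =
    println(s"Limit nodes: $limitCount")
}
```

In this toy tree the count is 3: one limit in each branch of the union and one on the outer query. A parser that drops or rejects the inner limits would not produce that count.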

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark limitInUnion

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10689.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10689


commit 428160fd824309f83127ad68efabd0595d614abd
Author: gatorsmile 
Date:   2016-01-11T01:12:48Z

The Limit Clause can be applied inside the set operation

commit 310cb323ae969792dee32cbe320448a06c9c1cca
Author: gatorsmile 
Date:   2016-01-11T01:14:13Z

Merge remote-tracking branch 'upstream/master' into limitInUnion







[GitHub] spark pull request: [SPARK-12745] [SQL] Hive Parser: Limit is not ...

2016-01-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10689#issuecomment-170413432
  
**[Test build #49080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49080/consoleFull)** for PR 10689 at commit [`310cb32`](https://github.com/apache/spark/commit/310cb323ae969792dee32cbe320448a06c9c1cca).

