[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2017-05-28 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027766#comment-16027766
 ] 

Dudu Markovitz commented on HIVE-11531:
---

Done 

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-SelectSyntax
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-LIMITClause



> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.patch, HIVE-11531.WIP.1.patch, 
> HIVE-11531.WIP.2.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2017-05-26 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027188#comment-16027188
 ] 

Lefty Leverenz commented on HIVE-11531:
---

Yes please, [~dmarkovitz] -- thanks for offering.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.patch, HIVE-11531.WIP.1.patch, 
> HIVE-11531.WIP.2.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2017-05-26 Thread Dudu Markovitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026629#comment-16026629
 ] 

Dudu Markovitz commented on HIVE-11531:
---

[~leftylev], It seems the offset feature is not documented. Would you like me 
to do it?

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.patch, HIVE-11531.WIP.1.patch, 
> HIVE-11531.WIP.2.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-20 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065969#comment-15065969
 ] 

Jesus Camacho Rodriguez commented on HIVE-11531:


Pushed to branch-2.0. I'll take a look at the changes in union9.q.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Fix For: 2.0.0, 2.1.0
>
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066136#comment-15066136
 ] 

Lefty Leverenz commented on HIVE-11531:
---

Doc note:  This needs to be documented in the wiki (with version information), 
and a release note should show the syntax.

* [LanguageManual Select -- Select Syntax | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-SelectSyntax]

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
>  Labels: TODOC2.0
> Fix For: 2.0.0, 2.1.0
>
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-17 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063320#comment-15063320
 ] 

Hui Zheng commented on HIVE-11531:
--

Hi [~sershe] and [~prasanth_j]
The union9 is good in my local machine.Could you give more details about how to 
reproduce it?
{code}
mvn test -Dtest=TestCliDriver -Dqfile=union9.q,offset_limit.q
..
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
hive-it-qfile-spark ---
[INFO] Compiling 3 source files to 
/Users/huzheng/git/hive/itests/qtest-spark/target/test-classes
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-it-qfile-spark 
---
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Hive Integration - Parent . SUCCESS [1.191s]
[INFO] Hive Integration - Custom Serde ... SUCCESS [1.955s]
[INFO] Hive Integration - HCatalog Unit Tests  SUCCESS [2.974s]
[INFO] Hive Integration - Testing Utilities .. SUCCESS [2.312s]
[INFO] Hive Integration - Unit Tests . SUCCESS [4.672s]
[INFO] Hive Integration - Test Serde . SUCCESS [0.326s]
[INFO] Hive Integration - QFile Tests  SUCCESS [1:35.067s]
[INFO] Hive Integration - QFile Accumulo Tests ... SUCCESS [2.523s]
[INFO] JMH benchmark: Hive ... SUCCESS [0.399s]
[INFO] Hive Integration - Unit Tests - Hadoop 2 .. SUCCESS [1.358s]
[INFO] Hive Integration - Unit Tests with miniKdc  SUCCESS [1.407s]
[INFO] Hive Integration - QFile Spark Tests .. SUCCESS [3.650s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 1:58.710s
[INFO] Finished at: Fri Dec 18 11:18:28 JST 2015
[INFO] Final Memory: 164M/875M
[INFO] 
{code}

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Fix For: 2.1.0
>
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053482#comment-15053482
 ] 

Sergey Shelukhin commented on HIVE-11531:
-

It appears that union9 is broken since the patch has been committed. The stats 
have gone negative for some queries. Can you double check?

[~prasanth_j] what do negative stats mean?
{noformat}
<   Statistics: Num rows: -1 Data size: 5812 Basic stats: PARTIAL 
Column stats: COMPLETE
<   Statistics: Num rows: -1 Data size: 5812 Basic stats: PARTIAL 
Column stats: COMPLETE
<   Statistics: Num rows: -1 Data size: 5812 Basic stats: PARTIAL 
Column stats: COMPLETE
< Statistics: Num rows: -1 Data size: 5812 Basic stats: PARTIAL 
Column stats: COMPLETE
< Statistics: Num rows: -1 Data size: 5812 Basic stats: PARTIAL 
Column stats: COMPLETE
< Statistics: Num rows: -1 Data size: 5812 Basic stats: PARTIAL 
Column stats: COMPLETE
---
>   Statistics: Num rows: 1500 Data size: 0 Basic stats: 
> PARTIAL Column stats: COMPLETE
>   Statistics: Num rows: 1500 Data size: 0 Basic stats: 
> PARTIAL Column stats: COMPLETE
>   Statistics: Num rows: 1500 Data size: 0 Basic stats: 
> PARTIAL Column stats: COMPLETE
> Statistics: Num rows: 1500 Data size: 0 Basic stats: PARTIAL 
> Column stats: COMPLETE
> Statistics: Num rows: 1500 Data size: 0 Basic stats: PARTIAL 
> Column stats: COMPLETE
> Statistics: Num rows: 1500 Data size: 0 Basic stats: PARTIAL 
> Column stats: COMPLETE
{noformat}

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Fix For: 2.1.0
>
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051527#comment-15051527
 ] 

Sergey Shelukhin commented on HIVE-11531:
-

Makes sense. branch-2.0 doesn't require approval currently. I will send an 
update tomorrow morning-ish on the release.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Fix For: 2.1.0
>
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-10 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051523#comment-15051523
 ] 

Jesus Camacho Rodriguez commented on HIVE-11531:


[~sershe], I want to backport this patch to 2.0.0 once HIVE-12644 goes in.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Fix For: 2.1.0
>
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-10 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050382#comment-15050382
 ] 

Jesus Camacho Rodriguez commented on HIVE-11531:


+1

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047700#comment-15047700
 ] 

Hive QA commented on HIVE-11531:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776243/HIVE-11531.07.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 9903 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6283/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6283/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6283/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776243 - PreCommit-HIVE-TRUNK-Build

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045487#comment-15045487
 ] 

Sergey Shelukhin commented on HIVE-11531:
-

Looks like test failures are related (the ones about limit and offset)

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-07 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046273#comment-15046273
 ] 

Hui Zheng commented on HIVE-11531:
--

The HIVE-11531.07.patch has passed those related limit and offset in local 
machine.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043692#comment-15043692
 ] 

Hive QA commented on HIVE-11531:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12775946/HIVE-11531.06.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 9901 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit_global_optimizer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit_ppd_optimizer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_offset_limit
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.tez.dag.app.rm.TestLlapTaskSchedulerService.testSimpleLocalAllocation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6261/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6261/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6261/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12775946 - PreCommit-HIVE-TRUNK-Build

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042630#comment-15042630
 ] 

Hive QA commented on HIVE-11531:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12775705/HIVE-11531.04.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6244/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6244/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6244/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-6244/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at b2efd1a HIVE-12563: NullPointerException with 3-way Tez merge 
join (Jason Dere, reviewed by Gunther Hagleitner)
+ git clean -f -d
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at b2efd1a HIVE-12563: NullPointerException with 3-way Tez merge 
join (Jason Dere, reviewed by Gunther Hagleitner)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12775705 - PreCommit-HIVE-TRUNK-Build

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034623#comment-15034623
 ] 

Sergey Shelukhin commented on HIVE-11531:
-

There are result changes in some vectorization tests. I don't think the 
vectorized operator changes are correct - it will skip the rows from every 
batch as far as I see. Also what if selected is already in use?

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-01 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035243#comment-15035243
 ] 

Hui Zheng commented on HIVE-11531:
--

Yes, I have correct it in HIVE-11531.03.patch .
But it seems it will take so long time to be tested by Hive QA.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-11-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030341#comment-15030341
 ] 

Hive QA commented on HIVE-11531:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12774231/HIVE-11531.02.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 23 failed/errored test(s), 9869 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_int_type_promotion
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_coalesce
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_nvl
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_string_concat
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_decimal_date
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_offset_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_context
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_date_funcs
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_binary_join_groupby
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_limit
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6148/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6148/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6148/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 23 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12774231 - PreCommit-HIVE-TRUNK-Build

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.WIP.1.patch, 
> HIVE-11531.WIP.2.patch, HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-11-24 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026010#comment-15026010
 ] 

Hui Zheng commented on HIVE-11531:
--

Thanks [~jcamachorodriguez]
I have implemented 
{code}
LIMIT n OFFSET skip
{code}


> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.WIP.1.patch, 
> HIVE-11531.WIP.2.patch, HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-11-14 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005265#comment-15005265
 ] 

Jesus Camacho Rodriguez commented on HIVE-11531:


[~huizane], thanks for working on this!

Agreed with the comments left by [~sershe]. In addition:

- This block in CalcitePlanner:
{code}
  Integer offset = 
qbp.getDestToLimit().get(qbp.getClauseNames().iterator().next())==null?
  
0:qbp.getDestToLimit().get(qbp.getClauseNames().iterator().next()).getKey();
  Integer limit = 
qbp.getDestToLimit().get(qbp.getClauseNames().iterator().next())==null?
  
null:qbp.getDestToLimit().get(qbp.getClauseNames().iterator().next()).getValue();
{code}
No need to create the iterator twice for offset (and for limit) and call next. 
Could you cache the value to avoid unnecessary overhead and improve readability?
- A couple of additional style notes. There is typo in {{getOffetExpr}} method 
in HiveSortLimit. We could rename old {{limit}} variable to {{fetch}} (for 
instance, in line 2275 of CalcitePlanner) to avoid confusion. Please, pay 
attention with code spacing (e.g. {{}else{}}).
- On a side note, maybe we should support the more verbose syntax too (as part 
of this JIRA or a follow-up). In addition to the abbreviated syntax:
{noformat}
LIMIT (skip,)? n
{noformat}
that is mostly used by MySQL, we could support:
{noformat}
LIMIT n OFFSET skip
{noformat}
which is supported e.g. by MySQL and PostgreSQL.

[~sershe], Calcite should handle offset properly: support for offset was 
already there. The only code that needs to be extended is in HIVE-11684; I will 
update the patch accordingly so we do not ignore offset.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-11-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004738#comment-15004738
 ] 

Sergey Shelukhin commented on HIVE-11531:
-

It looks like many q.out files changed due to "Offset" being added. In any 
case, I think there's no need to output the offset 0, because that is the 
implied default for LIMIT. If you return null from the method, the explain 
setting will not be output.

The rest looks good to me, except for some spacing inconsistent with the 
surrounding file (e.g. ==2 w/o spaces when processing TOK_LIMIT, etc).

[~jcamachorodriguez] can you take a look at Calcite related changes, would this 
work with Calcite?



> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-11-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14998457#comment-14998457
 ] 

Hive QA commented on HIVE-11531:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12771506/HIVE-11531.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 220 failed/errored test(s), 9783 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguitycheck
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cp_sel
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_colname
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_gby_star
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_complex_types_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input11_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input14_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input25
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert1_overwrite_partitions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert2_overwrite_partitions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_interval_arithmetic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_vc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_explode2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_noalias
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_onview
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_outer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_ppd
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_literal_decimal
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_literal_double
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_literal_ints
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_literal_string
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_macro
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch_threshold
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonreserved_keywords_insert_into1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_num_op_type_conv
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit_global_optimizer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_createas1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_int_type_promotion
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_predicate_pushdown

[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-10-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971580#comment-14971580
 ] 

Sergey Shelukhin commented on HIVE-11531:
-

Do you have more specific questions? My guess is that for these optimizers, the 
first thing to do is to push the total (offset+limit) in place of old limit. 
I.e. if you select  limit 10, 20, it would push down limit 30, and then 
Hive logic will discard 10 rows as usual. There is probably other optimization 
possible as step 2, i.e. not evaluating stuff for first 10 rows in this case, 
but it may be more difficult. For now, the simple step should suffice.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-10-15 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958607#comment-14958607
 ] 

Jesus Camacho Rodriguez commented on HIVE-11531:


Awesome, thanks [~huizane]!

Sort operator for CBO (HiveSortLimit) has already support for offset and limit; 
most rules involving Limit (maybe all) should have support for fetch too.

Integration with CBO would imply 1) setting the offset for the Calcite operator 
in SemanticAnalyzer, 2) translating the offset contained in the Calcite 
operator back in ASTConverter, and 3) modifying any rule that might need to be 
updated to work properly with offset (if needed).

I have seen in the patch that 1) is already done. [~huizane], could you 
complete 2) and add tests to offset_limit.q with CBO on to verify that it is 
working properly?
The problem with implementing only 1) is that we would be reading the offset 
from the query and setting it in the HiveSortLimit operator, but unless 2) is 
completed, we would be losing it when we translate back the Calcite operator.

FYI [~jpullokkaran]

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.WIP.1.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-10-15 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958715#comment-14958715
 ] 

Hui Zheng commented on HIVE-11531:
--

Thanks [~sershe] and [~jcamachorodriguez] for your instructions
I will continue on it.
 

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.WIP.1.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-10-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957534#comment-14957534
 ] 

Sergey Shelukhin commented on HIVE-11531:
-

Skimmed the patch.
{noformat}
+  private final HashMap destToLimitOffset;
{noformat}
This could reuse the existing hashmap and store a pair, or put both ints into a 
long.

Otherwise the approach makes sense.
[~jcamachorodriguez] do you want to take a look and maybe give some pointers 
for CBO?


> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.WIP.1.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
> "skip" to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-08-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715263#comment-14715263
 ] 

Sergey Shelukhin commented on HIVE-11531:
-

I think it's already assigned to you :) Thanks for the interest and let me know 
if you have questions.

 Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
 -

 Key: HIVE-11531
 URL: https://issues.apache.org/jira/browse/HIVE-11531
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Hui Zheng

 For any UIs that involve pagination, it is useful to issue queries in the 
 form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
 paginated (which can be extremely large by itself). At present, ROW_NUMBER 
 can be used to achieve this effect, but optimizations for LIMIT such as TopN 
 in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
 skip to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-08-26 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14713004#comment-14713004
 ] 

Hui Zheng commented on HIVE-11531:
--

Hi [~sershe]
I'd like to work on this jira.

 Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
 -

 Key: HIVE-11531
 URL: https://issues.apache.org/jira/browse/HIVE-11531
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Hui Zheng

 For any UIs that involve pagination, it is useful to issue queries in the 
 form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
 paginated (which can be extremely large by itself). At present, ROW_NUMBER 
 can be used to achieve this effect, but optimizations for LIMIT such as TopN 
 in ReduceSink do not apply to ROW_NUMBER. We can add first class support for 
 skip to existing limit, or improve ROW_NUMBER for better performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)