[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027766#comment-16027766 ] Dudu Markovitz commented on HIVE-11531: --- Done https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-SelectSyntax https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-LIMITClause > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Labels: TODOC2.0 > Fix For: 2.0.0 > > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, > HIVE-11531.07.patch, HIVE-11531.patch, HIVE-11531.WIP.1.patch, > HIVE-11531.WIP.2.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027188#comment-16027188 ] Lefty Leverenz commented on HIVE-11531: --- Yes please, [~dmarkovitz] -- thanks for offering. > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Labels: TODOC2.0 > Fix For: 2.0.0 > > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, > HIVE-11531.07.patch, HIVE-11531.patch, HIVE-11531.WIP.1.patch, > HIVE-11531.WIP.2.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026629#comment-16026629 ] Dudu Markovitz commented on HIVE-11531: --- [~leftylev], It seems the offset feature is not documented. Would you like me to do it? > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Labels: TODOC2.0 > Fix For: 2.0.0 > > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, > HIVE-11531.07.patch, HIVE-11531.patch, HIVE-11531.WIP.1.patch, > HIVE-11531.WIP.2.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065969#comment-15065969 ] Jesus Camacho Rodriguez commented on HIVE-11531: Pushed to branch-2.0. I'll take a look at the changes in union9.q. > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Fix For: 2.0.0, 2.1.0 > > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, > HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, > HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066136#comment-15066136 ] Lefty Leverenz commented on HIVE-11531: --- Doc note: This needs to be documented in the wiki (with version information), and a release note should show the syntax. * [LanguageManual Select -- Select Syntax | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-SelectSyntax] > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Labels: TODOC2.0 > Fix For: 2.0.0, 2.1.0 > > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, > HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, > HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063320#comment-15063320 ] Hui Zheng commented on HIVE-11531: -- Hi [~sershe] and [~prasanth_j] The union9 is good in my local machine.Could you give more details about how to reproduce it? {code} mvn test -Dtest=TestCliDriver -Dqfile=union9.q,offset_limit.q .. [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-it-qfile-spark --- [INFO] Compiling 3 source files to /Users/huzheng/git/hive/itests/qtest-spark/target/test-classes [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-it-qfile-spark --- [INFO] [INFO] Reactor Summary: [INFO] [INFO] Hive Integration - Parent . SUCCESS [1.191s] [INFO] Hive Integration - Custom Serde ... SUCCESS [1.955s] [INFO] Hive Integration - HCatalog Unit Tests SUCCESS [2.974s] [INFO] Hive Integration - Testing Utilities .. SUCCESS [2.312s] [INFO] Hive Integration - Unit Tests . SUCCESS [4.672s] [INFO] Hive Integration - Test Serde . SUCCESS [0.326s] [INFO] Hive Integration - QFile Tests SUCCESS [1:35.067s] [INFO] Hive Integration - QFile Accumulo Tests ... SUCCESS [2.523s] [INFO] JMH benchmark: Hive ... SUCCESS [0.399s] [INFO] Hive Integration - Unit Tests - Hadoop 2 .. SUCCESS [1.358s] [INFO] Hive Integration - Unit Tests with miniKdc SUCCESS [1.407s] [INFO] Hive Integration - QFile Spark Tests .. SUCCESS [3.650s] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 1:58.710s [INFO] Finished at: Fri Dec 18 11:18:28 JST 2015 [INFO] Final Memory: 164M/875M [INFO] {code} > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Fix For: 2.1.0 > > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, > HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, > HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053482#comment-15053482 ] Sergey Shelukhin commented on HIVE-11531: - It appears that union9 is broken since the patch has been committed. The stats have gone negative for some queries. Can you double check? [~prasanth_j] what do negative stats mean? {noformat} < Statistics: Num rows: -1 Data size: 5812 Basic stats: PARTIAL Column stats: COMPLETE < Statistics: Num rows: -1 Data size: 5812 Basic stats: PARTIAL Column stats: COMPLETE < Statistics: Num rows: -1 Data size: 5812 Basic stats: PARTIAL Column stats: COMPLETE < Statistics: Num rows: -1 Data size: 5812 Basic stats: PARTIAL Column stats: COMPLETE < Statistics: Num rows: -1 Data size: 5812 Basic stats: PARTIAL Column stats: COMPLETE < Statistics: Num rows: -1 Data size: 5812 Basic stats: PARTIAL Column stats: COMPLETE --- > Statistics: Num rows: 1500 Data size: 0 Basic stats: > PARTIAL Column stats: COMPLETE > Statistics: Num rows: 1500 Data size: 0 Basic stats: > PARTIAL Column stats: COMPLETE > Statistics: Num rows: 1500 Data size: 0 Basic stats: > PARTIAL Column stats: COMPLETE > Statistics: Num rows: 1500 Data size: 0 Basic stats: PARTIAL > Column stats: COMPLETE > Statistics: Num rows: 1500 Data size: 0 Basic stats: PARTIAL > Column stats: COMPLETE > Statistics: Num rows: 1500 Data size: 0 Basic stats: PARTIAL > Column stats: COMPLETE {noformat} > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Fix For: 2.1.0 > > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, > HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, > HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051527#comment-15051527 ] Sergey Shelukhin commented on HIVE-11531: - Makes sense. branch-2.0 doesn't require approval currently. I will send an update tomorrow morning-ish on the release. > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Fix For: 2.1.0 > > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, > HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, > HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051523#comment-15051523 ] Jesus Camacho Rodriguez commented on HIVE-11531: [~sershe], I want to backport this patch to 2.0.0 once HIVE-12644 goes in. > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Fix For: 2.1.0 > > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, > HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, > HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050382#comment-15050382 ] Jesus Camacho Rodriguez commented on HIVE-11531: +1 > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, > HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, > HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047700#comment-15047700 ] Hive QA commented on HIVE-11531: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776243/HIVE-11531.07.patch {color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 9903 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6283/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6283/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6283/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12776243 - PreCommit-HIVE-TRUNK-Build > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, > HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, > HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045487#comment-15045487 ] Sergey Shelukhin commented on HIVE-11531: - Looks like test failures are related (the ones about limit and offset) > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, > HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046273#comment-15046273 ] Hui Zheng commented on HIVE-11531: -- The HIVE-11531.07.patch has passed those related limit and offset in local machine. > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, > HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, > HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043692#comment-15043692 ] Hive QA commented on HIVE-11531: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12775946/HIVE-11531.06.patch {color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 9901 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit_global_optimizer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit_ppd_optimizer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_offset_limit org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hive.jdbc.TestSSL.testSSLVersion org.apache.tez.dag.app.rm.TestLlapTaskSchedulerService.testSimpleLocalAllocation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6261/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6261/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6261/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12775946 - PreCommit-HIVE-TRUNK-Build > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, > HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042630#comment-15042630 ] Hive QA commented on HIVE-11531: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12775705/HIVE-11531.04.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6244/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6244/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6244/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-6244/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at b2efd1a HIVE-12563: NullPointerException with 3-way Tez merge join (Jason Dere, reviewed by Gunther Hagleitner) + git clean -f -d + git checkout master Already on 'master' + git reset --hard origin/master HEAD is now at b2efd1a HIVE-12563: NullPointerException with 3-way Tez merge join (Jason Dere, reviewed by Gunther Hagleitner) + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12775705 - PreCommit-HIVE-TRUNK-Build > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.04.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, > HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034623#comment-15034623 ] Sergey Shelukhin commented on HIVE-11531: - There are result changes in some vectorization tests. I don't think the vectorized operator changes are correct - it will skip the rows from every batch as far as I see. Also what if selected is already in use? > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035243#comment-15035243 ] Hui Zheng commented on HIVE-11531: -- Yes, I have correct it in HIVE-11531.03.patch . But it seems it will take so long time to be tested by Hive QA. > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, > HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030341#comment-15030341 ] Hive QA commented on HIVE-11531: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12774231/HIVE-11531.02.patch {color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 23 failed/errored test(s), 9869 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_int_type_promotion org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_coalesce org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_nvl org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_outer_join5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_string_concat org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_decimal_date org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_offset_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_context org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_date_funcs org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_binary_join_groupby org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join5 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_limit org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6148/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6148/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6148/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 23 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12774231 - PreCommit-HIVE-TRUNK-Build > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Attachments: HIVE-11531.02.patch, HIVE-11531.WIP.1.patch, > HIVE-11531.WIP.2.patch, HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026010#comment-15026010 ] Hui Zheng commented on HIVE-11531: -- Thanks [~jcamachorodriguez] I have implemented {code} LIMIT n OFFSET skip {code} > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Attachments: HIVE-11531.02.patch, HIVE-11531.WIP.1.patch, > HIVE-11531.WIP.2.patch, HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005265#comment-15005265 ] Jesus Camacho Rodriguez commented on HIVE-11531: [~huizane], thanks for working on this! Agreed with the comments left by [~sershe]. In addition: - This block in CalcitePlanner: {code} Integer offset = qbp.getDestToLimit().get(qbp.getClauseNames().iterator().next())==null? 0:qbp.getDestToLimit().get(qbp.getClauseNames().iterator().next()).getKey(); Integer limit = qbp.getDestToLimit().get(qbp.getClauseNames().iterator().next())==null? null:qbp.getDestToLimit().get(qbp.getClauseNames().iterator().next()).getValue(); {code} No need to create the iterator twice for offset (and for limit) and call next. Could you cache the value to avoid unnecessary overhead and improve readability? - A couple of additional style notes. There is typo in {{getOffetExpr}} method in HiveSortLimit. We could rename old {{limit}} variable to {{fetch}} (for instance, in line 2275 of CalcitePlanner) to avoid confusion. Please, pay attention with code spacing (e.g. {{}else{}}). - On a side note, maybe we should support the more verbose syntax too (as part of this JIRA or a follow-up). In addition to the abbreviated syntax: {noformat} LIMIT (skip,)? n {noformat} that is mostly used by MySQL, we could support: {noformat} LIMIT n OFFSET skip {noformat} which is supported e.g. by MySQL and PostgreSQL. [~sershe], Calcite should handle offset properly: support for offset was already there. The only code that needs to be extended is in HIVE-11684; I will update the patch accordingly so we do not ignore offset. > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Attachments: HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, > HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004738#comment-15004738 ] Sergey Shelukhin commented on HIVE-11531: - It looks like many q.out files changed due to "Offset" being added. In any case, I think there's no need to output the offset 0, because that is the implied default for LIMIT. If you return null from the method, the explain setting will not be output. The rest looks good to me, except for some spacing inconsistent with the surrounding file (e.g. ==2 w/o spaces when processing TOK_LIMIT, etc). [~jcamachorodriguez] can you take a look at Calcite related changes, would this work with Calcite? > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Attachments: HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, > HIVE-11531.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14998457#comment-14998457 ] Hive QA commented on HIVE-11531: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12771506/HIVE-11531.patch {color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 220 failed/errored test(s), 9783 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguitycheck org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cp_sel org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_colname org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_gby_star org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew_multi_single_reducer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_complex_types_multi_single_reducer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input11_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input14_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input22 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input25 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input26 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert1_overwrite_partitions org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert2_overwrite_partitions org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_interval_arithmetic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_vc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_explode2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_noalias org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_onview org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_outer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_ppd org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_literal_decimal org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_literal_double org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_literal_ints org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_literal_string org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_macro org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch_threshold org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonreserved_keywords_insert_into1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_num_op_type_conv org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_offset_limit_global_optimizer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_createas1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_int_type_promotion org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_predicate_pushdown
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971580#comment-14971580 ] Sergey Shelukhin commented on HIVE-11531: - Do you have more specific questions? My guess is that for these optimizers, the first thing to do is to push the total (offset+limit) in place of old limit. I.e. if you select limit 10, 20, it would push down limit 30, and then Hive logic will discard 10 rows as usual. There is probably other optimization possible as step 2, i.e. not evaluating stuff for first 10 rows in this case, but it may be more difficult. For now, the simple step should suffice. > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Attachments: HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958607#comment-14958607 ] Jesus Camacho Rodriguez commented on HIVE-11531: Awesome, thanks [~huizane]! Sort operator for CBO (HiveSortLimit) has already support for offset and limit; most rules involving Limit (maybe all) should have support for fetch too. Integration with CBO would imply 1) setting the offset for the Calcite operator in SemanticAnalyzer, 2) translating the offset contained in the Calcite operator back in ASTConverter, and 3) modifying any rule that might need to be updated to work properly with offset (if needed). I have seen in the patch that 1) is already done. [~huizane], could you complete 2) and add tests to offset_limit.q with CBO on to verify that it is working properly? The problem with implementing only 1) is that we would be reading the offset from the query and setting it in the HiveSortLimit operator, but unless 2) is completed, we would be losing it when we translate back the Calcite operator. FYI [~jpullokkaran] > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Attachments: HIVE-11531.WIP.1.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958715#comment-14958715 ] Hui Zheng commented on HIVE-11531: -- Thanks [~sershe] and [~jcamachorodriguez] for your instructions I will continue on it. > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Attachments: HIVE-11531.WIP.1.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957534#comment-14957534 ] Sergey Shelukhin commented on HIVE-11531: - Skimmed the patch. {noformat} + private final HashMapdestToLimitOffset; {noformat} This could reuse the existing hashmap and store a pair, or put both ints into a long. Otherwise the approach makes sense. [~jcamachorodriguez] do you want to take a look and maybe give some pointers for CBO? > Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise > - > > Key: HIVE-11531 > URL: https://issues.apache.org/jira/browse/HIVE-11531 > Project: Hive > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Hui Zheng > Attachments: HIVE-11531.WIP.1.patch > > > For any UIs that involve pagination, it is useful to issue queries in the > form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be > paginated (which can be extremely large by itself). At present, ROW_NUMBER > can be used to achieve this effect, but optimizations for LIMIT such as TopN > in ReduceSink do not apply to ROW_NUMBER. We can add first class support for > "skip" to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715263#comment-14715263 ] Sergey Shelukhin commented on HIVE-11531: - I think it's already assigned to you :) Thanks for the interest and let me know if you have questions. Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise - Key: HIVE-11531 URL: https://issues.apache.org/jira/browse/HIVE-11531 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Hui Zheng For any UIs that involve pagination, it is useful to issue queries in the form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be paginated (which can be extremely large by itself). At present, ROW_NUMBER can be used to achieve this effect, but optimizations for LIMIT such as TopN in ReduceSink do not apply to ROW_NUMBER. We can add first class support for skip to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14713004#comment-14713004 ] Hui Zheng commented on HIVE-11531: -- Hi [~sershe] I'd like to work on this jira. Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise - Key: HIVE-11531 URL: https://issues.apache.org/jira/browse/HIVE-11531 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Hui Zheng For any UIs that involve pagination, it is useful to issue queries in the form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be paginated (which can be extremely large by itself). At present, ROW_NUMBER can be used to achieve this effect, but optimizations for LIMIT such as TopN in ReduceSink do not apply to ROW_NUMBER. We can add first class support for skip to existing limit, or improve ROW_NUMBER for better performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)