[jira] [Commented] (IMPALA-6821) Push down limits into Kudu
[ https://issues.apache.org/jira/browse/IMPALA-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459780#comment-16459780 ]

ASF subversion and git services commented on IMPALA-6821:
---------------------------------------------------------

Commit 955ad0833fdbe61ebb29d32e9b04757b467917be in impala's branch refs/heads/2.x from [~twmarshall]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=955ad08 ]

IMPALA-6821: Push down limits into Kudu

This patch takes advantage of a recent change in Kudu (KUDU-16) that
exposes the ability to set limits on KuduScanners.

Since each KuduScanner corresponds to a scan token, and there will be
multiple scan tokens per query, this is just a performance
optimization in cases where the limit is smaller than the number of
rows per token, and Impala still needs to apply the limit on our side
for cases where the limit is greater than the number of rows per
token.

Testing:
- Added e2e tests for various situations where limits are applied at
  a Kudu scan node.
- For the query 'select * from tpch_kudu.lineitem limit 1', a best
  case perf scenario for this change where the limit is highly
  effective, the time spent in the Kudu scan node was reduced from
  6.107ms to 3.498ms (avg over 3 runs).
- For the query 'select count(*) from (select * from
  tpch_kudu.lineitem limit 100) v', a worst case perf scenario for
  this change where the limit is ineffective, the time spent in the
  Kudu scan node was essentially unchanged, 32.815ms previously vs.
  29.532ms (avg over 3 runs).
Change-Id: Ibe35e70065d8706b575e24fe20902cd405b49941
Reviewed-on: http://gerrit.cloudera.org:8080/10119
Reviewed-by: Thomas Tauber-Marshall
Tested-by: Impala Public Jenkins

> Push down limits into Kudu
> --------------------------
>
> Key: IMPALA-6821
> URL: https://issues.apache.org/jira/browse/IMPALA-6821
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Andrew Wong
> Assignee: Thomas Tauber-Marshall
> Priority: Major
> Labels: kudu, perf
>
> The progress made in KUDU-16 introduced a way to impose limits on Kudu
> scanners, potentially reducing the number of RPCs sent per scan, CPU used for
> scanner evaluation on the tablet server, etc. It would be nice if Impala
> could make use of this new behavior.
> I put up an admittedly clunky [patch|https://gerrit.cloudera.org/c/9923/]
> that implements limits on a per-token basis. I'm no Impala expert, so there
> may be an API missing in Kudu that would make Impala's life easier in
> implementing this. Maybe it's as easy as adjusting the limit after
> re-hydrating the token into a scanner.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-6821) Push down limits into Kudu
[ https://issues.apache.org/jira/browse/IMPALA-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457165#comment-16457165 ]

ASF subversion and git services commented on IMPALA-6821:
---------------------------------------------------------

Commit 87be63e321f688486b98d4ea69200967a8a2effa in impala's branch refs/heads/master from [~twmarshall]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=87be63e ]

IMPALA-6821: Push down limits into Kudu

This patch takes advantage of a recent change in Kudu (KUDU-16) that
exposes the ability to set limits on KuduScanners.

Since each KuduScanner corresponds to a scan token, and there will be
multiple scan tokens per query, this is just a performance
optimization in cases where the limit is smaller than the number of
rows per token, and Impala still needs to apply the limit on our side
for cases where the limit is greater than the number of rows per
token.

Testing:
- Added e2e tests for various situations where limits are applied at
  a Kudu scan node.
- For the query 'select * from tpch_kudu.lineitem limit 1', a best
  case perf scenario for this change where the limit is highly
  effective, the time spent in the Kudu scan node was reduced from
  6.107ms to 3.498ms (avg over 3 runs).
- For the query 'select count(*) from (select * from
  tpch_kudu.lineitem limit 100) v', a worst case perf scenario for
  this change where the limit is ineffective, the time spent in the
  Kudu scan node was essentially unchanged, 32.815ms previously vs.
  29.532ms (avg over 3 runs).
Change-Id: Ibe35e70065d8706b575e24fe20902cd405b49941
Reviewed-on: http://gerrit.cloudera.org:8080/10119
Reviewed-by: Thomas Tauber-Marshall
Tested-by: Impala Public Jenkins

> Push down limits into Kudu
> --------------------------
>
> Key: IMPALA-6821
> URL: https://issues.apache.org/jira/browse/IMPALA-6821
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Andrew Wong
> Assignee: Thomas Tauber-Marshall
> Priority: Major
> Labels: kudu, perf
>
> The progress made in KUDU-16 introduced a way to impose limits on Kudu
> scanners, potentially reducing the number of RPCs sent per scan, CPU used for
> scanner evaluation on the tablet server, etc. It would be nice if Impala
> could make use of this new behavior.
> I put up an admittedly clunky [patch|https://gerrit.cloudera.org/c/9923/]
> that implements limits on a per-token basis. I'm no Impala expert, so there
> may be an API missing in Kudu that would make Impala's life easier in
> implementing this. Maybe it's as easy as adjusting the limit after
> re-hydrating the token into a scanner.
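
Editor's illustration (not part of the patch): the commit message's point that a per-token limit is only an optimization, and that the query-level limit must still be enforced afterwards, can be sketched with a small simulation. Everything here is hypothetical: run_query, push_down, and the modelling of each scan token as a plain list of rows are assumptions made for illustration, and none of it is Impala or Kudu code. The sketch also processes tokens one after another, whereas a real query scans them concurrently.

```python
from itertools import islice
from typing import List, Tuple


def run_query(tokens: List[List[int]], limit: int, push_down: bool) -> Tuple[List[int], int]:
    """Simulate a LIMIT query over several scan tokens.

    tokens:    one list of rows per scan token (hypothetical stand-in for a scanner).
    limit:     the query-level LIMIT.
    push_down: if True, each per-token scanner also stops after `limit` rows.

    Returns (rows returned to the client, total rows fetched from all scanners).
    """
    result: List[int] = []
    rows_fetched = 0
    for token_rows in tokens:
        # With pushdown, the scanner itself stops early; without it, the
        # scanner hands back every row in the token.
        per_token_limit = limit if push_down else None
        batch = list(islice(token_rows, per_token_limit))
        rows_fetched += len(batch)

        # The query-level limit is still applied here, because each scanner
        # only knows about its own token.
        remaining = limit - len(result)
        result.extend(batch[:remaining])
    return result, rows_fetched


if __name__ == "__main__":
    # Three tokens of 100 rows each (made-up data).
    tokens = [list(range(0, 100)), list(range(100, 200)), list(range(200, 300))]

    for lim in (1, 250):
        for push in (False, True):
            rows, fetched = run_query(tokens, lim, push)
            print(f"limit={lim} push_down={push}: returned={len(rows)} fetched={fetched}")
```

In this model a limit smaller than the rows per token cuts the rows fetched sharply, while a limit larger than the rows per token fetches the same number of rows either way and relies entirely on the final trim, which mirrors the two cases the commit message describes.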