[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393897#comment-16393897 ] Hudson commented on PHOENIX-1267: - SUCCESS: Integrated in Jenkins build PreCommit-PHOENIX-Build #1797 (See [https://builds.apache.org/job/PreCommit-PHOENIX-Build/1797/]) PHOENIX-1267 Set scan.setSmall(true) when appropriate (Abhishek Singh (jtaylor: rev abfc1ff4d37d67eb4666c4c7db24cb22f041768c) * (edit) phoenix-core/src/main/java/org/apache/phoenix/query/QueryServices.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/execute/BaseQueryPlan.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java * (edit) phoenix-core/src/test/java/org/apache/phoenix/compile/QueryCompilerTest.java > Set scan.setSmall(true) when appropriate > > > Key: PHOENIX-1267 > URL: https://issues.apache.org/jira/browse/PHOENIX-1267 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: Abhishek Singh Chouhan >Priority: Major > Labels: SFDC, newbie > Fix For: 4.14.0, 5.0.0 > > Attachments: PHOENIX-1267.master.patch, PHOENIX-1267.master.patch, > smallscan.patch, smallscan2.patch, smallscan3.patch > > > There's a nice optimization that has been in HBase for a while now to set a > scan as "small". This prevents extra RPC calls, I believe. We should add a > hint for queries that forces it to be set/not set, and make our best guess on > when it should default to true. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393885#comment-16393885 ] Hudson commented on PHOENIX-1267:

FAILURE: Integrated in Jenkins build Phoenix-4.x-HBase-1.3 #55 (See [https://builds.apache.org/job/Phoenix-4.x-HBase-1.3/55/])
PHOENIX-1267 Set scan.setSmall(true) when appropriate (Abhishek Singh (jtaylor: rev 1cf0744024056828f42336a0fb92f0f3bff56961)
* (edit) phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/query/QueryServices.java
* (edit) phoenix-core/src/test/java/org/apache/phoenix/compile/QueryCompilerTest.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/execute/BaseQueryPlan.java
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390851#comment-16390851 ] Abhishek Singh Chouhan commented on PHOENIX-1267: - Have attached the rebased patch. Thanks for review [~jamestaylor]!
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389861#comment-16389861 ] James Taylor commented on PHOENIX-1267: --- +1 on the patch. Nice work, [~abhishek.chouhan]. Would you mind attaching a rebased version of the patch as it's no longer applying cleanly? Then I'll get it committed on your behalf.
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369273#comment-16369273 ] James Taylor commented on PHOENIX-1267: --- Yes, this sounds good, [~abhishek.chouhan]. I think the tricky part is figuring out what the threshold should be.
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369141#comment-16369141 ] Abhishek Singh Chouhan commented on PHOENIX-1267:

Apologies for the delay, got a bit busy.
* Was thinking of having an attribute as suggested above, something like "phoenix.query.smallScanThreshold" in QueryServices, with its default of 100 in QueryServicesOptions.
* In BaseQueryPlan.iterator(...), where we check for the SMALL hint, also check for a point lookup and its count (since we would already have the calculated scan ranges in the context by this time), something like
{code:java}
if (statement.getHint().hasHint(Hint.SMALL)
        || (scanRanges.isPointLookup() && scanRanges.getPointLookupCount() < smallScanThreshold)) {
    scan.setSmall(true);
}
{code}
[~jamestaylor] [~vincentpoon] Please let me know if this sounds okay and I'll put up a patch. Thanks!!
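Editor's sketch: the condition proposed in this comment can be exercised in isolation. The version below is a minimal stand-alone illustration, not the committed patch. The class and method names are hypothetical, and plain booleans and ints stand in for Phoenix's hint and ScanRanges objects.

```java
// Sketch only: plain values stand in for Phoenix's hint and ScanRanges objects.
final class SmallScanThresholdSketch {
    // Default proposed in the comment for "phoenix.query.smallScanThreshold".
    static final int DEFAULT_SMALL_SCAN_THRESHOLD = 100;

    /**
     * Mirrors the proposed condition: an explicit SMALL hint always wins,
     * otherwise only point lookups below the threshold qualify.
     */
    static boolean shouldSetSmall(boolean hasSmallHint, boolean isPointLookup,
                                  int pointLookupCount, int smallScanThreshold) {
        return hasSmallHint
                || (isPointLookup && pointLookupCount < smallScanThreshold);
    }

    public static void main(String[] args) {
        int threshold = DEFAULT_SMALL_SCAN_THRESHOLD;
        System.out.println(shouldSetSmall(false, true, 5, threshold));   // point lookup of 5 keys
        System.out.println(shouldSetSmall(false, true, 100, threshold)); // exactly at the threshold
        System.out.println(shouldSetSmall(true, false, 0, threshold));   // range scan with the hint
    }
}
```

Because the comparison is strict, a lookup of exactly 100 keys does not qualify under the default.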
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359053#comment-16359053 ] James Taylor commented on PHOENIX-1267: --- Yes, that'd be great, [~abhishek.chouhan]. I've assigned the issue to you.
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357038#comment-16357038 ] Abhishek Singh Chouhan commented on PHOENIX-1267: - Can have a go at this if someone else is not already working on it :) [~jamestaylor]
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049813#comment-16049813 ] James Taylor commented on PHOENIX-1267:

FYI, both PHOENIX-3073 and PHOENIX-3906 have useful information on this one. I think a good first step would be to automatically make point lookup queries small scans when the number of rows returned is less than a configurable threshold (see attached WIP patch). Certainly if we're returning a single row, a small scan will be much better (according to above JIRAs, 50% better). We could start with a configuration of 100 perhaps? Also, the configuration parameter could be set to zero, which would essentially disable this feature, which is a good thing.

[~Misraji] - this would be a good one for you to start with IMHO.
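Editor's sketch: one way to read the "set to zero to disable" remark is that a strict less-than comparison against the threshold can never succeed when the threshold is 0. The example below assumes that reading. The property name is borrowed from a later comment in this thread, and the class and method names are hypothetical.

```java
// Sketch only: shows how a threshold of zero disables the optimization.
final class SmallScanConfigSketch {
    // Property name taken from a later comment in this thread; treat it as an assumption here.
    static final String THRESHOLD_KEY = "phoenix.query.smallScanThreshold";
    static final int DEFAULT_THRESHOLD = 100;

    /** Reads the threshold, falling back to the suggested starting value of 100. */
    static int readThreshold(java.util.Properties props) {
        String raw = props.getProperty(THRESHOLD_KEY);
        return raw == null ? DEFAULT_THRESHOLD : Integer.parseInt(raw.trim());
    }

    /** A point lookup is treated as small only if it returns fewer rows than the threshold. */
    static boolean pointLookupIsSmall(int expectedRows, int threshold) {
        return expectedRows < threshold;
    }

    public static void main(String[] args) {
        java.util.Properties props = new java.util.Properties();
        System.out.println(readThreshold(props));                        // default applies
        props.setProperty(THRESHOLD_KEY, "0");
        System.out.println(pointLookupIsSmall(1, readThreshold(props))); // feature disabled
    }
}
```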
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166226#comment-14166226 ] Lars Hofhansl commented on PHOENIX-1267:

Thinking about skip-scans now... Might be a good candidate too, assuming that we're likely to skip out of the 2 MB we prefetched with just a few skips. Hard to say when that will be the case. I'd say do the small scan for skip-scans too.
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166128#comment-14166128 ] James Taylor commented on PHOENIX-1267:

Our guideposts aren't that granular - they're more in the range of 1/10 of the region size, so perhaps 500MB - 1GB by default (am I doing my math right?). So I guess we shouldn't decide based on that.

Our "point lookup" is a skip scan. We essentially hop from row to row using SEEK_NEXT_HINT to get there. Would this be a good candidate for using small scan? Would it depend on how many seeks we're doing? We could figure that out in advance.

So that leaves us with the scan case. We can only know we'll scan < N rows if we have a LIMIT or if we'll go through our ChunkedResultIterator. So those two cases seem good to use a small scan. We can turn small scan off for the ChunkedResultIterator case after we hit the end of the first batch (3000 rows by default).
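Editor's sketch: the two bounded-scan cases in this comment (a LIMIT, and the first chunk of a chunked scan) could be expressed roughly as below. This is an illustration under assumptions, not Phoenix code. The names are hypothetical, and the `maxSmallRows` cap on LIMIT is an added assumption, since the comment does not say how large N may be.

```java
// Sketch only: bounded-scan cases where the row count is known in advance.
final class BoundedScanSketch {
    // First-batch size quoted in the comment for the chunked case.
    static final int DEFAULT_CHUNK_ROWS = 3000;

    /** A LIMIT query is a candidate when the limit is positive and within an assumed cap. */
    static boolean smallForLimit(int limit, int maxSmallRows) {
        return limit > 0 && limit <= maxSmallRows;
    }

    /** A chunked scan stays small only until the end of the first chunk is reached. */
    static boolean smallForChunk(long rowsAlreadyReturned, int chunkRows) {
        return rowsAlreadyReturned < chunkRows;
    }

    public static void main(String[] args) {
        System.out.println(smallForChunk(0, DEFAULT_CHUNK_ROWS));    // first chunk
        System.out.println(smallForChunk(3000, DEFAULT_CHUNK_ROWS)); // first chunk exhausted
        System.out.println(smallForLimit(10, DEFAULT_CHUNK_ROWS));   // LIMIT 10
    }
}
```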
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166116#comment-14166116 ] Lars Hofhansl commented on PHOENIX-1267:

A small scan does two things:
# avoids prefetch - if we're only reading a few KB of data, prefetching many megabytes (I think 2 MB by default) at the data nodes is a waste (it does that by doing an HDFS positional read or "pread")
# avoids one RPC - you do not need to close the scanner via an extra RPC. If the scan takes a short amount of time the extra RPC can be significant.

I didn't get the point lookup case. Are those different from HBase Gets? Or are you talking about a skip_scan traversing a bunch of points that are known ahead of time? I'm assuming you mean a skip_scan here.

The only reliable case seems to be "a scan where we know we're only scanning < N rows", where we can reliably predict that the data scanned is in the ballpark of 64 KB to maybe 1 MB or so.

For the "only traverse N guidepost segments" case, a small scan makes sense only if the guidepost distances are known to be small (again in the 64 KB - 1 MB range).
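Editor's sketch: the "scanning < N rows lands in the 64 KB to 1 MB ballpark" reasoning is simple arithmetic once an average row size is assumed. The example below is a rough illustration. The average row size is a hypothetical input that the comment does not supply, and the names are made up for this sketch.

```java
// Sketch only: estimates scanned bytes from a row bound and an assumed average row size.
final class ScanSizeEstimateSketch {
    static final long LOWER_BALLPARK_BYTES = 64L * 1024;   // 64 KB
    static final long UPPER_BALLPARK_BYTES = 1024L * 1024; // 1 MB

    /** Upper bound on bytes scanned; throws ArithmeticException on overflow. */
    static long estimatedBytes(long maxRows, long avgRowBytes) {
        return Math.multiplyExact(maxRows, avgRowBytes);
    }

    /** True when the estimate fits inside the chosen byte budget. */
    static boolean looksSmall(long maxRows, long avgRowBytes, long budgetBytes) {
        return estimatedBytes(maxRows, avgRowBytes) <= budgetBytes;
    }

    public static void main(String[] args) {
        // 100 rows of roughly 200 bytes each against the 64 KB end of the range.
        System.out.println(looksSmall(100, 200, LOWER_BALLPARK_BYTES));
        // 3000 rows of roughly 500 bytes each against the 1 MB end of the range.
        System.out.println(looksSmall(3000, 500, UPPER_BALLPARK_BYTES));
    }
}
```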
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166079#comment-14166079 ] James Taylor commented on PHOENIX-1267:

You lost me at "pread" :-) I agree that we should at a minimum provide the hints, as that's easy.

So using small scan is based on how much data you're reading, right? Not how much data you're returning from the server? Or is it both? In all cases in Phoenix, each scan will be within a single region and will scan over at most a configurable number of bytes (i.e. determined by the guidepost depth stats config). I'm going to call that a *segment*. Does that help us, in that we know in advance how many segments we're scanning over?

Let me throw some Phoenix situations at you and if you have a chance, tell me if you think they'd benefit from using small scan:
- a point lookup. We know how many keys we're looking for in advance and they are complete row keys. Would using/not using small scan depend on how many point keys we're looking for? Or on how many segments we're looking for them in?
- an ungrouped aggregation (i.e. it'll return a single row, but potentially scan lots of rows).
- a grouped aggregation.
- a scan where we know we're only scanning < N rows. This is the ChunkedResultIterator case, where we run the scan until a limit, and then run it again, starting from where we left off. It's also the case where we have a LIMIT on a non-aggregate scan without an ORDER BY.
- an ordered scan. We sort on the RS side and then merge sort on the client. The number of rows returned depends on the WHERE clause.
- any query knowing that it'll only traverse N guidepost segments (we know this in advance), where N is the guidepost depth (maybe 1/10 of the region).

Thanks, [~lhofhansl].
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166034#comment-14166034 ] Lars Hofhansl commented on PHOENIX-1267: What do you think, James?
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164803#comment-14164803 ] Lars Hofhansl commented on PHOENIX-1267:

Small scan also forces pread as opposed to seek+read. seek+read is cheaper and also does prefetching at the datanodes, but only one scanner can use it per reader (i.e. HFile). See HBASE-7336.

It is tough to predict when to use it. If the individual scans are so small that prefetching has no benefit, or that two RPCs would be significant as opposed to one RPC, then a small scan makes sense. The doc says "Generally, if the scan range is within one data block(64KB), it could be considered as a small scan."

Starting with a small scan and then switching is cool. If we scan multiple HFile blocks and these blocks are not in the HBase block cache, seek+read has the potential of being much faster. But we'd only eat that cost for one scan in the beginning.

Might be best to start with the hint only (and default to false) and perf test a variety of scenarios.
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164779#comment-14164779 ] James Taylor commented on PHOENIX-1267:

Thinking about this more, I think we can have small scan set to true most of the time. Our scans are parallelized, so they always target only part of a region. So immediately in QueryCompiler.compileSingleQuery(), call scan.setSmall(true) unless there's a NO_SMALL_SCAN hint. Then, in the following cases, we'd turn it off:
- if there's no where clause and no limit. Probably easiest to determine this in QueryCompiler.compileSingleQuery().
- if a second "chunk" of data is returned from a parallel scan. This can be set in ChunkedResultIterator.getResultIterator().
- if there's an order by and we're traversing over a large number of segments (based on a new config parameter). The order by doesn't go through ChunkedResultIterator, so we don't have a good way of turning the option back off. You can determine how much data the scan will traverse by looking at splits.size(). This is an estimate of how many 30MB chunks (phoenix.stats.guidepost.width) of data will be traversed by the scan.

Then in BaseQueryIterator.iterators(), we call scan.setSmall(true) if the SMALL_SCAN hint is used, which would override the above logic.

Thoughts? [~lhofhansl]
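Editor's sketch: the "on by default, off in three cases" rule from this comment can be written as a single predicate. The version below is a simplified illustration, not the logic that was eventually committed. All names are hypothetical, and `maxOrderBySegments` stands in for the new config parameter the comment mentions without naming.

```java
// Sketch only: default-on small scan with the three exceptions listed in the comment.
final class DefaultSmallScanSketch {
    static boolean defaultSmall(boolean hasWhere, boolean hasLimit, boolean hasOrderBy,
                                int segmentCount, int maxOrderBySegments, int chunkIndex) {
        // No WHERE clause and no LIMIT: the scan is unbounded, so turn small scan off.
        if (!hasWhere && !hasLimit) {
            return false;
        }
        // A second or later chunk of a parallel scan: turn small scan off.
        if (chunkIndex > 0) {
            return false;
        }
        // ORDER BY over many guidepost segments: turn small scan off.
        if (hasOrderBy && segmentCount > maxOrderBySegments) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(defaultSmall(true, false, false, 1, 4, 0));  // filtered scan, first chunk
        System.out.println(defaultSmall(false, false, false, 1, 4, 0)); // full table scan
        System.out.println(defaultSmall(true, false, true, 10, 4, 0));  // ordered scan over 10 segments
    }
}
```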
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163246#comment-14163246 ] jay wong commented on PHOENIX-1267:

[~jamestaylor] About joins: I think a join query is almost always a big query that covers nearly the whole table. For any big query, small scan performs badly, so I set smallScan to false for joins in my earlier patch.
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163236#comment-14163236 ] jay wong commented on PHOENIX-1267:

[~jamestaylor] The isScanForbidden logic is not about topN, it's about order by. But I found that an order by query is marked as TOPN.

For example, suppose we have a result like the one below, and assume a small scan returns 100 results per query.
|key|col1|
|1|2|
|2|1|
|...|...|
|100|55|
|2|101|

The next query looks for rows whose row key is greater than the last row key of the previous page, so the scan will fall into an infinite loop.
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163189#comment-14163189 ] James Taylor commented on PHOENIX-1267:

Thanks for the patch, [~jaywong]. Please encapsulate this completely behind the implementors of QueryPlan. More specifically:
- Add a protected method called isSmallScan in BaseQueryPlan.
- Override this method in ScanPlan (which is the plan for non-aggregate queries) with the part of your logic from isScanForbidden. You'd want to check if order by is not empty here - I don't think topN is the correct criterion.
- Also override the method in AggregatePlan. I think you always want to return false here.
- I don't think you need to do anything special for joins.
- In the BaseQueryPlan.iterators() method you can call scan.setSmall(). First call isSmallScan(). Then call statement.getHint().hasHint(Hint.SMALL_SCAN) and statement.getHint().hasHint(Hint.NO_SMALL_SCAN) to override the value you got from isSmallScan(). Based on this logic, you'll have the correct value to use for scan.setSmall().
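Editor's sketch: the class layout requested in this review might look roughly like the following. These are stand-in classes with a `Sketch` suffix, not the real Phoenix BaseQueryPlan, ScanPlan, or AggregatePlan, and the ORDER BY columns are modelled as a plain list of strings for illustration. The hint override step is left out here to keep the example focused on the per-plan default.

```java
// Sketch only: per-plan default for small scan, decided by each plan type.
abstract class QueryPlanSketch {
    /** Each plan type decides its own default, as the review suggests. */
    protected abstract boolean isSmallScan();
}

// Stand-in for the non-aggregate plan: small unless there is an ORDER BY.
final class ScanPlanSketch extends QueryPlanSketch {
    private final java.util.List<String> orderByColumns;

    ScanPlanSketch(java.util.List<String> orderByColumns) {
        this.orderByColumns = orderByColumns;
    }

    @Override
    protected boolean isSmallScan() {
        return orderByColumns.isEmpty();
    }
}

// Stand-in for the aggregate plan: never small by default.
final class AggregatePlanSketch extends QueryPlanSketch {
    @Override
    protected boolean isSmallScan() {
        return false;
    }
}
```

Keeping the decision inside each plan means the caller never has to inspect scan state to guess the query shape, which is the point of the review comment.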
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163003#comment-14163003 ] jay wong commented on PHOENIX-1267:

I was on holiday for the past several days, so sorry for the late reply.

I see what you mean. A hint is normally the more structured and better way, and I think using a hint to control small scan is a good idea.

Small scan will be set to true by default when both the start key and the stop key are set. If we have an order by query and small is true, the result will be an infinite loop. So I think small scan is not only a query optimization for the user; it can also cause a bug. That is why I think smallScanForbidden is still needed.
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155497#comment-14155497 ] James Taylor commented on PHOENIX-1267:

This patch is a good start. I'd recommend the following:
- Add two new hints, SMALL_SCAN and NO_SMALL_SCAN, to the HintNode.Hint enum.
- Add an isSmallScan method in QueryPlan and then implement it for the implementors of this interface. This will be similar to what you're doing in your ScanUtil.smallScanForbidden, but in a more structured way. You can get the information directly from the implementors, rather than rely on state from the Scan.
- If the hints contain Hint.SMALL_SCAN, then set small scan to true; if the hints contain Hint.NO_SMALL_SCAN, then set small scan to false. These hints would override the setting of small scan from the above default logic.

If you're interested in pursuing this, please let me know [~jaywong].
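Editor's sketch: the hint precedence described in this comment (plan default first, then SMALL_SCAN, then NO_SMALL_SCAN) can be illustrated as below. The enum here is a hypothetical stand-in for the proposed additions to HintNode.Hint. Letting NO_SMALL_SCAN win when both hints are present is an assumption that follows from applying the two rules in the order given; the comment does not address that case.

```java
// Hypothetical stand-in for the two hints proposed for HintNode.Hint.
enum SmallScanHintSketch { SMALL_SCAN, NO_SMALL_SCAN }

// Sketch only: hints override whatever the plan chose by default.
final class HintOverrideSketch {
    static boolean resolve(boolean planDefault, java.util.Set<SmallScanHintSketch> hints) {
        boolean small = planDefault;
        if (hints.contains(SmallScanHintSketch.SMALL_SCAN)) {
            small = true;
        }
        // Applied last, so it wins if both hints are given (an assumption of this sketch).
        if (hints.contains(SmallScanHintSketch.NO_SMALL_SCAN)) {
            small = false;
        }
        return small;
    }

    public static void main(String[] args) {
        java.util.Set<SmallScanHintSketch> none = java.util.EnumSet.noneOf(SmallScanHintSketch.class);
        java.util.Set<SmallScanHintSketch> force = java.util.EnumSet.of(SmallScanHintSketch.SMALL_SCAN);
        System.out.println(resolve(false, none));  // plan default kept
        System.out.println(resolve(false, force)); // hint overrides the default
    }
}
```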