[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2018-03-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393897#comment-16393897
 ] 

Hudson commented on PHOENIX-1267:
-

SUCCESS: Integrated in Jenkins build PreCommit-PHOENIX-Build #1797 (See 
[https://builds.apache.org/job/PreCommit-PHOENIX-Build/1797/])
PHOENIX-1267 Set scan.setSmall(true) when appropriate (Abhishek Singh (jtaylor: 
rev abfc1ff4d37d67eb4666c4c7db24cb22f041768c)
* (edit) phoenix-core/src/main/java/org/apache/phoenix/query/QueryServices.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/execute/BaseQueryPlan.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java
* (edit) phoenix-core/src/test/java/org/apache/phoenix/compile/QueryCompilerTest.java


> Set scan.setSmall(true) when appropriate
> 
>
> Key: PHOENIX-1267
> URL: https://issues.apache.org/jira/browse/PHOENIX-1267
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: Abhishek Singh Chouhan
>Priority: Major
>  Labels: SFDC, newbie
> Fix For: 4.14.0, 5.0.0
>
> Attachments: PHOENIX-1267.master.patch, PHOENIX-1267.master.patch, 
> smallscan.patch, smallscan2.patch, smallscan3.patch
>
>
> There's a nice optimization that has been in HBase for a while now to set a 
> scan as "small". This prevents extra RPC calls, I believe. We should add a 
> hint for queries that forces it to be set/not set, and make our best guess on 
> when it should default to true.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2018-03-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393885#comment-16393885
 ] 

Hudson commented on PHOENIX-1267:
-

FAILURE: Integrated in Jenkins build Phoenix-4.x-HBase-1.3 #55 (See 
[https://builds.apache.org/job/Phoenix-4.x-HBase-1.3/55/])
PHOENIX-1267 Set scan.setSmall(true) when appropriate (Abhishek Singh (jtaylor: 
rev 1cf0744024056828f42336a0fb92f0f3bff56961)
* (edit) phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/query/QueryServices.java
* (edit) phoenix-core/src/test/java/org/apache/phoenix/compile/QueryCompilerTest.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/execute/BaseQueryPlan.java


[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2018-03-07 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390851#comment-16390851
 ] 

Abhishek Singh Chouhan commented on PHOENIX-1267:
-

Have attached the rebased patch. Thanks for review [~jamestaylor]!

> Set scan.setSmall(true) when appropriate
> 
>
> Key: PHOENIX-1267
> URL: https://issues.apache.org/jira/browse/PHOENIX-1267
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: Abhishek Singh Chouhan
>Priority: Major
>  Labels: SFDC, newbie
> Fix For: 4.14.0
>
> Attachments: PHOENIX-1267.master.patch, PHOENIX-1267.master.patch, 
> smallscan.patch, smallscan2.patch, smallscan3.patch
>
>
> There's a nice optimization that has been in HBase for a while now to set a 
> scan as "small". This prevents extra RPC calls, I believe. We should add a 
> hint for queries that forces it to be set/not set, and make our best guess on 
> when it should default to true.



[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2018-03-07 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389861#comment-16389861
 ] 

James Taylor commented on PHOENIX-1267:
---

+1 on the patch. Nice work, [~abhishek.chouhan]. Would you mind attaching a 
rebased version of the patch as it's no longer applying cleanly? Then I'll get 
it committed on your behalf.

> Set scan.setSmall(true) when appropriate
> 
>
> Key: PHOENIX-1267
> URL: https://issues.apache.org/jira/browse/PHOENIX-1267
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: Abhishek Singh Chouhan
>Priority: Major
>  Labels: SFDC, newbie
> Fix For: 4.14.0
>
> Attachments: PHOENIX-1267.master.patch, smallscan.patch, 
> smallscan2.patch, smallscan3.patch
>
>
> There's a nice optimization that has been in HBase for a while now to set a 
> scan as "small". This prevents extra RPC calls, I believe. We should add a 
> hint for queries that forces it to be set/not set, and make our best guess on 
> when it should default to true.



[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2018-02-19 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369273#comment-16369273
 ] 

James Taylor commented on PHOENIX-1267:
---

Yes, this sounds good, [~abhishek.chouhan]. I think the tricky part is figuring 
out what the threshold should be.

> Set scan.setSmall(true) when appropriate
> 
>
> Key: PHOENIX-1267
> URL: https://issues.apache.org/jira/browse/PHOENIX-1267
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: Abhishek Singh Chouhan
>Priority: Major
>  Labels: SFDC, newbie
> Fix For: 4.14.0
>
> Attachments: smallscan.patch, smallscan2.patch, smallscan3.patch
>
>
> There's a nice optimization that has been in HBase for a while now to set a 
> scan as "small". This prevents extra RPC calls, I believe. We should add a 
> hint for queries that forces it to be set/not set, and make our best guess on 
> when it should default to true.



[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2018-02-19 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369141#comment-16369141
 ] 

Abhishek Singh Chouhan commented on PHOENIX-1267:
-

Apologies for the delay; I got a bit busy.
 * I was thinking of having an attribute, as suggested above, something like 
"phoenix.query.smallScanThreshold" in QueryServices, with its default of 100 in 
QueryServicesOptions.
 * In BaseQueryPlan.iterator(...), where we check for the SMALL hint, also check for 
a point lookup and its count (since we would already have the calculated 
scan ranges in the context by this time), something like:

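{code:java}
// Use a small scan when the SMALL hint is present, or when the query is a
// point lookup over fewer keys than the configured threshold.
if (statement.getHint().hasHint(Hint.SMALL)
        || (scanRanges.isPointLookup() && scanRanges.getPointLookupCount() < smallScanThreshold)) {
    scan.setSmall(true);
}
{code}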
[~jamestaylor] [~vincentpoon] Please let me know if this sounds okay and I'll 
put up a patch. Thanks!!
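To make the proposed rule concrete, here is a stand-alone Java sketch of it. It is a hypothetical model, not Phoenix code. SmallScanThresholdSketch and its methods are invented for illustration. The boolean and int parameters stand in for statement.getHint().hasHint(Hint.SMALL), scanRanges.isPointLookup() and scanRanges.getPointLookupCount(). Only the property name phoenix.query.smallScanThreshold and its default of 100 come from the comment above.

```java
import java.util.Properties;

// Hypothetical stand-alone model of the proposed rule; not Phoenix code.
final class SmallScanThresholdSketch {
    // Property name and default taken from the proposal above.
    static final String SMALL_SCAN_THRESHOLD_ATTRIB = "phoenix.query.smallScanThreshold";
    static final int DEFAULT_SMALL_SCAN_THRESHOLD = 100;

    private SmallScanThresholdSketch() {}

    // Reads the threshold from client properties, falling back to the default when unset.
    static int smallScanThreshold(Properties props) {
        String value = props.getProperty(SMALL_SCAN_THRESHOLD_ATTRIB);
        return value == null ? DEFAULT_SMALL_SCAN_THRESHOLD : Integer.parseInt(value.trim());
    }

    // True when the SMALL hint is present, or when the query is a point lookup
    // over strictly fewer keys than the threshold.
    static boolean shouldUseSmallScan(boolean hasSmallHint, boolean isPointLookup,
                                      int pointLookupCount, int threshold) {
        return hasSmallHint || (isPointLookup && pointLookupCount < threshold);
    }
}
```

With the default of 100, a point lookup over 50 row keys would be marked small and one over 500 would not. The SMALL hint forces a small scan regardless of the count.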


[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2018-02-09 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359053#comment-16359053
 ] 

James Taylor commented on PHOENIX-1267:
---

Yes, that'd be great, [~abhishek.chouhan]. I've assigned the issue to you.


[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2014-10-09 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166079#comment-14166079
 ] 

James Taylor commented on PHOENIX-1267:
---

You lost me at pread :-)

I agree that we should at a minimum provide the hints, as that's easy.

So using small scan is based on how much data you're reading, right? Not how 
much data you're returning from the server? Or is it both?

In all cases in Phoenix, each scan will be within a single region and will scan 
over at most a configurable number of bytes (i.e. determined by the guidepost 
depth stats config). I'm going to call that a *segment*. Does that help us, in 
that we know in advance how many segments we're scanning over? 

Let me throw some Phoenix situations at you and if you have a chance, tell me 
if you think they'd benefit from using small scan:
- a point lookup. We know how many keys we're looking for in advance and they 
are complete row keys. Would using/not using small scan depend on how many 
point keys we're looking for? Or on how many segments we're looking for them in?
- an ungrouped aggregation (i.e. it'll return a single row, but potentially 
scan lots of rows).
- a grouped aggregation.
- a scan where we know we're only scanning N rows. This is the 
ChunkedResultIterator case, where we run the scan until a limit, and then run 
it again, starting from where we left off. It's also the case where we have a 
LIMIT on a non-aggregate scan without an ORDER BY.
- an ordered scan. We sort on the region server side and then merge sort on the client. 
The number of rows returned depends on the WHERE clause.
- any query known to traverse only N guidepost segments (we know 
this in advance), where N is the guidepost depth (maybe 1/10 of the region).

Thanks, [~lhofhansl].

 Set scan.setSmall(true) when appropriate
 

 Key: PHOENIX-1267
 URL: https://issues.apache.org/jira/browse/PHOENIX-1267
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor
Assignee: jay wong
 Attachments: smallscan.patch, smallscan2.patch, smallscan3.patch


 There's a nice optimization that has been in HBase for a while now to set a 
 scan as small. This prevents extra RPC calls, I believe. We should add a 
 hint for queries that forces it to be set/not set, and make our best guess on 
 when it should default to true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2014-10-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166116#comment-14166116
 ] 

Lars Hofhansl commented on PHOENIX-1267:


A small scan does two things:
# avoids prefetch - if we're only reading a few K of data, prefetching many 
megabytes (I think 2mb by default) at the data nodes is a waste (it does that 
by doing an HDFS positional read or pread)
# avoids one RPC - you do not need to close the scanner via an extra RPC. If 
the scan takes a short amount of time the extra RPC can be significant.

I didn't get the point lookup case. Are those different from HBase Gets? Or 
are you talking about a skip_scan traversing a bunch of points that are known 
ahead of time? I'm assuming you mean a skip_scan here.

The only reliable case seems to be a scan where we know we're only scanning N 
rows, where we can reliably predict that the data scanned is in the ballpark of 
64k to maybe 1mb or so.
The "only traverse N guidepost segments" case works only if the guidepost distances 
are known to be small (again in the 64k-1mb range).
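Lars's rule of thumb can be written as a byte-budget check. This is a sketch under stated assumptions. The 1 MB cut-off is taken as the top of the 64k-1mb range above. The per-row size is an estimate the caller must supply. ScanSizeHeuristicSketch and its methods are hypothetical names.

```java
// Hypothetical helper expressing the "data scanned is roughly 64KB-1MB" rule of thumb.
final class ScanSizeHeuristicSketch {
    // Upper end of the range suggested above (assumed cut-off).
    static final long SMALL_SCAN_MAX_BYTES = 1L << 20; // 1 MB

    private ScanSizeHeuristicSketch() {}

    // A scan bounded to maxRows rows is "likely small" if maxRows * bytesPerRow fits
    // in the budget. Dividing instead of multiplying avoids long overflow.
    static boolean isLikelySmallByRowLimit(long maxRows, long estimatedBytesPerRow) {
        if (maxRows <= 0 || estimatedBytesPerRow <= 0) {
            return false; // unknown bound: do not guess "small"
        }
        return maxRows <= SMALL_SCAN_MAX_BYTES / estimatedBytesPerRow;
    }

    // The same budget applied to a number of guidepost segments of known width.
    static boolean isLikelySmallByGuideposts(long segments, long guidepostWidthBytes) {
        if (segments <= 0 || guidepostWidthBytes <= 0) {
            return false;
        }
        return segments <= SMALL_SCAN_MAX_BYTES / guidepostWidthBytes;
    }
}
```

With guideposts tens of megabytes wide, even a single segment exceeds a 1 MB budget. The guidepost-based variant therefore only helps when guideposts are configured to be very narrow.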


[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2014-10-09 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166128#comment-14166128
 ] 

James Taylor commented on PHOENIX-1267:
---

Our guideposts aren't that granular - they're more in the range of 1/10 of the 
region size, so perhaps 500MB - 1GB by default (am I doing my math right?). So 
I guess we shouldn't decide based on that.

Our point lookup is a skip scan. We essentially hop from row to row using 
SEEK_NEXT_HINT to get there. Would this be a good candidate for using small 
scan? Would it depend on how many seeks we're doing? We could figure that out 
in advance.

So that leaves us with the scan case. We can only know we'll scan N rows if 
we have a LIMIT or if we'll go through our ChunkedResultIterator. So those two 
cases seem like good candidates for a small scan. We can turn small scan off for the 
ChunkedResultIterator case after we hit the end of the first batch (3000 rows by 
default).
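The chunked case can be illustrated with a small simulation. It reads a sorted key range in chunks and marks only the first chunk's scan as small. Everything here is hypothetical, including the class names and the in-memory list standing in for a region. Only the 3000-row default batch size comes from the comment above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation of "small scan for the first chunk only"; not Phoenix code.
final class ChunkedSmallScanSketch {
    static final int DEFAULT_CHUNK_SIZE = 3000; // first-batch size cited above

    // One simulated chunk scan: the rows it returned and whether it was marked small.
    static final class Chunk {
        final List<Integer> rows;
        final boolean small;

        Chunk(List<Integer> rows, boolean small) {
            this.rows = rows;
            this.small = small;
        }
    }

    private ChunkedSmallScanSketch() {}

    // Reads sortedRowKeys in chunks of at most chunkSize rows, each chunk resuming where
    // the previous one stopped. Small scan stays on only until a second chunk is needed.
    static List<Chunk> scanInChunks(List<Integer> sortedRowKeys, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        int next = 0;
        boolean small = true; // optimistic: assume the query is small
        do {
            int end = next + Math.min(chunkSize, sortedRowKeys.size() - next);
            List<Integer> rows = new ArrayList<>(sortedRowKeys.subList(next, end));
            chunks.add(new Chunk(Collections.unmodifiableList(rows), small));
            next = end;
            small = false; // needing another chunk shows the scan was not small
        } while (next < sortedRowKeys.size());
        return chunks;
    }
}
```

For 7000 rows this yields three chunks (3000, 3000, 1000 rows). Only the first is flagged small, so the cost of a wrongly chosen small scan is limited to one batch.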


[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2014-10-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166226#comment-14166226
 ] 

Lars Hofhansl commented on PHOENIX-1267:


Thinking about skip-scans now... They might be a good candidate too, assuming that 
we're likely to skip out of the 2mb we prefetched with just a few skips. It's hard to 
say when that will be the case. I'd say do the small scan for skip-scans too.


[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2014-10-08 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163189#comment-14163189
 ] 

James Taylor commented on PHOENIX-1267:
---

Thanks for the patch, [~jaywong]. Please encapsulate this completely behind the 
implementors of QueryPlan. More specifically:
- Add a protected method called isSmallScan in BaseQueryPlan.
- Override this method in ScanPlan (which is the plan for non-aggregate 
queries) with the part of your logic from isScanForbidden. You'd want to check 
whether the order by is empty here - I don't think topN is the correct criterion.
- Also override the method in AggregatePlan. I think you always want to return 
false here.
- I don't think you need to do anything special for joins.
- In the BaseQueryPlan.iterators() method you can call scan.setSmall(). 
First call isSmallScan(). Then call 
statement.getHint().hasHint(Hint.SMALL_SCAN) and 
statement.getHint().hasHint(Hint.NO_SMALL_SCAN) to override the value you got 
from isSmallScan(). Based on this logic, you'll have the correct value to use 
for scan.setSmall().
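Here is a minimal sketch of the structure described above. It uses stand-in classes rather than Phoenix's real ones: the ScanHint enum, the *Sketch classes and their constructor arguments are all hypothetical. The ScanPlan default (small only when both start and stop keys are set and there is no ORDER BY) combines this comment with the start/stop-key rule from jay wong's 2014-10-07 comment. Letting NO_SMALL_SCAN win when both hints are present is also an assumption.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-ins for the two hints named above.
enum ScanHint { SMALL_SCAN, NO_SMALL_SCAN }

// Hypothetical base plan: each plan type supplies a default, hints override it.
abstract class BaseQueryPlanSketch {
    private final Set<ScanHint> hints = EnumSet.noneOf(ScanHint.class);

    BaseQueryPlanSketch(Set<ScanHint> hints) {
        this.hints.addAll(hints);
    }

    // Plan-specific default; conservative unless a subclass says otherwise.
    protected boolean isSmallScan() {
        return false;
    }

    // The value that would be passed to HBase's Scan.setSmall(boolean).
    final boolean resolveSmallScan() {
        boolean small = isSmallScan();
        if (hints.contains(ScanHint.SMALL_SCAN)) {
            small = true;
        }
        if (hints.contains(ScanHint.NO_SMALL_SCAN)) {
            small = false; // assumed to win if both hints are given
        }
        return small;
    }
}

// Non-aggregate plan: small only for a bounded key range without ORDER BY.
final class ScanPlanSketch extends BaseQueryPlanSketch {
    private final boolean startAndStopKeySet;
    private final boolean hasOrderBy;

    ScanPlanSketch(Set<ScanHint> hints, boolean startAndStopKeySet, boolean hasOrderBy) {
        super(hints);
        this.startAndStopKeySet = startAndStopKeySet;
        this.hasOrderBy = hasOrderBy;
    }

    @Override
    protected boolean isSmallScan() {
        return startAndStopKeySet && !hasOrderBy;
    }
}

// Aggregate plan: always defaults to a regular scan.
final class AggregatePlanSketch extends BaseQueryPlanSketch {
    AggregatePlanSketch(Set<ScanHint> hints) {
        super(hints);
    }

    @Override
    protected boolean isSmallScan() {
        return false;
    }
}
```

Keeping the default in the plan subclasses and the hint handling in one final method means joins need no special case. A join simply produces plans whose own defaults apply.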

 Set scan.setSmall(true) when appropriate
 

 Key: PHOENIX-1267
 URL: https://issues.apache.org/jira/browse/PHOENIX-1267
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor
Assignee: jay wong
 Attachments: smallscan.patch, smallscan2.patch


 There's a nice optimization that has been in HBase for a while now to set a 
 scan as small. This prevents extra RPC calls, I believe. We should add a 
 hint for queries that forces it to be set/not set, and make our best guess on 
 when it should default to true.



[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2014-10-08 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163236#comment-14163236
 ] 

jay wong commented on PHOENIX-1267:
---

[~jamestaylor]
The isScanForbidden logic is not about topN; it's about ORDER BY. But I found 
that ORDER BY queries are flagged as TOPN.
For example, assume a small scan returns 100 results per query, and we have a result like:
|key|col1|
|1|2|
|2|1|
|...|...|
|100|55|
|2|101|

On the next query, it looks for rows whose row key is greater than the last 
row key of the previous page.

So the query falls into an infinite loop.



[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2014-10-08 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163246#comment-14163246
 ] 

jay wong commented on PHOENIX-1267:
---

[~jamestaylor]
About joins: I think a join query is almost always a big query that covers nearly 
the whole table.

For big queries, small scan performs badly, so I set smallScan to false 
beforehand.


[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2014-10-08 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164779#comment-14164779
 ] 

James Taylor commented on PHOENIX-1267:
---

Thinking about this more, I think we can have small scan set to true most of 
the time. Our scans are parallelized, so they always target only part of a 
region. So immediately in QueryCompiler.compileSingleQuery(), call 
setSmallScan(true) unless there's a NO_SMALL_SCAN hint.

Then, in the following cases, we'd turn it off:
- if there's no where clause and no limit. Probably easiest to determine this 
in QueryCompiler.compileSingleQuery().
- if a second chunk of data is returned from a parallel scan. This can be set 
in ChunkedResultIterator.getResultIterator().
- if there's an order by and we're traversing over a large number of segments 
(based on a new config parameter). The order by doesn't go through 
ChunkedResultIterator, so we don't have a good way of turning the option back 
off. You can determine how much data the scan will traverse by looking at 
splits.size(). This is an estimate of how many 30MB chunks 
(phoenix.stats.guidepost.width) of data will be traversed by the scan.

Then in BaseQueryIterator.iterators(), we call scan.setSmall(true) if the 
SMALL_SCAN hint is used, which would override the above logic.

Thoughts? [~lhofhansl] 
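The default-on proposal can be sketched as one compile-time decision plus the hint override. This is illustrative only, and the method names are hypothetical. maxOrderBySegments stands in for the "new config parameter", whose name and default were not given. splitCount plays the role of splits.size(). The 30MB width is the phoenix.stats.guidepost.width figure quoted above. Turning small scan off when a second chunk arrives is a separate per-chunk step and is left out here.

```java
// Hypothetical model of the "small scan on by default" proposal; not Phoenix code.
final class DefaultSmallScanSketch {
    // Guidepost width quoted above (phoenix.stats.guidepost.width), in bytes.
    static final long GUIDEPOST_WIDTH_BYTES = 30L * 1024 * 1024;

    private DefaultSmallScanSketch() {}

    // Rough amount of data a scan will traverse: one guidepost-width chunk per split.
    static long estimatedBytes(int splitCount) {
        return splitCount * GUIDEPOST_WIDTH_BYTES;
    }

    // Compile-time default: on, unless one of the listed cases turns it off.
    static boolean compileTimeSmallScan(boolean noSmallScanHint, boolean hasWhereClause,
                                        boolean hasLimit, boolean hasOrderBy,
                                        int splitCount, int maxOrderBySegments) {
        if (noSmallScanHint) {
            return false;
        }
        if (!hasWhereClause && !hasLimit) {
            return false; // unrestricted scan over the table
        }
        if (hasOrderBy && splitCount > maxOrderBySegments) {
            return false; // ORDER BY over many segments cannot switch back later
        }
        return true;
    }

    // At iterator time, an explicit SMALL_SCAN hint overrides the computed value.
    static boolean finalSmallScan(boolean computed, boolean smallScanHint) {
        return smallScanHint || computed;
    }
}
```

For example, with a threshold of 10 segments, an ORDER BY query spanning 20 splits (about 600MB by this estimate) would fall back to a regular scan unless SMALL_SCAN is given.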


[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2014-10-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164803#comment-14164803
 ] 

Lars Hofhansl commented on PHOENIX-1267:


Small scan also forces pread as opposed to seek+read. seek+read is cheaper and 
also does prefetching at the datanodes, but only one scanner can use it per 
reader (i.e. HFile). See HBASE-7336.

It is tough to predict when to use it.

If the individual scans are so small that prefetching has no benefit, or that 
two RPCs would be significant as opposed to one RPC, then a small scan makes 
sense. The doc says: "Generally, if the scan range is within one data 
block (64KB), it could be considered as a small scan."

Starting with a small scan and then switching is cool. If we scan multiple HFile 
blocks and these blocks are not in the HBase blockcache, seek+read has the 
potential of being much faster. But we'd only eat that cost for one scan in the 
beginning.

It might be best to start with the hint only (and default to false) and perf test 
a variety of scenarios.


[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2014-10-07 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163003#comment-14163003
 ] 

jay wong commented on PHOENIX-1267:
---

I have been on holiday for the past several days, so sorry for the late reply.

I see what you mean. Normally a hint is more structured and a better approach. I 
think using a hint to control small scan is a good idea.

Small scan will be set to true by default when both the start key and stop key are 
set. If we have an ORDER BY query and small is true, the result will be an 
infinite loop.

So I think small scan is not only a query optimization for the user; it can also 
cause a bug. So I think smallScanForbidden is needed as well.

 Set scan.setSmall(true) when appropriate
 

 Key: PHOENIX-1267
 URL: https://issues.apache.org/jira/browse/PHOENIX-1267
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor
Assignee: jay wong
 Attachments: smallscan.patch


 There's a nice optimization that has been in HBase for a while now to set a 
 scan as small. This prevents extra RPC calls, I believe. We should add a 
 hint for queries that forces it to be set/not set, and make our best guess on 
 when it should default to true.


