Ok. I found a clean way to improve that a lot without going with the
filter. I will open a JIRA and push a fix.
The idea is to cap the scanner caching at LIMIT, so we don't read the
entire table before returning to the shell. We also have to change where we
do the LIMIT check.
anyway. JIRA
Oh, I see! So basically we do a full table scan: the scan never returns a
2nd row, so we never reach that break, and we exit only when we reach the
end of the table. Hence the same performance with or without the LIMIT
parameter...
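A toy model of the behavior described above may help (this is NOT the real HBase client code; all names and numbers are illustrative): the server collects up to `caching` matching rows per batch, scanning until the batch is full or the table ends, while the shell's LIMIT only breaks the client loop. With a single matching row and a large caching value, the first batch scans the whole table before returning; capping caching at LIMIT stops the scan right after the match.

```ruby
# Toy model -- NOT the real HBase client. The server fills a batch of up to
# `caching` matching rows, examining rows until the batch is full or the
# table ends; the client breaks once it has `limit` rows.
def rows_scanned(table_size, matching_row, caching, limit)
  fetched = 0   # matching rows handed back to the client so far
  scanned = 0   # rows the server had to examine
  row = 0
  while fetched < limit && row < table_size
    batch = 0
    while batch < caching && row < table_size
      scanned += 1
      batch += 1 if row == matching_row   # only one row has the column
      row += 1
    end
    fetched += batch
  end
  scanned
end

table = 65_000
puts rows_scanned(table, 10, 1000, 1)  # large caching: scans all 65000 rows
puts rows_scanned(table, 10, 1, 1)     # caching capped at LIMIT: 11 rows
```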
Should we then try to add a filter like PageFilter to the scan if we
I tried to run scan/get/scan/get many times, and it's always the same
pattern. You can remove the LIMIT => 1 parameter and you will get the same
performance.
Scan and get without the QC return in very similar times: 191ms for one,
194ms for the other.
2015-05-19 23:02 GMT-04:00 Ted Yu
For PageFilter :
* Implementation of Filter interface that limits results to a specific page
* size. It terminates scanning once the number of filter-passed rows is >
* the given page size.
In your case, what should the page size be - 0?
Cheers
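To illustrate the javadoc quoted above, here is a toy Ruby model of PageFilter's semantics (assumed from that javadoc; the real filter is server-side Java, and the class name here is made up): it counts filter-passed rows and tells the scanner to stop once the count reaches the page size, so the server doesn't walk the rest of the table.

```ruby
# Toy model of PageFilter semantics -- not the real server-side code.
class ToyPageFilter
  def initialize(page_size)
    @page_size = page_size
    @rows_accepted = 0
  end

  # Roughly mirrors Filter#filterAllRemaining: true once enough rows passed.
  def filter_all_remaining?
    @rows_accepted >= @page_size
  end

  def accept_row
    @rows_accepted += 1
  end
end

def scan_with_filter(rows, filter)
  out = []
  rows.each do |row|
    break if filter.filter_all_remaining?   # the scan terminates early here
    out << row
    filter.accept_row
  end
  out
end

p scan_with_filter(("a".."z").to_a, ToyPageFilter.new(1))  # => ["a"]
```

One caveat worth keeping in mind: on a multi-region table the real PageFilter keeps its count per region, so a scan can return up to the page size from each region, and the client still has to enforce the final limit itself.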
On Tue, May 19, 2015 at 8:30 PM, Jean-Marc
Aren't Scan and Get supposed to be almost equally fast?
I have a pretty small table with 65K rows and a few columns (a hundred?),
and I'm trying to do a get and a scan.
hbase(main):009:0> scan 'sensors', { COLUMNS =>
['v:f92acb5b-079a-42bc-913a-657f270a3dc1'], STARTROW => '000a', LIMIT => 1 }
ROW                                  COLUMN+CELL
000a
C’mon, really?
Do they really return the same results?
Let me put it this way… are you walking through the same code path?
On May 19, 2015, at 10:34 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org
wrote:
Aren't Scan and Get supposed to be almost equally fast?
I have a pretty small
J-M:
How many times did you try the pair of queries?
Since the scan was run first, it would give the get query some advantage,
right?
Cheers
On Tue, May 19, 2015 at 7:34 PM, Jean-Marc Spaggiari
jean-m...@spaggiari.org wrote:
Aren't Scan and Get supposed to be almost equally fast?
I have a
Take a look at table.rb _scan_internal():
LIMIT is not passed to the server, so you fetch more rows than you need.
https://github.com/apache/hbase/blob/master/hbase-shell/src/main/ruby/hbase/table.rb#L495
Matteo
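A simplified sketch of the client-side loop Matteo is pointing at (illustrative only; the real code is in table.rb's _scan_internal, linked above): LIMIT is only checked in the shell after rows have come back from the scanner, so it caps what gets printed, not what the server scans.

```ruby
# Illustrative sketch, NOT the actual table.rb code: the shell iterates the
# scanner and stops collecting after `limit` rows, but by then the server
# may already have scanned far beyond them.
def shell_scan(scanner_rows, limit)
  results = []
  scanner_rows.each do |row|
    results << row
    break if limit > 0 && results.size >= limit   # client-side LIMIT check
  end
  results
end

p shell_scan(%w[000a 000b 000c], 1)  # => ["000a"]
```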
On Tue, May 19, 2015 at 8:11 PM, Jean-Marc Spaggiari
jean-m...@spaggiari.org wrote:
I tried to
Thanks Ian! Very helpful breakdown.
For this use case, I think the multi-version row structure is ruled out.
We will investigate the onekey-manycolumn approach. Also, the more I study
the mechanics behind a SCAN vs GET, the more I believe the informal test I
did is inaccurate. I hoped someone
could describe the performance trade-off between Scan vs. Get.
Thanks again to anyone who read this far.
Neil Yalowitz
neilyalow...@gmail.com
On Wed, Oct 17, 2012 at 10:45 AM, Michael Segel
michael_se...@hotmail.com wrote:
Neil,
Since you asked
Actually your
From: Neil Yalowitz neilyalow...@gmail.com
Date: Tue, Oct 16, 2012 at 2:53 PM
Subject: crafting your key - scan vs. get
To: user@hbase.apache.org
Hopefully this is a fun question. :)
Assume you could architect an HBase table from scratch and you were
choosing between the following two
So, are there any performance considerations between Scan vs. Get in this
use case? Which choice would you go for?
Neil Yalowitz
neilyalow...@gmail.com
need to do a scan.
This is the core of my original question. My anecdotal tests in hbase
shell showed a Get executing about 3x faster than a Scan with
start/stoprow, but I don't trust my crude testing much and hoped someone
could describe the performance trade-off between Scan vs. Get.
Thanks
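On the Scan-vs-Get question itself: as far as I understand, in HBase a Get is ultimately executed as a Scan over a single row (the "same code path" hinted at earlier), so with a well-designed key the two should cost about the same. A toy illustration of that equivalence, using a sorted array as a stand-in for a region (all names and values here are made up):

```ruby
# Toy stand-in for a table: rows sorted by key, as in an HBase region.
TOY_TABLE = [["000a", "v1"], ["000b", "v2"], ["000c", "v3"]].freeze

# A "scan" over [start_row, stop_row): walk rows in order, keep the range.
def toy_scan(start_row, stop_row)
  TOY_TABLE.select { |key, _| key >= start_row && key < stop_row }
end

# A "get" is just a scan whose range covers exactly one row key.
def toy_get(row_key)
  toy_scan(row_key, row_key + "\0")
end

p toy_get("000b")            # => [["000b", "v2"]]
p toy_scan("000b", "000c")   # => [["000b", "v2"]]
```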
/schema.versions.html
So, are there any performance considerations between Scan vs. Get in this
use case? Which choice would you go for?
Neil Yalowitz
neilyalow...@gmail.com