Re: Scan vs Get

2015-05-20 Thread Jean-Marc Spaggiari
Ok. I found a clean way to improve that a lot without going with the filter. I will open a JIRA and push a fix. The idea is to set the caching to the maximum of LIMIT, so we don't read the entire table before returning to the shell. Also, we have to change where we do the test. anyway. JIRA
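
To illustrate why capping the caching at LIMIT helps, here is a minimal sketch in plain Python. It is a toy model, not HBase code: `fetch_batch`, the in-memory `table` and the qualifier names are all made up for the example. The assumption is that a scanner batch only comes back once it holds `caching` matching rows or the table is exhausted.

```python
def fetch_batch(table, start, caching, qualifier):
    """Simulated (hypothetical) server call: collect up to `caching` rows that
    carry `qualifier`, and report how many rows had to be examined."""
    batch, examined = [], 0
    for key, cols in table[start:]:
        examined += 1
        if qualifier in cols:
            batch.append(key)
            if len(batch) == caching:
                break  # batch is full, return to the client right away
    return batch, examined

# 65,000 rows; only the first row carries the requested qualifier.
table = [(f"{i:06x}", {"v:wanted"} if i == 0 else {"v:other"})
         for i in range(65_000)]

limit = 1
# Caching larger than the number of matching rows: the batch never fills up.
_, examined_default = fetch_batch(table, 0, caching=100, qualifier="v:wanted")
# Caching capped at LIMIT: the batch is full after the first match.
_, examined_capped = fetch_batch(table, 0, caching=limit, qualifier="v:wanted")

print(examined_default, examined_capped)  # prints: 65000 1
```

In this model the uncapped call walks all 65,000 rows to return a single result, while the capped call stops after the first one.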

Re: Scan vs Get

2015-05-19 Thread Jean-Marc Spaggiari
Oh, I see! So basically we do a full table scan because it never returns a 2nd row, so we never reach that break and we exit only when we reach the end of the table. Hence the same performance with or without the LIMIT parameter... Should we then try to add a filter like PageFilter to the scan if we
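
One plausible reading of the message above, sketched in plain Python: if the limit test only runs once the scanner has produced another row, a scan with a single matching row has to reach the end of the table before the loop can exit. `CountingScanner` and both loop functions are hypothetical stand-ins, not the actual table.rb code.

```python
class CountingScanner:
    """Lazy scanner stand-in that counts every row it has to look at."""

    def __init__(self, rows, qualifier):
        self.rows = rows
        self.qualifier = qualifier
        self.examined = 0

    def __iter__(self):
        for key, cols in self.rows:
            self.examined += 1
            if self.qualifier in cols:
                yield key


def limit_checked_on_next_row(scanner, limit):
    out = []
    for key in scanner:        # waits until another matching row shows up
        if len(out) >= limit:  # test only runs once that row has arrived
            break
        out.append(key)
    return out


def limit_checked_after_append(scanner, limit):
    out = []
    for key in scanner:
        out.append(key)
        if len(out) >= limit:  # test runs before asking for another row
            break
    return out


rows = [(f"{i:06x}", {"v:wanted"} if i == 0 else {"v:other"})
        for i in range(65_000)]

late = CountingScanner(rows, "v:wanted")
early = CountingScanner(rows, "v:wanted")
late_result = limit_checked_on_next_row(late, 1)
early_result = limit_checked_after_append(early, 1)

print(late_result == early_result, late.examined, early.examined)
# prints: True 65000 1
```

Both loops return the same single row; only the number of rows examined differs.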

Re: Scan vs Get

2015-05-19 Thread Jean-Marc Spaggiari
I tried to run scan/get/scan/get many times, and always the same pattern. You can remove the LIMIT => 1 parameter and you will get the same performance. Scan and get without the QC return in very similar times: 191ms for one, 194ms for the other. 2015-05-19 23:02 GMT-04:00 Ted Yu

Re: Scan vs Get

2015-05-19 Thread Ted Yu
For PageFilter: "Implementation of Filter interface that limits results to a specific page size. It terminates scanning once the number of filter-passed rows is > the given page size." In your case, what should be the page size - 0 ? Cheers On Tue, May 19, 2015 at 8:30 PM, Jean-Marc
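
One caveat worth checking against the PageFilter javadoc before relying on it: as far as I know, the filter is evaluated separately per region, so the client can receive more rows than the page size and still has to truncate on its side. The toy model below assumes exactly that; `region_scan` and the region contents are invented for the example, and it does not model the exact cut-off rule quoted above.

```python
def region_scan(rows, page_size):
    """Hypothetical per-region page filter: each region returns at most
    `page_size` rows, unaware of what other regions returned."""
    return rows[:page_size]


# Three regions holding matching rows (made-up keys).
regions = [["a1", "a2", "a3"], ["b1", "b2"], ["c1"]]
page_size = 2

# Rows reaching the client when every region applies the filter on its own.
merged = [row for region in regions for row in region_scan(region, page_size)]

# The client still enforces the page size itself.
client_page = merged[:page_size]

print(len(merged), client_page)  # prints: 5 ['a1', 'a2']
```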

Scan vs Get

2015-05-19 Thread Jean-Marc Spaggiari
Are not Scan and Gets supposed to be almost as fast? I have a pretty small table with 65K lines, few columns (hundred?) trying to do a get and a scan. hbase(main):009:0> scan 'sensors', { COLUMNS => ['v:f92acb5b-079a-42bc-913a-657f270a3dc1'], STARTROW => '000a', LIMIT => 1 } ROW COLUMN+CELL 000a

Re: Scan vs Get

2015-05-19 Thread Michael Segel
C’mon, really? Do they really return the same results? Let me put it this way… are you walking through the same code path? On May 19, 2015, at 10:34 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Are not Scan and Gets supposed to be almost as fast? I have a pretty small

Re: Scan vs Get

2015-05-19 Thread Ted Yu
J-M: How many times did you try the pair of queries ? Since scan was run first, this would give the get query some advantage, right ? Cheers On Tue, May 19, 2015 at 7:34 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Are not Scan and Gets supposed to be almost as fast? I have a

Re: Scan vs Get

2015-05-19 Thread Matteo Bertozzi
Take a look at table.rb _scan_internal(): LIMIT is not passed to the server, so you fetch more rows. https://github.com/apache/hbase/blob/master/hbase-shell/src/main/ruby/hbase/table.rb#L495 Matteo On Tue, May 19, 2015 at 8:11 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: I tried to

Re: crafting your key - scan vs. get

2012-10-19 Thread Neil Yalowitz
Thanks Ian! Very helpful breakdown. For this use case, I think the multi-version row structure is ruled out. We will investigate the onekey-manycolumn approach. Also, the more I study the mechanics behind a SCAN vs GET, the more I believe the informal test I did is inaccurate. What does

Re: crafting your key - scan vs. get

2012-10-18 Thread Michael Segel
someone could describe the performance trade-off between Scan vs. Get. Thanks again for anyone who read this far. Neil Yalowitz neilyalow...@gmail.com On Wed, Oct 17, 2012 at 10:45 AM, Michael Segel michael_se...@hotmail.com wrote: Neil, Since you asked Actually your

Re: crafting your key - scan vs. get

2012-10-18 Thread Ian Varley
...@gmail.com Date: Tue, Oct 16, 2012 at 2:53 PM Subject: crafting your key - scan vs. get To: user@hbase.apache.org Hopefully this is a fun question. :) Assume you could architect an HBase table from scratch and you were choosing between the following two

Re: crafting your key - scan vs. get

2012-10-17 Thread Michael Segel
, are there any performance considerations between Scan vs. Get in this use case? Which choice would you go for? Neil Yalowitz neilyalow...@gmail.com

Re: crafting your key - scan vs. get

2012-10-17 Thread Neil Yalowitz
need to do a scan. This is the core of my original question. My anecdotal tests in hbase shell showed a Get executing about 3x faster than a Scan with start/stoprow, but I don't trust my crude testing much and hoped someone could describe the performance trade-off between Scan vs. Get. Thanks

crafting your key - scan vs. get

2012-10-16 Thread Neil Yalowitz
/schema.versions.html So, are there any performance considerations between Scan vs. Get in this use case? Which choice would you go for? Neil Yalowitz neilyalow...@gmail.com