Hi Eugeny,

The mailing list stripped your attachment (as it often does), so you might want to put it on a public web server.
I don't have much to contribute except to point to a recent conversation that you can find here:

http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28722

Hope this helps,

J-D

On Fri, Sep 21, 2012 at 5:03 AM, Eugeny Morozov wrote:
> Hello!
>
> It is known, and I saw it in the code, that the time range set by
> scan.setTimeRange is used to filter out HFiles for further scanning.
> This means that subsequent scanner.next() calls should take almost no
> time if I set the time range far in the future. I am sure that I do not
> have any HFiles that fall into that time range.
>
> But, and here is the question, scanning with a time range set is
> surprisingly far slower than without it.
>
> My results are as follows:
> Use range [false]. Time spent (avg): [0]
> Use range [true]. Time spent (avg): [525]
>
> KeyValues are listed when the time range is not used.
>
> The code is as follows:
> public static void run(boolean useRange, HTable table) throws Exception
> {
>     Scan scan = new Scan().addFamily( family ).setCaching( -1 ).setCacheBlocks( false );
>     scan.setStartRow( random start row );
>     if (useRange) scan.setTimeRange(1348114401600L, 1348114401700L);
>
>     ResultScanner scanner = table.getScanner(scan);
>     for (int i = 0; i < N; i++) { // There were a bunch of measurements, where N was from 10 to 50
>         long time = System.currentTimeMillis();
>         result = scanner.next();
>         sum += (System.currentTimeMillis() - time) / N;
>     }
> }
>
> Of course, such measurements include all sorts of noise like network
> overhead, etc., but I'm using a virtual machine on my own box, and while
> I take measurements there is no other activity on either my box or the
> virtual machine, so such noise is minimal.
>
> Also, I've used YourKit to trace and sample the running HRegionServer,
> but didn't find anything suspicious, though I didn't look at heap and GC
> performance. The trace is attached.
>
> So the question is: why is it so slow when the time range is set, and so
> fast without it?
> --
> Evgeny Morozov
> Developer Grid Dynamics
> Skype: morozov.evgeny
> www.griddynamics.com
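A side note on the measurement loop in the quoted message, separate from the time-range question itself: `sum += (System.currentTimeMillis() - time) / N;` divides each sample by N using long (integer) arithmetic before adding it, so any single scanner.next() call that takes less than N milliseconds contributes 0 to the sum. That alone could make the no-range average print [0] even if the calls are not free; it does not by itself explain the 525 figure. The sketch below is only an illustration under that assumption. The class and method names are hypothetical, and `timeCallsMs` takes a plain `Supplier` as a stand-in for `scanner.next()` instead of using the HBase client.

```java
import java.util.Arrays;
import java.util.function.Supplier;

// Hypothetical helper class illustrating two ways of averaging per-call timings.
public class ScanTimingSketch {

    // Mirrors the accumulation in the quoted loop: each sample is divided by n
    // with long division before being added, so any sample shorter than
    // n milliseconds contributes 0 to the result.
    static long perSampleTruncatedAverage(long[] samplesMs) {
        int n = samplesMs.length;
        long sum = 0;
        for (long s : samplesMs) {
            sum += s / n;
        }
        return sum;
    }

    // Sums all samples first and divides once, in floating point,
    // so short samples are not lost to truncation.
    static double averageMs(long[] samplesMs) {
        long total = 0;
        for (long s : samplesMs) {
            total += s;
        }
        return (double) total / samplesMs.length;
    }

    // Times n calls of an operation with System.nanoTime(), which is monotonic
    // and intended for elapsed-time measurement, and returns the mean in ms.
    // 'op' is a hypothetical stand-in for scanner.next().
    static <T> double timeCallsMs(Supplier<T> op, int n) {
        long totalNanos = 0;
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            op.get();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / n;
    }

    public static void main(String[] args) {
        long[] samples = new long[50];
        Arrays.fill(samples, 10L); // 50 simulated calls of 10 ms each
        System.out.println(perSampleTruncatedAverage(samples)); // 0
        System.out.println(averageMs(samples));                 // 10.0
    }
}
```

With 50 simulated 10 ms calls, `main` prints 0 for the per-sample version and 10.0 for the sum-then-divide version, which is why accumulating raw durations and dividing once at the end is usually the safer pattern.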
