What is the max versions setting for your table's column family? When you set a value there, HBase has to keep that many versions, and a scan will read all of them. In 0.94 the default for max versions is 3. I guess you have set some bigger value. If you have not, would you mind testing again after a major compaction?
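For reference, the version setting can be checked and lowered from the HBase shell; this is a sketch (in 0.94 the table must be disabled before an alter), and 'table1' / 'cf' are placeholder names:

```
hbase> describe 'table1'          # shows VERSIONS for each column family
hbase> disable 'table1'
hbase> alter 'table1', {NAME => 'cf', VERSIONS => 3}
hbase> enable 'table1'
hbase> major_compact 'table1'     # rewrites HFiles, dropping versions beyond VERSIONS
```

Until the major compaction runs, the excess versions still sit in the HFiles and a scan still reads past them.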
-Anoop-

On Fri, Apr 11, 2014 at 1:01 PM, gortiz <gor...@pragsis.com> wrote:

> The last test I did was to reduce the number of versions to 100, so
> right now I have 100 rows with 100 versions each. Times are (I got the
> same times for block sizes of 64KB and 1MB):
>
> 100 rows x 1000 versions + block cache    -> 80s
> 100 rows x 1000 versions + no block cache -> 70s
>
> 100 rows x *100* versions + block cache    -> 7.3s
> 100 rows x *100* versions + no block cache -> 6.1s
>
> What is the reason for this? I assumed HBase was smart enough not to
> consider old versions and would just check the newest. But I reduced
> the size (in versions) by 10x and got a 10x performance improvement.
>
> The filter is:
>
> scan 'filters', {FILTER => "ValueFilter(=, 'binary:5')",
>   STARTROW => '1010000000000000000000000000000000000101',
>   STOPROW  => '6010000000000000000000000000000000000201'}
>
> On 11/04/14 09:04, gortiz wrote:
>
>> Well, I guessed that, but it doesn't make much sense because it's so
>> slow. Right now I only have 100 rows with 1000 versions each.
>> I have checked the size of the dataset, and each row is about 700KB
>> (around 7GB in total, 100 rows x 1000 versions). So, if it only checks
>> the newest version, it should only read 100 rows x 700KB = 70MB. How
>> can it spend so much time checking that quantity of data?
>>
>> I'm generating the dataset again with a bigger block size (previously
>> 64KB, now 1MB). I could try tuning the scanner caching and batching
>> parameters, but I don't think they will have much effect.
>>
>> Another test I want to do is to generate the same dataset with just
>> 100 versions. It should take about the same time, right? Or am I wrong?
>>
>> On 10/04/14 18:08, Ted Yu wrote:
>>
>>> It should be the newest version of each value.
>>>
>>> Cheers
>>>
>>> On Thu, Apr 10, 2014 at 9:55 AM, gortiz <gor...@pragsis.com> wrote:
>>>
>>>> Another little question: with the filter I'm using, do I check all
>>>> the versions, or just the newest?
>>>> Because I'm wondering whether, when I do a scan over the whole
>>>> table, I look for the value "5" in the whole dataset or just in the
>>>> newest version of each value.
>>>>
>>>> On 10/04/14 16:52, gortiz wrote:
>>>>
>>>>> I was trying to check the behaviour of HBase. The cluster is a
>>>>> group of old computers: one master and five slaves, each with 2GB,
>>>>> so 12GB in total.
>>>>> The table has a column family with 1000 columns, each column with
>>>>> 100 versions.
>>>>> There's another column family with four columns and one image of
>>>>> 100KB. (I've tried without this column family as well.)
>>>>> The table is partitioned manually across all the slaves, so the
>>>>> data are balanced in the cluster.
>>>>>
>>>>> I'm executing this statement in HBase 0.94.6:
>>>>>
>>>>> scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"}
>>>>>
>>>>> My lease and RPC timeouts are three minutes.
>>>>> Since it's a full scan of the table, I have been playing with the
>>>>> BLOCKCACHE as well (just disabling and enabling it, not changing
>>>>> its size). I thought it was going to cause too many GC calls; I'm
>>>>> not sure about this point.
>>>>>
>>>>> I know this isn't the best way to use HBase; it's just a test. I
>>>>> think it's not working because the hardware isn't enough, although
>>>>> I would like to try some kind of tuning to improve it.
>>>>>
>>>>> On 10/04/14 14:21, Ted Yu wrote:
>>>>>
>>>>>> Can you give us a bit more information:
>>>>>>
>>>>>> - the HBase release you're running
>>>>>> - what filters are used for the scan
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Apr 10, 2014, at 2:36 AM, gortiz <gor...@pragsis.com> wrote:
>>>>>>
>>>>>>> I got this error when I executed a full scan with filters over a
>>>>>>> table:
>>>>>>>
>>>>>>> Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.
>>>>>>> regionserver.LeaseException:
>>>>>>> org.apache.hadoop.hbase.regionserver.LeaseException: lease
>>>>>>> '-4165751462641113359' does not exist
>>>>>>>   at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
>>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
>>>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>   at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
>>>>>>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)
>>>>>>>
>>>>>>> I have read about increasing the lease time and the RPC time, but
>>>>>>> it's not working. What else could I try? The table isn't too big.
>>>>>>> I have been checking the logs from the GC, the HMaster, and some
>>>>>>> region servers, and I didn't see anything weird. I also tried a
>>>>>>> couple of caching values.
>
> --
> *Guillermo Ortiz*
> /Big Data Developer/
>
> Telf.: +34 917 680 490
> Fax: +34 913 833 301
> C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain
>
> _http://www.bidoop.es_
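[Editor's note] For one-off full scans like the one in the thread, the shell also accepts per-scan options to skip the block cache and raise scanner caching; a sketch (option values are examples, 'table1' as above):

```
hbase> scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')",
         CACHE_BLOCKS => false,   # don't fill the block cache with one-off scan data
         CACHE => 1000}           # rows fetched per RPC; fewer next() round trips
```

Disabling block caching for a full scan avoids evicting hot data and the GC pressure the poster suspected; raising CACHE trades memory for fewer RPCs.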
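[Editor's note] On the LeaseException: in 0.94 the scanner lease and client RPC timeouts are set in hbase-site.xml; a sketch with example values (not recommendations). The lease period should not exceed the RPC timeout, and region servers must be restarted to pick up the change:

```
<!-- hbase-site.xml (0.94-era property names) -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>180000</value> <!-- scanner lease, in ms (3 minutes) -->
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>180000</value> <!-- client RPC timeout, in ms -->
</property>
```

If the client spends longer than the lease period processing one batch of rows between next() calls (e.g. a large CACHE value), the server expires the lease and the scan fails with exactly this exception.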
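[Editor's note] The ~10x slowdown reported in the thread is consistent with the scanner having to read every stored version from the HFiles, even though only the newest version per cell is returned. A toy cost model (an illustration only, not HBase code) of that relationship:

```python
# Toy cost model: a full scan reads every stored KeyValue (all versions),
# while only the newest version of each cell is returned to the client.

def cells_read(rows: int, versions: int) -> int:
    """Cells the region server must scan through."""
    return rows * versions

def cells_returned(rows: int) -> int:
    """Cells actually sent back with default scan settings (newest only)."""
    return rows

# 100 rows x 1000 versions vs 100 rows x 100 versions:
# ten times the cells read, matching the observed ~10x scan-time ratio,
# even though both scans return the same 100 cells.
ratio = cells_read(100, 1000) / cells_read(100, 100)
print(ratio)  # -> 10.0
print(cells_returned(100))  # -> 100
```

This is why lowering VERSIONS (and major-compacting) shrinks scan time roughly in proportion: the work is driven by cells stored, not cells returned.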