Correct me if I'm wrong, but isn't HBase's default block size 256MB while Hadoop's default block size is 64MB?
> From: [email protected]
> To: [email protected]
> Subject: Re: scan performance improvement
> Date: Thu, 11 Nov 2010 13:08:56 +0000
>
> Not that block size (that's the HDFS one), but the HBase block size. You set
> it at table creation, or it uses the default of 64K.
>
> The description of hbase.client.scanner.caching says:
>
>   Number of rows that will be fetched when calling next on a scanner
>   if it is not served from memory. Higher caching values will enable
>   faster scanners but will eat up more memory, and some calls of next
>   may take longer and longer times when the cache is empty.
>
> That means that it will pre-fetch that number of rows if the next row does
> not come from memory. So if your rows are small enough to fit 100 of them in
> one block, it doesn't matter whether you pre-fetch 1, 50 or 99, because it
> will only go to disk when it exhausts the whole block, which sticks in the
> block cache. So it will still fetch the same amount of data from disk every
> time. If you increase the number to a value that is certain to load multiple
> blocks at a time from disk, it will increase performance.
>
> On 11 Nov 2010, at 12:55, Oleg Ruchovets wrote:
>
> > Yes, I thought about a large number, and as you said, it depends on the
> > block size. Good point.
> >
> > I have one record of ~4K, and the block size is:
> >
> > <property>
> >   <name>dfs.block.size</name>
> >   <value>268435456</value>
> >   <description>HDFS blocksize of 256MB for large file-systems.
> >   </description>
> > </property>
> >
> > What number should I choose? I am afraid that using a number equal to one
> > block leads to a SocketTimeoutException. Am I right?
> >
> > Thanks, Oleg.
> >
> > On Thu, Nov 11, 2010 at 1:30 PM, Friso van Vollenhoven <
> > [email protected]> wrote:
> >
> >> How small is small? If it is bytes, then setting the value to 50 is not
> >> so much different from 1, I suppose. If 50 rows fit in one block, it will
> >> just fetch one block whether the setting is 1 or 50. You might want to
> >> try a larger value. It should be fine if the records are small and you
> >> need them all on the client side anyway.
> >>
> >> It also depends on the block size, of course. When you only ever do full
> >> scans on a table and little random access, you might want to increase
> >> that.
> >>
> >> Friso
> >>
> >> On 11 Nov 2010, at 12:15, Oleg Ruchovets wrote:
> >>
> >>> Hi,
> >>> To improve client performance I changed hbase.client.scanner.caching
> >>> from 1 to 50. After running the client with the new value
> >>> (hbase.client.scanner.caching = 50), it didn't improve execution time
> >>> at all.
> >>>
> >>> I have ~9 million small records. I have to do a full scan, so it brings
> >>> all 9 million records to the client. My assumption was that this change
> >>> would bring a significant improvement, but it did not.
> >>>
> >>> Additional information:
> >>> I scan a table which has 100 regions
> >>> 5 servers
> >>> 20 maps
> >>> 4 concurrent maps
> >>> The scan process takes 5.5-6 hours. That seems like too much time to
> >>> me. Am I right? And how can I improve it?
> >>>
> >>> I changed the value in all hbase-site.xml files and restarted HBase.
> >>>
> >>> Any suggestions?
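
To make the caching discussion concrete, here is a minimal sketch of setting
the caching per scan in the Java client instead of cluster-wide in
hbase-site.xml, assuming the HTable API of this era; the table name and the
caching value of 1000 are illustrative, not taken from the thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class ScanCachingSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable"); // hypothetical table name

            Scan scan = new Scan();
            // With ~4K rows and the default 64K HBase block, one block holds
            // roughly 16 rows, so a caching value of 50 still reads more or
            // less block by block. A value like 1000 (~4MB per round trip
            // here) is certain to span many blocks per RPC, which is where
            // the savings Friso describes come from.
            scan.setCaching(1000);

            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result result : scanner) {
                    // process each row on the client...
                }
            } finally {
                scanner.close();
                table.close();
            }
        }
    }

For the MapReduce full scan described in the thread, the same setCaching call
goes on the Scan object handed to TableMapReduceUtil.initTableMapperJob;
editing hbase-site.xml only changes the default that a plain Scan starts with.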

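And since the HBase block size Friso mentions is set per column family at
table creation, a sketch of that in the HBase shell (the table and family
names are made up, and 131072 bytes = 128K is shown only for illustration):

    hbase> create 'mytable', {NAME => 'cf', BLOCKSIZE => '131072'}

Larger blocks favor the sequential full-scan workload described here, while
the 64K default is tuned more toward random reads.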