Yes. Please file an issue. A few fellas are messing with the block cache at the moment, so they might be up for taking a detour to figure out the why behind your interesting observation.
Thanks,
St.Ack

On Thu, Jul 14, 2011 at 8:41 PM, Mingjian Deng <[email protected]> wrote:
> Hi stack:
> Server A is not special; the servers in the cluster are all the same. If I set
> hfile.block.cache.size=0.1 on another server, the problem reappears. But when I
> set hfile.block.cache.size=0.15 or more, it does not reappear. So I think you
> can test on your own cluster.
> With the following btrace code:
> --------------------------------------------------------------
> import static com.sun.btrace.BTraceUtils.*;
> import com.sun.btrace.annotations.*;
>
> import java.nio.ByteBuffer;
> import org.apache.hadoop.hbase.io.hfile.*;
>
> @BTrace public class TestRegion1{
>   @OnMethod(
>     clazz="org.apache.hadoop.hbase.io.hfile.HFile$Reader",
>     method="decompress"
>   )
>   public static void traceCacheBlock(final long offset, final int compressedSize,
>       final int decompressedSize, final boolean pread){
>     println(strcat("decompress: ", str(decompressedSize)));
>   }
> }
> --------------------------------------------------------------
>
> If I set hfile.block.cache.size=0.1, the result is:
> -----------
> .......
> decompress: 6020488
> decompress: 6022536
> decompress: 5991304
> decompress: 6283272
> decompress: 5957896
> decompress: 6246280
> decompress: 6041096
> decompress: 6541448
> decompress: 6039560
> .......
> -----------
> If I set hfile.block.cache.size=0.12, the result is:
> -----------
> ......
> decompress: 65775
> decompress: 65556
> decompress: 65552
> decompress: 9914120
> decompress: 6026888
> decompress: 65615
> decompress: 65627
> decompress: 6247944
> decompress: 5880840
> decompress: 65646
> ......
> -----------
> If I set hfile.block.cache.size=0.15 or more, the result is:
> -----------
> ......
> decompress: 65646
> decompress: 65615
> decompress: 65627
> decompress: 65775
> decompress: 65556
> decompress: 65552
> decompress: 65646
> decompress: 65615
> decompress: 65627
> decompress: 65775
> decompress: 65556
> decompress: 65552
> ......
> -----------
>
> All of the above tests ran for more than 10 minutes at a high read rate, so
> this is a very strange phenomenon.
>
> 2011/7/15 Stack <[email protected]>
>
>> This is interesting. Any chance that the cells on the regions hosted
>> on server A are 5M in size?
>>
>> The hfile block sizes are by default configured to be 64k, but an hfile
>> block will rarely be exactly 64k. We do not cut the hfile block content
>> at 64k exactly; the hfile block boundary will be at a keyvalue boundary.
>>
>> If a cell were 5MB, it does not get split across multiple hfile
>> blocks. It will occupy one hfile block.
>>
>> Could it be that the region hosted on A is not like the others and it
>> has lots of these 5MB cells?
>>
>> Let us know. If the above is not the case, then you have an interesting
>> phenomenon going on and we need to dig in more.
>>
>> St.Ack
>>
>>
>> On Thu, Jul 14, 2011 at 5:27 AM, Mingjian Deng <[email protected]> wrote:
>> > Hi:
>> > We found a strange problem in our read test.
>> > It is a 5-node cluster. Four of our 5 regionservers set
>> > hfile.block.cache.size=0.4; one of them is set to 0.1 (node A). When we
>> > randomly read from a 2TB data table, we found node A's network traffic
>> > reached 100MB while the others' was less than 10MB. We know node A needs
>> > to read data from disk and put it in the block cache.
>> > Here is the relevant code in LruBlockCache:
>> > ------------------------------------------------------------------------
>> > public void cacheBlock(String blockName, ByteBuffer buf, boolean inMemory) {
>> >   CachedBlock cb = map.get(blockName);
>> >   if(cb != null) {
>> >     throw new RuntimeException("Cached an already cached block");
>> >   }
>> >   cb = new CachedBlock(blockName, buf, count.incrementAndGet(), inMemory);
>> >   long newSize = size.addAndGet(cb.heapSize());
>> >   map.put(blockName, cb);
>> >   elements.incrementAndGet();
>> >   if(newSize > acceptableSize() && !evictionInProgress) {
>> >     runEviction();
>> >   }
>> > }
>> > ------------------------------------------------------------------------
>> >
>> > We debugged this code with btrace, using the following script:
>> > ------------------------------------------------------------------------
>> > import static com.sun.btrace.BTraceUtils.*;
>> > import com.sun.btrace.annotations.*;
>> >
>> > import java.nio.ByteBuffer;
>> > import org.apache.hadoop.hbase.io.hfile.*;
>> >
>> > @BTrace public class TestRegion{
>> >   @OnMethod(
>> >     clazz="org.apache.hadoop.hbase.io.hfile.LruBlockCache",
>> >     method="cacheBlock"
>> >   )
>> >   public static void traceCacheBlock(@Self LruBlockCache instance,
>> >       String blockName, ByteBuffer buf, boolean inMemory){
>> >     println(strcat("size: ",
>> >       str(get(field("org.apache.hadoop.hbase.io.hfile.LruBlockCache","size"), instance))));
>> >     println(strcat("elements: ",
>> >       str(get(field("org.apache.hadoop.hbase.io.hfile.LruBlockCache","elements"), instance))));
>> >   }
>> > }
>> > ------------------------------------------------------------------------
>> >
>> > We found that "size" increases by 5 MB each time on node A! Why not 64 KB
>> > each time? But "size" increases by 64 KB when we run this btrace script on
>> > the other nodes at the same time.
>> >
>> > The following script also confirms the problem, because "decompressedSize"
>> > is 5 MB each time on node A:
>> > ------------------------------------------------------------------------
>> > import static com.sun.btrace.BTraceUtils.*;
>> > import com.sun.btrace.annotations.*;
>> >
>> > import java.nio.ByteBuffer;
>> > import org.apache.hadoop.hbase.io.hfile.*;
>> >
>> > @BTrace public class TestRegion1{
>> >   @OnMethod(
>> >     clazz="org.apache.hadoop.hbase.io.hfile.HFile$Reader",
>> >     method="decompress"
>> >   )
>> >   public static void traceCacheBlock(final long offset, final int compressedSize,
>> >       final int decompressedSize, final boolean pread){
>> >     println(strcat("decompressedSize: ", str(decompressedSize)));
>> >   }
>> > }
>> > ------------------------------------------------------------------------
>> >
>> > Why not 64 KB?
>> >
>> > BTW: When we set hfile.block.cache.size=0.4 on node A, "decompressedSize"
>> > drops to 64 KB and the TPS goes back up to a high level.
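As a side note on the block-boundary rule Stack describes in the quoted thread (hfile blocks are cut only at KeyValue boundaries, so a large cell is never split across blocks), here is a minimal sketch of that rule. It is not the actual HFile writer code; the class name, the blockSizes helper, and the cell sizes are illustrative, and only the 64 KB default block size comes from the thread.

----------------------------------------------------------------------
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlockBoundarySketch {

    // Target hfile block size; 64 KB is the default mentioned in the thread.
    static final int TARGET_BLOCK_SIZE = 64 * 1024;

    // A block is closed only after the current KeyValue has been appended in
    // full and the block has reached the target size, so a KeyValue is never
    // split across blocks.
    static List<Integer> blockSizes(int[] kvSizes) {
        List<Integer> blocks = new ArrayList<Integer>();
        int current = 0;
        for (int kv : kvSizes) {
            current += kv;
            if (current >= TARGET_BLOCK_SIZE) {
                blocks.add(current);   // boundary lands on a KeyValue boundary
                current = 0;
            }
        }
        if (current > 0) {
            blocks.add(current);       // trailing partial block
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Seventy 1 KB cells: blocks come out just over (or under) 64 KB.
        int[] smallCells = new int[70];
        Arrays.fill(smallCells, 1024);
        System.out.println(blockSizes(smallCells));                   // [65536, 6144]

        // One 5 MB cell: a single ~5 MB block, never split.
        System.out.println(blockSizes(new int[]{5 * 1024 * 1024}));   // [5242880]
    }
}
----------------------------------------------------------------------

With small cells the blocks land just over 64 KB, while a single 5 MB cell yields a single ~5 MB block, which is the shape Stack asks about above.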

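Similarly, the eviction trigger in the quoted cacheBlock() (run an eviction pass once the accounted size exceeds acceptableSize()) can be made concrete with a small stand-alone model. This is not the real LruBlockCache: the 8 GB heap and the 0.85 acceptable factor below are assumptions for illustration, and only the hfile.block.cache.size fractions and the "newSize > acceptableSize() triggers runEviction()" shape come from the discussion above.

----------------------------------------------------------------------
import java.util.concurrent.atomic.AtomicLong;

public class CacheAccountingSketch {

    // Assumed values for illustration only: an 8 GB heap and an eviction
    // threshold at 85% of the cache's max size. Neither comes from the thread.
    static final long HEAP_BYTES = 8L * 1024 * 1024 * 1024;
    static final double ACCEPTABLE_FACTOR = 0.85;

    final long maxSize;
    final AtomicLong size = new AtomicLong(0);

    CacheAccountingSketch(double blockCacheFraction) {
        // hfile.block.cache.size is the fraction of the heap given to the cache.
        this.maxSize = (long) (HEAP_BYTES * blockCacheFraction);
    }

    long acceptableSize() {
        return (long) (maxSize * ACCEPTABLE_FACTOR);
    }

    // Mirrors the accounting in the quoted cacheBlock(): add the block's heap
    // size and report whether an eviction pass would now be triggered.
    boolean cacheBlockWouldEvict(long blockHeapSize) {
        long newSize = size.addAndGet(blockHeapSize);
        return newSize > acceptableSize();
    }

    public static void main(String[] args) {
        long kb64 = 64L * 1024;
        long mb5 = 5L * 1024 * 1024;

        CacheAccountingSketch cache = new CacheAccountingSketch(0.10);
        System.out.println("acceptable size at 0.10 of heap: " + cache.acceptableSize() + " bytes");
        System.out.println("64 KB blocks that fit under it: " + cache.acceptableSize() / kb64);
        System.out.println("5 MB blocks that fit under it:  " + cache.acceptableSize() / mb5);

        // Simulate caching ~5 MB blocks until the eviction threshold is crossed.
        int blocks = 0;
        while (!cache.cacheBlockWouldEvict(mb5)) {
            blocks++;
        }
        System.out.println("eviction pass triggered after block #" + (blocks + 1));
    }
}
----------------------------------------------------------------------

A ~5 MB block consumes roughly 80 times as much of the cache as a 64 KB block, so the acceptable size is crossed after far fewer cached blocks and eviction runs far more often. Why node A ends up decompressing 5-6 MB blocks at all is the open question in the thread.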