For suggestion #3 below, take a look at: HBASE-7509 Enable RS to query a secondary datanode in parallel, if the primary takes too long
Cheers On Mon, Jul 8, 2013 at 3:04 AM, Viral Bajaria <[email protected]>wrote: > Hi, > > TL;DR; > Trying to make a case for making the block eviction strategy smart and to > not evict remote blocks more frequently and make the requests more smarter. > > The question here comes after I debugged the issue that I was having with > random region servers hitting high load averages. I initially thought the > problem was hardware related i.e. bad disk or network since the wait I/O > was too high but it was a combination of things. > > I figured with SCR (short circuit read) ON the datanode should almost never > show high amount of block requests from the local regionservers. So my > starting point for debugging was the datanode since it was doing a ton of > I/O. The clienttrace logs helped me figure out which RS nodes were making > block requests. I hacked up a script to report which blocks are being > requested and how many times per minute. I found that some blocks were > being requested 10+ times in a minute and over 2000 times in an hour from > the same regionserver. This was causing the server to do 40+MB/s on reads > alone. That was on the higher side, the average was closer to 100 or so per > hour. > > Now why did I end up in such a situation. It happened due to the fact that > I added servers to the cluster and rebalanced the cluster. At the same time > I added some drives and also removed the offending server in my setup. This > caused some of the data to not be co-located with the regionservers. Given > that major_compaction was disabled and it would not have run for a while > (atleast on some tables) these block requests would not go away. One of my > regionservers was totally overwhelmed. I made the situation worse when I > removed the server that was under heavy load with the assumption that it's > a hardware problem with the box without doing a deep dive (doh!). Given > that regionservers will be added in the future, I expect block locality to > go down till major_compaction runs. Also nodes can go down and cause this > problem. So I started thinking of probable solutions, but first some > observations. > > *Observations/Comments* > - The surprising part was the regionservers were trying to make so many > requests for the same block in the same minute (let alone hour). Could this > happen because the original request took a few seconds and so the > regionserver re-requested ? I didn't see any fetch errors in the > regionserver logs for blocks. > - Even more strange; my heap size was at 11G and the time when this was > happening, the used heap was at 2-4G. I would have expected the heap to > grow higher than that since the blockCache should be using atleast 40% of > the available heap space. > - Another strange thing that I observed was, the block was being requested > from the same datanode every single time. > > *Possible Solution/Changes* > - Would it make sense to give remote blocks higher priority over the local > blocks that can be read via SCR and not let them get evicted if there is a > tie in which block to evict ? > - Should we throttle the number of outgoing requests for a block ? I am not > sure if my firewall caused some issue but I wouldn't expect multiple block > fetch requests in the same minute. I did see a few RST packets getting > dropped at the firewall but I wasn't able to trace the problem was due to > this. > - We have 3 replicas available, shouldn't we request from the other > datanode if one might take a lot of time ? The amount of time it took to > read a block went up when the box was under heavy load, yet the re-requests > were going to the same one. Is this something that is available on the > DFSClient and can we exploit it ? > - Is it possible to migrate a region to a server which has higher number of > blocks available for it ? We don't need to make this automatic, but we > could provide a command that could be invoked manually to assign a region > to a specific regionserver. Thoughts ? > > Thanks, > Viral >
