I'm currently importing some files into HBase and am running into a problem 
with a large number of store files being created. We have some back data which 
is stored in very large sequence files (3-5 GB each). When we import this 
data, the number of store files created does not get out of hand. When we 
switch to importing smaller sequence files, the number of store files rises 
quite dramatically. I do not know if this is happening because we are flushing 
the commits more frequently with the smaller files. I'm wondering if anybody 
has any advice regarding this issue. My main concern is that during this 
process we never finish flushing to disk (and we set writeToWAL to false). We 
always hit the 90-second timeout due to the heavy write load. As these store 
files pile up without being committed to disk, we could lose a lot of data if 
something were to crash.
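For context on the flush behavior described above: these are the region-server settings that govern when the memstore is flushed into a new store file and when compactions kick in (the property names below are the standard `hbase-site.xml` ones; the values shown are only illustrative defaults, not a recommendation for this workload):

```xml
<!-- hbase-site.xml (illustrative values, not tuned for this workload) -->
<configuration>
  <!-- The memstore is flushed to a new store file once it reaches this
       size; frequent small flushes produce many small store files. -->
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>67108864</value> <!-- 64 MB -->
  </property>
  <!-- A minor compaction is considered once a store has this many files. -->
  <property>
    <name>hbase.hstore.compactionThreshold</name>
    <value>3</value>
  </property>
  <!-- Updates to a region are blocked once a store has this many files,
       giving compactions a chance to catch up. -->
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>7</value>
  </property>
</configuration>
```

If the small-file imports are tripping the blocking threshold while compactions lag behind, that would be consistent with the spikes shown below.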

I have created screenshots of our monitoring application for HBase which show 
the spikes in activity.

http://twitpic.com/photos/jeremy_carroll
