I'm currently importing some files into HBase and am running into a problem with a large number of store files being created. We have some back data stored in very large sequence files (3-5 GB each). When we import that data, the number of store files stays under control. When we switch to importing smaller sequence files, the number of store files rises quite dramatically. I don't know whether this is happening because we are flushing more frequently with the smaller files, and I'm wondering if anybody has any advice on this issue. My main concern is that during this process the flushes to disk never finish (and we set writeToWAL to false); we always hit the 90-second timeout due to the heavy write load. As these store files pile up without being committed to disk, we could lose a lot of data if something were to crash.
I have posted screenshots from our monitoring application for HBase, which show the spikes in activity: http://twitpic.com/photos/jeremy_carroll
