You are likely just hitting the threshold for a minor compaction, and since it picks up all the files (I'm guessing that it does), it gets upgraded to a major compaction. The threshold (hbase.hstore.compactionThreshold) is 3 by default.
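A toy model of the threshold rule makes the arithmetic concrete. This is a hypothetical sketch, not HBase code (the class `CompactionThresholdModel` and its method are made up for illustration). It assumes that each bulk load adds exactly one HFile per region, as the question describes, that a compaction fires as soon as a store holds `hbase.hstore.compactionThreshold` files, and that the compaction merges every file into one. HBase's real file-selection logic (size ratios, maximum files per compaction, concurrent loads) is ignored.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model (not HBase code) of repeated bulk loads into one
 * store of one region. Simplifying assumptions:
 *  - every bulk load adds exactly one HFile to the store;
 *  - a compaction is requested as soon as the file count reaches the threshold;
 *  - that compaction picks up every file and rewrites them into a single file
 *    (the "upgraded to major" case).
 */
public class CompactionThresholdModel {

    // Default value of hbase.hstore.compactionThreshold.
    static final int THRESHOLD = 3;

    /** Returns the 1-based load numbers after which a compaction fires. */
    static List<Integer> compactingLoads(int loads, int threshold) {
        List<Integer> triggers = new ArrayList<>();
        int storeFiles = 0;
        for (int load = 1; load <= loads; load++) {
            storeFiles++;                  // this bulk load adds one HFile
            if (storeFiles >= threshold) {
                triggers.add(load);        // compaction requested
                storeFiles = 1;            // all files merged into one
            }
        }
        return triggers;
    }

    public static void main(String[] args) {
        // Loading files one job at a time, as in the original question.
        System.out.println("per-file loads, first 10: " + compactingLoads(10, THRESHOLD));
        // A single load of all files adds only one HFile per region.
        System.out.println("single load: " + compactingLoads(1, THRESHOLD));
    }
}
```

Under these assumptions, 1,000 per-file loads give 499 compactions per region, while a single load into an empty table adds one HFile per region and stays below the threshold.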
So after loading 3 files you should get a compaction per region, and then every 2 loads after that you will trigger another one per region. It seems to me that it would be better if you were able to do a single load for all your files.

J-D

On Thu, Mar 21, 2013 at 6:29 AM, Nicolas Seyvet <nicolas.sey...@gmail.com> wrote:
> Hi,
>
> We are using code similar to
> https://github.com/jrkinley/hbase-bulk-import-example/ in order to
> benchmark our HBase cluster. We are running a CDH4 installation, and HBase
> is version 0.92.1-cdh4.1.1. The cluster is composed of 12 slaves, 1
> master and 1 secondary master.
>
> During the bulk load insert, roughly 3 hours after the start
> (~200 GB), we notice a large drop in the insert rate. At the
> same time, there is a spike in IO and CPU usage. Connecting to a Region
> Server (RS), the Monitored Task section shows that a compaction has started.
>
> I have set hbase.hregion.max.filesize to 107374182400 (100 GB), and disabled
> automatic major compaction (hbase.hregion.majorcompaction is set to 0).
>
> We have 1000 files of synthetic data (CSV), where
> each row in a file is one row to insert into HBase; each file contains 600K
> rows (or 600K events). Our loader works in the following way:
> 1. Look for a file
> 2. When a file is found, prepare a job for that file
> 3. Launch the job
> 4. Wait for completion
> 5. Compute the insert rate (number of rows / time)
> 6. Repeat from 1 until there are no more files.
>
> What I understand of the bulk load M/R job is that it produces one HFile
> for each region.
>
> Questions:
> - How is HStoreFileSize calculated?
> - What do HStoreFileSize, storeFileSize and hbase.hregion.max.filesize have
> in common?
> - Can the number of HFiles trigger a major compaction?
>
> Thanks for your help. I hope my questions make sense.
>
> /Nicolas