Hi David,

Minor compactions can be promoted to major compactions when all of a store's files get selected for compaction, and the property below will not prevent that from happening.
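To illustrate (this config fragment is my sketch, not something from your cluster): `hbase.hregion.majorcompaction=0` only disables the *timed* major compactions. Minor compactions still run, and a minor compaction that selects every StoreFile is promoted to a major one. Lowering `hbase.hstore.compaction.max` (default 10) reduces the chance of that promotion, though it cannot rule it out:

```xml
<!-- hbase-site.xml (sketch) -->

<!-- Disables only the periodic, time-triggered major compactions. -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>

<!-- Caps how many StoreFiles a single minor compaction may select,
     making it less likely that a minor compaction picks up *all*
     files and is silently promoted to a major compaction. -->
<property>
  <name>hbase.hstore.compaction.max</name>
  <value>7</value>
</property>
```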
Section 9.7.6.5 there: http://hbase.apache.org/book/regions.arch.html

JM

2013/7/9 David Koch <[email protected]>:
> Hello,
>
> We disabled automated major compactions by setting
> hbase.hregion.majorcompaction=0.
> This was to avoid issues during bulk import of data, since compactions
> seemed to cause the running imports to crash. However, even after
> disabling, the region server logs still show compactions going on, as well as
> aborted compactions. We also get compaction queue size warnings in Cloudera
> Manager.
>
> Why is this the case?
>
> To be fair, we only disabled automated compactions AFTER the import failed
> for the first time (yes, HBase was restarted), so maybe there are some
> trailing compactions, but the queue size keeps increasing, which I guess
> should not be the case. Then again, I don't know how aborted compactions
> are counted - i.e. I'm not sure whether or not to trust the metrics on this.
>
> A bit more about what I am trying to accomplish:
>
> I am bulk loading about 100 indexed .lzo files with 20 * 10^6 key-values
> (0.5 kB each) per file into an HBase table. Each file is loaded by a separate
> Mapper job, and several of these jobs run in parallel to make sure all task
> trackers are used. The key distribution is the same in each file, so even
> region growth is to be expected. We did not pre-split the table, as that does
> not seem to have been a limiting factor earlier.
>
> On a related note: what, if any, experience do other HBase/Cloudera users
> have with the snapshotting feature detailed below?
>
> http://www.cloudera.com/content/cloudera-content/cloudera-
> docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_12.html
>
> We need a robust way to do inter-cluster cloning/backup of tables,
> preferably without taking the source table offline or impacting the
> performance of the source cluster. We only use HDFS files for importing
> because the CopyTable job needs to run on the source cluster and cannot be
> resumed once it fails.
>
> Thanks,
>
> /David
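For the snapshot question in the quoted message, a minimal sketch of an online inter-cluster copy (assuming snapshot support as in HBase 0.94.6+/CDH 4.2.0; the table name, snapshot name, and backup cluster address are hypothetical placeholders, not taken from the thread):

```shell
# Take an online snapshot on the source cluster; the table stays available.
echo "snapshot 'my_table', 'my_table_snap_20130709'" | hbase shell

# Copy the snapshot's metadata and HFiles to the backup cluster. This is a
# MapReduce job reading raw HDFS files, so it does not go through the
# source region servers' read path.
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot my_table_snap_20130709 \
  -copy-to hdfs://backup-cluster:8020/hbase \
  -mappers 16

# On the backup cluster, materialize a table from the exported snapshot.
echo "clone_snapshot 'my_table_snap_20130709', 'my_table_restored'" | hbase shell
```

Unlike CopyTable, a failed ExportSnapshot can simply be re-run, since the snapshot it copies from is immutable.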
