Silly question... Why are you trying to disable automated compaction?
And then the equally silly question... are you attempting to run full compactions manually?

On Jul 9, 2013, at 11:41 AM, David Koch <[email protected]> wrote:

> Hello,
>
> Thank you for your replies.
>
> So, as suggested, I tweaked the following settings in Cloudera Manager:
> hbase.hstore.compactionThreshold=10000
> hbase.hstore.compaction.max: left unchanged; I tried setting it to "0",
> but the minimum is 2
>
> I can't see any compactions being launched, but the job still crashes.
> Here's a sample of the logs (*):
>
> Run out of memory; HRegionServer will abort itself immediately
> java.lang.OutOfMemoryError: Direct buffer memory
>
> Failed open of
> region=<table_name_removed>,\xF7\x98\x98~'\xD3E\x89\xA2\xDF\x10\xB4\x02K\xC2\x17,1373363234166.363473b8981db865db05c47ccdc45355.,
> starting to roll back the global memstore size.
> org.apache.hadoop.hbase.DroppedSnapshotException: region:
>
> Anyway: http://cdn.memegenerator.net/instances/400x/28234921.jpg
>
> Should it be that hard to import a reasonable amount of data into a
> default-configured cluster of 10 reasonably powerful machines? If you are
> inclined to help, I'll gladly provide more in-depth information.
>
> Thank you,
>
> /David
>
> (*) I am browsing this from Cloudera Manager since I don't have shell
> access to the nodes.
>
> On Tue, Jul 9, 2013 at 4:58 PM, Ted Yu <[email protected]> wrote:
>
>> Do you specify startTime and endTime parameters for the CopyTable job?
>>
>> Cheers
>>
>> On Tue, Jul 9, 2013 at 4:38 AM, David Koch <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> We disabled automated major compactions by setting
>>> hbase.hregion.majorcompaction=0.
>>> This was to avoid issues during bulk import of data, since compactions
>>> seemed to cause the running imports to crash. However, even after
>>> disabling them, region server logs still show compactions going on, as
>>> well as aborted compactions. We also get compaction queue size warnings
>>> in Cloudera Manager.
>>>
>>> Why is this the case?
>>>
>>> To be fair, we only disabled automated compactions AFTER the import
>>> failed for the first time (yes, HBase was restarted), so maybe there are
>>> some trailing compactions, but the queue size keeps increasing, which I
>>> guess should not be the case. Then again, I don't know how aborted
>>> compactions are counted, i.e. I'm not sure whether or not to trust the
>>> metrics on this.
>>>
>>> A bit more about what I am trying to accomplish:
>>>
>>> I am bulk loading about 100 indexed .lzo files into an HBase table;
>>> each file holds 20 * 10^6 key-values of 0.5 KB. Each file is loaded by a
>>> separate mapper job, and several of these jobs run in parallel to make
>>> sure all task trackers are used. Key distribution is the same in each
>>> file, so even region growth is to be expected. We did not pre-split the
>>> table, as it does not seem to have been a limiting factor earlier.
>>>
>>> On a related note: what experience, if any, do other HBase/Cloudera
>>> users have with the snapshot feature described below?
>>>
>>> http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_12.html
>>>
>>> We need a robust way to do inter-cluster cloning/backup of tables,
>>> preferably without taking the source table offline or impacting the
>>> performance of the source cluster. We only use HDFS files for importing
>>> because the CopyTable job needs to run on the source cluster and cannot
>>> be resumed once it fails.
>>>
>>> Thanks,
>>>
>>> /David
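For a sense of scale, the figures in David's first message (about 100 files, 20 * 10^6 key-values per file, 0.5 KB per key-value, 10 machines) can be turned into a rough sizing estimate. This is a back-of-envelope sketch under assumptions that do not come from the thread: 0.5 KB is taken as 500 bytes, LZO compression and HBase's per-cell overhead (row key, family, qualifier and timestamp stored with every cell) are ignored, data is assumed to spread evenly over the nodes, and the HDFS default replication factor of 3 is assumed.

```python
# Back-of-envelope sizing for the bulk load described in the thread.
# Assumptions (NOT stated in the thread): 0.5 KB = 500 bytes, no compression,
# no per-cell HBase overhead, HDFS replication factor 3, even spread over nodes.
FILES = 100
KVS_PER_FILE = 20 * 10**6
BYTES_PER_KV = 500
NODES = 10
HDFS_REPLICATION = 3


def estimate(files=FILES, kvs_per_file=KVS_PER_FILE, bytes_per_kv=BYTES_PER_KV,
             nodes=NODES, replication=HDFS_REPLICATION):
    """Return total key-values and raw/replicated byte counts, overall and per node."""
    total_kvs = files * kvs_per_file
    raw_bytes = total_kvs * bytes_per_kv
    replicated_bytes = raw_bytes * replication
    return {
        "total_kvs": total_kvs,
        "raw_bytes": raw_bytes,
        "replicated_bytes": replicated_bytes,
        "raw_bytes_per_node": raw_bytes // nodes,
        "replicated_bytes_per_node": replicated_bytes // nodes,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,}")
```

Under these assumptions the load comes to 2 * 10^9 key-values, about 1 TB of raw cell data and about 3 TB on disk after replication, i.e. roughly 300 GB per node. Compactions rewrite store files, so the bytes actually written during the import will be higher than this.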

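David mentions that the table was not pre-split. The region name in his log starts with arbitrary binary bytes (\xF7\x98...), which suggests, though the thread does not confirm it, that row keys are spread across the whole byte range. Under that assumption, evenly spaced single-byte split points could be computed as in the sketch below; `uniform_split_keys` and `to_hbase_escaped` are hypothetical helpers, and the region count of 40 (4 per node) is an illustrative choice, not a recommendation from the thread.

```python
from typing import List

# Hypothetical helpers: evenly spaced one-byte split keys for pre-splitting a
# table whose row keys are ASSUMED to be uniformly distributed in their first
# byte (0x00-0xFF). The thread does not confirm this distribution.


def uniform_split_keys(num_regions: int) -> List[bytes]:
    """Return num_regions - 1 split keys dividing the first-byte range evenly."""
    if not 2 <= num_regions <= 256:
        raise ValueError("num_regions must be between 2 and 256")
    # Region i covers [key[i-1], key[i]); the first region starts at the empty key,
    # so no split key is 0x00.
    return [bytes([i * 256 // num_regions]) for i in range(1, num_regions)]


def to_hbase_escaped(key: bytes) -> str:
    """Render every byte as \\xNN (similar to HBase's Bytes.toStringBinary,
    which, unlike this helper, leaves printable characters unescaped)."""
    return "".join(f"\\x{b:02X}" for b in key)


if __name__ == "__main__":
    # Illustrative: 10 nodes with 4 regions each.
    for key in uniform_split_keys(40):
        print(to_hbase_escaped(key))
```

Keys like these could be passed as the `splitKeys` argument of `HBaseAdmin.createTable(HTableDescriptor, byte[][])` when the table is created. If the real keys are not uniform in their first byte, split points would need to be derived from a sample of the actual keys instead.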