Hello,

Thank you for your replies.

So, as suggested, I tweaked the following settings in Cloudera Manager:

hbase.hstore.compactionThreshold=10000
hbase.hstore.compaction.max - I did not touch it. I tried setting it to "0", but the minimum is 2.

I can't see any compactions being launched, but the job still crashes. Here is a sample of the logs (*):

Run out of memory; HRegionServer will abort itself immediately
java.lang.OutOfMemoryError: Direct buffer memory

Failed open of region=<table_name_removed>,\xF7\x98\x98~'\xD3E\x89\xA2\xDF\x10\xB4\x02K\xC2\x17,1373363234166.363473b8981db865db05c47ccdc45355., starting to roll back the global memstore size.
org.apache.hadoop.hbase.DroppedSnapshotException: region:

Anyway: http://cdn.memegenerator.net/instances/400x/28234921.jpg

Should it be that hard to import a reasonable amount of data into a default-configured cluster of 10 reasonably powerful machines? If you are inclined to help, I'll gladly provide more in-depth information.

Thank you,

/David

(*) I am browsing these from Cloudera Manager since I don't have shell access to the nodes.

On Tue, Jul 9, 2013 at 4:58 PM, Ted Yu <[email protected]> wrote:

> Do you specify startTime and endTime parameters for the CopyTable job?
>
> Cheers
>
> On Tue, Jul 9, 2013 at 4:38 AM, David Koch <[email protected]> wrote:
>
> > Hello,
> >
> > We disabled automated major compactions by setting
> > hbase.hregion.majorcompaction=0. This was to avoid issues during bulk
> > import of data, since compactions seemed to cause the running imports
> > to crash. However, even after disabling them, region server logs still
> > show compactions going on, as well as aborted compactions. We also get
> > compaction queue size warnings in Cloudera Manager.
> >
> > Why is this the case?
> >
> > To be fair, we only disabled automated compactions AFTER the import
> > failed for the first time (yes, HBase was restarted), so maybe there
> > are some trailing compactions, but the queue size keeps increasing,
> > which I guess should not be the case. Then again, I don't know how
> > aborted compactions are counted, i.e. I am not sure whether or not to
> > trust the metrics on this.
> >
> > A bit more about what I am trying to accomplish:
> >
> > I am bulk loading about 100 indexed .lzo files, each containing
> > 20 * 10^6 key-values (0.5 kB each), into an HBase table. Each file is
> > loaded by a separate mapper job, and several of these jobs run in
> > parallel to make sure all task trackers are used. The key distribution
> > is the same in each file, so even region growth is to be expected. We
> > did not pre-split the table, as it did not seem to have been a
> > limiting factor earlier.
> >
> > On a related note: what experience, if any, do other HBase/Cloudera
> > users have with the snapshot feature detailed below?
> >
> > http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_12.html
> >
> > We need a robust way to do inter-cluster cloning/backup of tables,
> > preferably without taking the source table offline or impacting the
> > performance of the source cluster. We only use HDFS files for
> > importing because the CopyTable job needs to run on the source cluster
> > and cannot be resumed once it fails.
> >
> > Thanks,
> >
> > /David
> >
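For reference, a minimal sketch of how the compaction settings discussed above would look in hbase-site.xml (in Cloudera Manager they go into the HBase configuration safety valve). The values are the ones from this thread, not recommendations:

    <!-- Disable time-based major compactions entirely. -->
    <property>
      <name>hbase.hregion.majorcompaction</name>
      <value>0</value>
    </property>

    <!-- Number of StoreFiles in a Store before a minor compaction is considered. -->
    <property>
      <name>hbase.hstore.compactionThreshold</name>
      <value>10000</value>
    </property>

Note that hbase.hregion.majorcompaction=0 only disables the periodic, time-based major compactions; flush-driven minor compactions can still run, and a minor compaction that happens to include all of a store's files is promoted to a major one, which may explain the continued compaction activity in the logs.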
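On Ted's question: CopyTable accepts a time range, which restricts the copy to cells whose timestamps fall in [startTime, endTime). A hypothetical invocation - the ZooKeeper quorum of the destination cluster and the epoch-millisecond timestamps are illustrative only:

    hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
        --starttime=1372636800000 \
        --endtime=1373241600000 \
        --peer.adr=dst-zk1,dst-zk2,dst-zk3:2181:/hbase \
        <table_name_removed>

Running several such jobs over disjoint time ranges is one way to make a failed copy resumable in pieces, since only the ranges that failed need to be re-run.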
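On the snapshot question: assuming the CDH 4.2 snapshot support from the linked page, an online clone to another cluster might look like the sketch below (the snapshot name, backup NameNode address, and mapper count are made up):

    # On the source cluster, from the HBase shell; the table stays online:
    hbase> snapshot '<table_name_removed>', 'import_backup_snap'

    # Copy the snapshot files over to the backup cluster's HBase root dir:
    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
        -snapshot import_backup_snap \
        -copy-to hdfs://backup-nn:8020/hbase \
        -mappers 16

    # On the destination cluster, materialise a table from the snapshot:
    hbase> clone_snapshot 'import_backup_snap', '<table_name_removed>'

Unlike CopyTable, ExportSnapshot is a MapReduce-driven file copy rather than a scan through the region servers, so it should put less load on the source cluster and can simply be launched again if it fails.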
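Finally, since the row keys look like uniformly distributed hashes, pre-splitting the table would spread the bulk-load write pressure across all region servers from the start instead of relying on region splits under load. A sketch, with a made-up column family and split points on the high byte of the key (double-quoted "\xNN" values are interpreted as binary by the HBase shell; this creates 8 regions):

    hbase> create '<table_name_removed>', 'cf',
      {SPLITS => ["\x20", "\x40", "\x60", "\x80", "\xa0", "\xc0", "\xe0"]}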
