> 2012/6/21, Frédéric Fondement <[email protected]>:
> opt3. looks the nicest (only 3-4 tables to scan when reading), but won't my
> daily major compact become crazy ?
If you want more control over the major compaction process, for example to lessen the load on your production cluster to a constant background level, remember that the HBase shell is the JRuby irb, so you have the full power of the HBase API and Ruby. In the worst case you can write a shell script that gets a list of regions and triggers major compaction on each region separately, or according to whatever policy you construct. The script can be invoked manually or out of crontab.

Another performance consideration is how many expired cells a scan might have to skip. If a wide area of the keyspace expires all at once, the scan will seem to "pause" while traversing that area. However, you can use setTimeRange to bound your scan by time range, and then HBase can optimize whole HFiles away just by examining their metadata.

Therefore I would recommend using both: TTLs for automatic background garbage collection of expired entries, and time range bounded scans for read time optimization.

Incidentally, there was an interesting presentation at HBaseCon recently regarding a creative use of timestamps: http://www.slideshare.net/cloudera/1-serving-apparel-catalog-from-h-base-suraj-varma-gap-inc-finalupdated-last-minute (slide 16).

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
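
P.S. To make the region-by-region compaction idea concrete, here is a minimal sketch in plain Java. It only models the policy "request one region at a time with a pause in between". The class RollingMajorCompaction, the method compactOneByOne, the pause parameter and the region names are all made up for illustration; the callback stands in for whatever HBase admin or shell call you would actually use, which is not shown here. If the real compaction request returns before the compaction finishes, the pause only spaces out the requests and does not wait for completion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Sketch only: this class and its parameters are hypothetical, not part of HBase.
class RollingMajorCompaction {

    // Requests a major compaction for each region in turn, pausing between
    // requests so the extra load is spread out instead of arriving all at once.
    // Returns the regions in the order they were requested.
    static List<String> compactOneByOne(List<String> regions,
                                        Consumer<String> requestMajorCompaction,
                                        long pauseMillis) {
        List<String> requested = new ArrayList<>();
        for (String region : regions) {
            requestMajorCompaction.accept(region);
            requested.add(region);
            if (pauseMillis > 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return requested;
    }

    public static void main(String[] args) {
        // Placeholder region names; a real script would fetch them from the cluster.
        List<String> regions = Arrays.asList("region-a", "region-b", "region-c");
        // Here the callback only logs; a real one would trigger the compaction.
        compactOneByOne(regions, r -> System.out.println("major compact " + r), 10L);
    }
}
```

Run from crontab, something like this keeps compaction at a steady trickle; the pause and the region ordering are the knobs for whatever policy you prefer.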

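
P.P.S. On why a time range bounded scan can skip whole files: the following is a toy model in plain Java, not HBase code. It assumes each file records the oldest and newest cell timestamp it holds, and that the scan range is half-open (minimum inclusive, maximum exclusive). The names TimeRangePruning, FileMeta, mayContain and filesToRead are hypothetical and chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types are hypothetical and are not HBase classes.
class TimeRangePruning {

    // Stand-in for per-file timestamp metadata (oldest and newest cell, inclusive).
    static final class FileMeta {
        final String name;
        final long minTs;
        final long maxTs;

        FileMeta(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    // A file must be read only if its [minTs, maxTs] overlaps the scan's
    // half-open range [scanMin, scanMax); otherwise it can be skipped unopened.
    static boolean mayContain(FileMeta f, long scanMin, long scanMax) {
        return f.maxTs >= scanMin && f.minTs < scanMax;
    }

    // Returns the names of the files that still have to be read for this scan.
    static List<String> filesToRead(List<FileMeta> files, long scanMin, long scanMax) {
        List<String> keep = new ArrayList<>();
        for (FileMeta f : files) {
            if (mayContain(f, scanMin, scanMax)) {
                keep.add(f.name);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<FileMeta> files = new ArrayList<>();
        files.add(new FileMeta("old", 0L, 999L));
        files.add(new FileMeta("mid", 1_000L, 1_999L));
        files.add(new FileMeta("new", 2_000L, 2_999L));
        // Only files overlapping [1500, 3000) need to be opened.
        System.out.println(filesToRead(files, 1_500L, 3_000L)); // prints [mid, new]
    }
}
```

The file named "old" is dropped from the scan on the basis of two numbers, without reading any of its cells. That is the effect you get by combining TTLs with a time range on the scan.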