Simple question: did you copy and paste that snippet? It has two name stanzas.
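
For reference, a stanza with a single name and a value would presumably look like the sketch below. This assumes 600000 (milliseconds) was meant to be the value rather than a second name:

```xml
<!-- Sketch only: assumes 600000 was intended as the value, not a name -->
<property>
  <name>hbase.hstore.time.to.purge.deletes</name>
  <value>600000</value>
</property>
```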

On Tue, Sep 23, 2014 at 9:42 AM, Jean-Marc Spaggiari <[email protected]> wrote:

> Hi James,
>
> Is it possible that you are impacted by
> https://issues.apache.org/jira/browse/HBASE-10118 ? Any change to test with
> one release where HBASE-10118 is available?
>
> JM
>
> 2014-09-23 12:10 GMT-04:00 James Estes <[email protected]>:
>
> > It does sound like what I'd want (that's why I was trying to use it :) ),
> > but it isn't working as described. Maybe it is a bug?
> >
> > The behavior I'm seeing is that the delete markers are removed on major
> > compaction, regardless of having a hbase.hstore.time.to.purge.deletes set
> > in hbase-site.xml:
> > https://gist.github.com/housejester/2b8fbba0d05c6abbe784
> >
> > I think I've found the issue now. You mentioned the setting could be
> > applied per CF...so I tested that way, and it works as expected. My
> > hbase-site.xml had:
> >
> > <property>
> >   <name>hbase.hstore.time.to.purge.deletes</name>
> >   <name>600000</name>
> > </property>
> >
> > But that doesn't seem to be applied (even with restarts, etc). Setting
> > hbase.hstore.time.to.purge.deletes directly on the column family does work
> > though:
> > https://gist.github.com/housejester/a81274bf74a8666fba32
> >
> > Not sure why it isn't picking up from my hbase-site.xml, but I'll just
> > configure it on the CFs. This is on hbase-0.98.6.1-hadoop2 and
> > hbase-0.96.0-hadoop2 running in local mode.
> >
> > Thanks Lars,
> > James
> >
> > On Mon, Sep 22, 2014 at 11:04 PM, lars hofhansl <[email protected]> wrote:
> >
> > > You can use the hbase.hstore.time.to.purge.deletes config option.
> > > You can set it globally or per Column Family.
> > >
> > > This is the description in hbase-default.xml:
> > > <property>
> > >   <name>hbase.hstore.time.to.purge.deletes</name>
> > >   <value>0</value>
> > >   <description>The amount of time to delay purging of delete markers
> > >     with future timestamps. If unset, or set to 0, all delete markers,
> > >     including those with future timestamps, are purged during the next
> > >     major compaction. Otherwise, a delete marker is kept until the major
> > >     compaction which occurs after the marker's timestamp plus the value
> > >     of this setting, in milliseconds.
> > >   </description>
> > > </property>
> > >
> > > That seems to be exactly what you want.
> > >
> > > -- Lars
> > >
> > >
> > > ----- Original Message -----
> > > From: James Estes <[email protected]>
> > > To: [email protected]
> > > Cc:
> > > Sent: Monday, September 22, 2014 10:39 AM
> > > Subject: Configuring tombstone purge independent of deleted cell purge
> > >
> > > Could tombstone purges be independent of purging deleted cells and
> > > KEEP_DELETED_CELLS setting? In my use case, I do not want to keep deleted
> > > cells, but I do need to keep the tombstones around. Without the
> > > tombstones, I'm not able to do incremental backups (custom, we do
> > > timerange raw scans ourselves for this).
> > >
> > > As a rough example, if I have the following timeline for the same row key
> > > (where t# is time):
> > > t0 - full backup (using a time range up to t0)
> > > t1 - PUT v1
> > > t2 - incremental backup #1 (time range t0 up to t2)
> > > t3 - DELETE
> > > t4 - flush and major compaction happens
> > > t5 - incremental backup #2 (time range t2 up to t5)
> > > t6 - full system crash
> > > t7 - data restored from full backup + incrementals #1 and #2
> > >
> > > When the restore completes, the row will have been un-deleted. This is
> > > because the incremental backup in #2 will not have the tombstone, since
> > > it gets compacted out.
> > >
> > > So in our case, I do NOT want to keep deleted cells (because I do not
> > > want the cells to show up in time range scans users may do), but I DO
> > > want to keep the tombstones for a configurable amount of time (much
> > > larger than our planned incremental backup schedule) so they are captured
> > > during backup. This would allow for the custom incremental backups to be
> > > independent of major compactions. Without it, the backup schedule would
> > > have to manually handle compactions and would always have to do a FULL
> > > Backup after a major compaction (otherwise there can be loss because when
> > > any major compaction happens, any tombstone that came in after the last
> > > incremental will be lost).
> > >
> > > It seems like there could be another setting for when to purge
> > > tombstones. Currently, there is hbase.hstore.time.to.purge.deletes for
> > > when to purge deleted cells, but ONLY if KEEP_DELETED_CELLS is configured
> > > (which makes sense). I'd like to propose a
> > > hbase.hstore.time.to.purge.tombstones that could default to the same
> > > value as hbase.hstore.time.to.purge.deletes, but would take effect
> > > regardless of the KEEP_DELETED_CELLS setting. It should have a constraint
> > > so that hbase.hstore.time.to.purge.deletes <
> > > hbase.hstore.time.to.purge.tombstones (b/c we don't want tombstones
> > > disappearing before the deleted cells).
> > >
> > > Does this seem reasonable? Is there another approach I might take?
> > >
> > > Thanks,
> > >

-- 
Dan Di Spaltro
