> > A possible answer is to keep all versions with no TTL, and do replication.
> > At a certain size this ceases to be practical though.
> >
> 
> Discussing point-in-time-recovery here at our shop, and trying to
> avoid having to keep all versions is what prompted the below issue:
> 
> HBASE-4071  Data GC: Remove all versions > TTL EXCEPT the last
>                written version (Lars Hofhansl)
> 


That's why I fixed it :) I had been thinking of that more as a first line 
of defense. :)

You are probably right, and that is the way to think about it generally.
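
To make the retention rule from the issue title concrete, here is a rough 
sketch of how I read it: versions older than the TTL are dropped, except that 
the newest version(s) always survive. This is not the HBase implementation; 
the function name, the (timestamp, value) tuples and the min_versions 
parameter are all made up for illustration.

```python
from typing import List, Tuple

Version = Tuple[int, str]  # (timestamp_ms, value); hypothetical representation


def gc_versions(versions: List[Version], now_ms: int, ttl_ms: int,
                min_versions: int = 1) -> List[Version]:
    """Drop versions older than the TTL, but always keep the newest min_versions."""
    newest_first = sorted(versions, key=lambda v: v[0], reverse=True)
    kept = []
    for index, (ts, value) in enumerate(newest_first):
        expired = now_ms - ts > ttl_ms
        # An expired version survives only if it is among the newest min_versions.
        if not expired or index < min_versions:
            kept.append((ts, value))
    return kept


# Every version of this cell is past the TTL, yet the last written one is kept.
cell = [(100, "a"), (200, "b")]
survivors = gc_versions(cell, now_ms=1000, ttl_ms=150)
```

With replication in place, the replica would then always hold at least the 
last written value of every cell, plus everything inside the TTL window.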


> You want to support being able to restore any version?

That's probably what our ops folks want. Since this is fairly new territory for 
us, that might change, though.
And a backup of the past month on disk in a different datacenter might be good 
enough.


They might just be extremely delighted if we can give them any data in the past 
- say - month *instantly*.


> Would be nice if you could filter out complete WALs by looking at
> "metadata", metadata that does not currently exist: e.g. metadata
> could include what regions a WAL has edits for, the range of
> timestamps.


Yep. That would be nice. I also wonder if log entries should be tagged with the 
current timestamp (i.e. not the one set in Put or Delete, but the actual 
current time at the time the log was written) in addition to the sequence id. 
That would provide a global ordering (within the resolution of the timers at 
least).
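
A rough sketch of how the two ideas could fit together, assuming each log 
entry carries the wall-clock write time and each WAL exposes summary metadata 
(regions touched, min/max write time). None of this exists in HBase today; 
WalEntry, WalFile and replay_order are hypothetical names, and the sketch 
assumes each WAL is non-empty and its entries are already in write-time order.

```python
import heapq
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WalEntry:
    write_time_ms: int  # wall-clock time when the entry was logged (proposed tag)
    sequence_id: int    # per-server sequence id
    region: str
    edit: str


class WalFile:
    """A WAL plus the hypothetical summary metadata discussed above."""

    def __init__(self, entries: List[WalEntry]):
        self.entries = entries  # assumed non-empty and sorted by write time
        self.regions = {e.region for e in entries}
        self.min_write_ms = min(e.write_time_ms for e in entries)
        self.max_write_ms = max(e.write_time_ms for e in entries)

    def is_relevant(self, region: str, start_ms: int, end_ms: int) -> bool:
        # Decided from metadata alone, without reading the entries.
        return (region in self.regions
                and self.min_write_ms <= end_ms
                and self.max_write_ms >= start_ms)


def replay_order(wals: List[WalFile], region: str,
                 start_ms: int, end_ms: int) -> List[WalEntry]:
    """Skip irrelevant WALs, then merge the rest by (write time, sequence id)."""
    selected = [w for w in wals if w.is_relevant(region, start_ms, end_ms)]
    merged = heapq.merge(*(w.entries for w in selected),
                         key=lambda e: (e.write_time_ms, e.sequence_id))
    return [e for e in merged
            if e.region == region and start_ms <= e.write_time_ms <= end_ms]


wal_a = WalFile([WalEntry(10, 1, "r1", "put a"), WalEntry(30, 2, "r2", "put b")])
wal_b = WalFile([WalEntry(20, 1, "r1", "put c"), WalEntry(40, 2, "r1", "put d")])
wal_c = WalFile([WalEntry(100, 1, "r1", "put e")])
ordered = [e.edit for e in replay_order([wal_a, wal_b, wal_c], "r1", 0, 35)]
```

Ties on the write time fall back to the sequence id, which is only meaningful 
within one server, so the ordering is global only up to timer resolution.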


> Or, as in hbase-50, could roll logs first before starting the copy.
> That'd narrow the number of WALs to replay for sure.


I see. Yep. HBASE-50 never really got finished, it seems.


> Would need a WAL to hfile mapreduce job.

That's if the table did not exist before, right? I.e. you mean there was no 
base backup and everything is restored from the WAL?

> I think the PITR would be easier if table-scoped.

Yes, it is actually something we would prefer. We had issues before, where a 
customer deleted some data (after confirming multiple times that this is really 
what they wanted to do :) ), then asked us to restore the data. We had to 
restore the entire database to get those few rows back.


It would certainly simplify the problem. We'd just need a process in place that 
adds new tables to the replica first and removes them there after our backup 
time range (or maybe never delete them and just keep them mostly empty).


> Doing it cluster-wide would require our having the meta table in sync
> as you say elsewhere.  Or, we just dump the state of meta when doing a
> cluster backup at the end of PITR and restoring a cluster, the first
> thing we'd do is replace .META. (Could be issue if tables deleted
> between start of PITR and end).


If tables were deleted, the WAL replay could just ignore them.
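
Something along these lines is what I have in mind, as a toy sketch only: 
replay edits up to the restore point, skip any table that is not in the meta 
snapshot, and optionally restrict the replay to a single table. The edit 
tuples, the live_tables set and the replay function are invented for 
illustration and do not correspond to actual HBase structures.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Edit = Tuple[str, str, int, str]  # (table, row, write_time_ms, value); hypothetical


def replay(edits: Iterable[Edit], live_tables: Set[str], restore_to_ms: int,
           only_table: Optional[str] = None) -> Dict[str, Dict[str, str]]:
    """Apply edits in log order, ignoring deleted tables and later writes."""
    restored: Dict[str, Dict[str, str]] = {}
    for table, row, ts, value in edits:
        if table not in live_tables:
            continue  # table was deleted: ignore its edits
        if only_table is not None and table != only_table:
            continue  # table-scoped restore: skip other tables
        if ts > restore_to_ms:
            continue  # written after the requested point in time
        restored.setdefault(table, {})[row] = value
    return restored


log = [
    ("orders", "r1", 10, "v1"),
    ("dropped", "r1", 15, "x"),
    ("orders", "r1", 20, "v2"),
    ("users", "u1", 25, "bob"),
    ("orders", "r1", 50, "v3"),
]
state = replay(log, live_tables={"orders", "users"}, restore_to_ms=30)
orders_only = replay(log, {"orders", "users"}, 30, only_table="orders")
```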


I think you have me convinced now, though, that just using HBASE-4071 is the 
better route to pursue, and that we should investigate lower-level backup 
options only if we need more than that.
So we need replication to be rock-solid...


> HBASE-4401 Record log region splits and region moves in the HLog


That's cool, watching that one now :)


-- Lars
