Geoffrey Jacoby created HBASE-23766:
---------------------------------------

             Summary: Support Point-In-Time Queries
                 Key: HBASE-23766
                 URL: https://issues.apache.org/jira/browse/HBASE-23766
             Project: HBase
          Issue Type: New Feature
            Reporter: Geoffrey Jacoby
            Assignee: Geoffrey Jacoby


HBase currently offers a snapshot feature which allows operators to capture the 
state of a table at a point in time in a way that can be cloned or queried in 
the future. It's quite useful in some circumstances, but it's limited because 
it's a heavyweight operation and because it requires knowing in advance the 
point in time you want to capture. 

Phoenix currently offers a feature called "SCN", which uses the max timestamp 
on Scans to provide the illusion of a "lookback" query at a point in time. It's 
imperfect, however, because HBase's filtering and cleanup logic for deletes, 
max versions, and TTLs can prevent users from seeing certain Cells they would 
have been able to see at that earlier point in time. Even PHOENIX-5645 and the 
equivalent HBASE-23602, which try to control major compaction cleanup, don't 
cover every edge case. (For example, you can't see rows whose TTL has expired 
now but hadn't expired back then. The same applies to versions that were 
within max versions back then but have since been superseded by newer writes.) 

There are useful non-Phoenix applications as well, such as a change stream that 
shows before/after images, similar to what DynamoDB offers. 

Since full support will require new configuration options not just for major 
compaction but also for the read pipeline, I'm filing this as an umbrella JIRA 
so the work can be split into smaller sub-tasks rather than crammed into 
HBASE-23602. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
