[jira] [Commented] (HBASE-11292) Add an "undelete" operation

2015-02-18 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326949#comment-14326949
 ] 

Gary Helmling commented on HBASE-11292:
---

[~lhofhansl] some ability to apply a filter to the delete marker would be ideal 
(at least in this case having a filter indicate a SKIP), but how that changes 
ScanQueryMatcher is of course tricky.  Need to look at this a bit and think 
about it.

[~jamestaylor] yes we ignore puts from in progress or deleted transactions in 
the filter.  This happens with normal (not raw) scans currently.

> Add an "undelete" operation
> ---
>
> Key: HBASE-11292
> URL: https://issues.apache.org/jira/browse/HBASE-11292
> Project: HBase
>  Issue Type: New Feature
>  Components: Deletes
>Reporter: Gary Helmling
>  Labels: Phoenix
>
> While column families can be configured to keep deleted cells (allowing time 
> range queries to still retrieve those cells), deletes are still somewhat 
> unique in that they are irreversible operations.  Once a delete has been 
> issued on a cell, the only way to "undelete" it is to rewrite the data with a 
> timestamp newer than the delete.
> The idea here is to add an "undelete" operation, that would make it possible 
> to cancel a previous delete.  An undelete operation will be similar to a 
> delete, in that it will be written as a marker ("tombstone" doesn't seem like 
> the right word).  The undelete marker, however, will sort prior to a delete 
> marker, canceling the effect of any following delete.
> In the absence of a column family configured to KEEP_DELETED_CELLS, we can't 
> be sure if a prior delete marker and the effected cells have already been 
> garbage collected.  In this case (column family not configured with 
> KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete 
> operations to avoid creating the appearance of a client contact for undeletes 
> that can't reliably be honored.
> I think there are additional subtleties of the implementation to be worked 
> out, but I'm also interested in a broader discussion of interest in this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11292) Add an "undelete" operation

2015-02-16 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323681#comment-14323681
 ] 

James Taylor commented on HBASE-11292:
--

[~lhofhansl] - that's a good point. What about even ignoring a Put? Does the 
Tephra TransactionVisibilityFilter[1] handle ignoring a Put such that an 
earlier Put would be seen? Do all scans have to be raw scans, [~ghelmling]?

[1] 
https://github.com/caskdata/tephra/blob/develop/tephra-hbase-compat-0.98/src/main/java/co/cask/tephra/hbase98/coprocessor/TransactionVisibilityFilter.java

> Add an "undelete" operation
> ---
>
> Key: HBASE-11292
> URL: https://issues.apache.org/jira/browse/HBASE-11292
> Project: HBase
>  Issue Type: New Feature
>  Components: Deletes
>Reporter: Gary Helmling
>  Labels: Phoenix
>
> While column families can be configured to keep deleted cells (allowing time 
> range queries to still retrieve those cells), deletes are still somewhat 
> unique in that they are irreversible operations.  Once a delete has been 
> issued on a cell, the only way to "undelete" it is to rewrite the data with a 
> timestamp newer than the delete.
> The idea here is to add an "undelete" operation, that would make it possible 
> to cancel a previous delete.  An undelete operation will be similar to a 
> delete, in that it will be written as a marker ("tombstone" doesn't seem like 
> the right word).  The undelete marker, however, will sort prior to a delete 
> marker, canceling the effect of any following delete.
> In the absence of a column family configured to KEEP_DELETED_CELLS, we can't 
> be sure if a prior delete marker and the effected cells have already been 
> garbage collected.  In this case (column family not configured with 
> KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete 
> operations to avoid creating the appearance of a client contact for undeletes 
> that can't reliably be honored.
> I think there are additional subtleties of the implementation to be worked 
> out, but I'm also interested in a broader discussion of interest in this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11292) Add an "undelete" operation

2015-02-05 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308197#comment-14308197
 ] 

Lars Hofhansl commented on HBASE-11292:
---

I will think about this in earnest during the next few days.

Related: A scan coprocessor can currently not ignore delete markers (without 
reimplementing the entire delete marker logic). I am not about a good interface 
for this (seems like we need a pass down a list of excluded timeranges). 
[~ghelmling], any ideas?

> Add an "undelete" operation
> ---
>
> Key: HBASE-11292
> URL: https://issues.apache.org/jira/browse/HBASE-11292
> Project: HBase
>  Issue Type: New Feature
>  Components: Deletes
>Reporter: Gary Helmling
>  Labels: Phoenix
>
> While column families can be configured to keep deleted cells (allowing time 
> range queries to still retrieve those cells), deletes are still somewhat 
> unique in that they are irreversible operations.  Once a delete has been 
> issued on a cell, the only way to "undelete" it is to rewrite the data with a 
> timestamp newer than the delete.
> The idea here is to add an "undelete" operation, that would make it possible 
> to cancel a previous delete.  An undelete operation will be similar to a 
> delete, in that it will be written as a marker ("tombstone" doesn't seem like 
> the right word).  The undelete marker, however, will sort prior to a delete 
> marker, canceling the effect of any following delete.
> In the absence of a column family configured to KEEP_DELETED_CELLS, we can't 
> be sure if a prior delete marker and the effected cells have already been 
> garbage collected.  In this case (column family not configured with 
> KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete 
> operations to avoid creating the appearance of a client contact for undeletes 
> that can't reliably be honored.
> I think there are additional subtleties of the implementation to be worked 
> out, but I'm also interested in a broader discussion of interest in this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11292) Add an "undelete" operation

2015-01-16 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281237#comment-14281237
 ] 

James Taylor commented on HBASE-11292:
--

If it simplifies things, for Tephra transactions in Phoenix, we only need the 
ability to 1) undo a column delete markers and 2) undo a column family delete 
marker. We don't need to be able to undo an undo or to redo.

> Add an "undelete" operation
> ---
>
> Key: HBASE-11292
> URL: https://issues.apache.org/jira/browse/HBASE-11292
> Project: HBase
>  Issue Type: New Feature
>  Components: Deletes
>Reporter: Gary Helmling
>  Labels: Phoenix
>
> While column families can be configured to keep deleted cells (allowing time 
> range queries to still retrieve those cells), deletes are still somewhat 
> unique in that they are irreversible operations.  Once a delete has been 
> issued on a cell, the only way to "undelete" it is to rewrite the data with a 
> timestamp newer than the delete.
> The idea here is to add an "undelete" operation, that would make it possible 
> to cancel a previous delete.  An undelete operation will be similar to a 
> delete, in that it will be written as a marker ("tombstone" doesn't seem like 
> the right word).  The undelete marker, however, will sort prior to a delete 
> marker, canceling the effect of any following delete.
> In the absence of a column family configured to KEEP_DELETED_CELLS, we can't 
> be sure if a prior delete marker and the effected cells have already been 
> garbage collected.  In this case (column family not configured with 
> KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete 
> operations to avoid creating the appearance of a client contact for undeletes 
> that can't reliably be honored.
> I think there are additional subtleties of the implementation to be worked 
> out, but I'm also interested in a broader discussion of interest in this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11292) Add an "undelete" operation

2015-01-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281229#comment-14281229
 ] 

Lars Hofhansl commented on HBASE-11292:
---

The other tricky part is to to avoid removing identical cells (in terms of row, 
col, ts) if their type is UNDO.
Also do we need REDO? What if we "undid" a delete, and then want to delete at 
the exact coordinates (row, col, ts) again? That delete would continue to be 
shadowed until the UNDO marker is gone, or we undo the UNDO.

Would need to carefully think through the compaction implications. F.e. when 
can I safely compact the UNDO markers? Need to be able to guarantee that every 
affected Cell is also deleted. That is not true for all compactions (not even 
major ones).


> Add an "undelete" operation
> ---
>
> Key: HBASE-11292
> URL: https://issues.apache.org/jira/browse/HBASE-11292
> Project: HBase
>  Issue Type: New Feature
>  Components: Deletes
>Reporter: Gary Helmling
>  Labels: Phoenix
>
> While column families can be configured to keep deleted cells (allowing time 
> range queries to still retrieve those cells), deletes are still somewhat 
> unique in that they are irreversible operations.  Once a delete has been 
> issued on a cell, the only way to "undelete" it is to rewrite the data with a 
> timestamp newer than the delete.
> The idea here is to add an "undelete" operation, that would make it possible 
> to cancel a previous delete.  An undelete operation will be similar to a 
> delete, in that it will be written as a marker ("tombstone" doesn't seem like 
> the right word).  The undelete marker, however, will sort prior to a delete 
> marker, canceling the effect of any following delete.
> In the absence of a column family configured to KEEP_DELETED_CELLS, we can't 
> be sure if a prior delete marker and the effected cells have already been 
> garbage collected.  In this case (column family not configured with 
> KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete 
> operations to avoid creating the appearance of a client contact for undeletes 
> that can't reliably be honored.
> I think there are additional subtleties of the implementation to be worked 
> out, but I'm also interested in a broader discussion of interest in this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11292) Add an "undelete" operation

2015-01-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281227#comment-14281227
 ] 

Lars Hofhansl commented on HBASE-11292:
---

Let's revive this. What I have been proposing in a meeting we had was an "UNDO" 
cell type. Undo would sort before deletes and simply undo the next cell that 
immediately follows in sort order. Another UNDO with the same TS would undo the 
UNDO, as so on.

I do not think we want to undo Puts, we have deletes for that.
So we would need to two new Cell type: undo delete (for version and column 
delete markers) and undo delete family for family delete markers.

I think my comments on family delete bloom filters above are misguided. At 
worst we'd see a performance degradation for seeking to the beginning of the 
row when that is not necessary when the delete is undone.

The only trick part here is that all seeking before a row or column or to the 
end of a row or column need to be seeking correctly with UNDO cell types.


> Add an "undelete" operation
> ---
>
> Key: HBASE-11292
> URL: https://issues.apache.org/jira/browse/HBASE-11292
> Project: HBase
>  Issue Type: New Feature
>  Components: Deletes
>Reporter: Gary Helmling
>  Labels: Phoenix
>
> While column families can be configured to keep deleted cells (allowing time 
> range queries to still retrieve those cells), deletes are still somewhat 
> unique in that they are irreversible operations.  Once a delete has been 
> issued on a cell, the only way to "undelete" it is to rewrite the data with a 
> timestamp newer than the delete.
> The idea here is to add an "undelete" operation, that would make it possible 
> to cancel a previous delete.  An undelete operation will be similar to a 
> delete, in that it will be written as a marker ("tombstone" doesn't seem like 
> the right word).  The undelete marker, however, will sort prior to a delete 
> marker, canceling the effect of any following delete.
> In the absence of a column family configured to KEEP_DELETED_CELLS, we can't 
> be sure if a prior delete marker and the effected cells have already been 
> garbage collected.  In this case (column family not configured with 
> KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete 
> operations to avoid creating the appearance of a client contact for undeletes 
> that can't reliably be honored.
> I think there are additional subtleties of the implementation to be worked 
> out, but I'm also interested in a broader discussion of interest in this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11292) Add an "undelete" operation

2014-08-28 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114463#comment-14114463
 ] 

Gary Helmling commented on HBASE-11292:
---

bq. Yet another way to look at it is: why undelete at all? Ignoring failed 
transactions needs to be implemented anyway, so instead of undo one could just 
continue to ignore the transaction.

Yes, for the transaction case specifically, that is a fair approach to take.  
We can (and do) ignore writes from the failed transaction, but we also need a 
way to clear transaction IDs from the invalid set so that it doesn't continue 
to grow.  This is difficult, but it does need to be solved, regardless of 
whether or not we have undeletes.  If we can do this efficiently enough, then 
there's really no further need to attempt to rollback failed transactions at 
all.

However, from the HBase perspective, it has been a long standing issue (or 
constraint at least) that deletes cannot be undone.  Just ordering operations 
by seqid so that you can redo any deleted puts is not a symmetric solution, for 
the reasons described above (in the case of a single row delete marker, you 
have to rewrite all of the columns in the row).

> Add an "undelete" operation
> ---
>
> Key: HBASE-11292
> URL: https://issues.apache.org/jira/browse/HBASE-11292
> Project: HBase
>  Issue Type: New Feature
>  Components: Deletes
>Reporter: Gary Helmling
>  Labels: Phoenix
>
> While column families can be configured to keep deleted cells (allowing time 
> range queries to still retrieve those cells), deletes are still somewhat 
> unique in that they are irreversible operations.  Once a delete has been 
> issued on a cell, the only way to "undelete" it is to rewrite the data with a 
> timestamp newer than the delete.
> The idea here is to add an "undelete" operation, that would make it possible 
> to cancel a previous delete.  An undelete operation will be similar to a 
> delete, in that it will be written as a marker ("tombstone" doesn't seem like 
> the right word).  The undelete marker, however, will sort prior to a delete 
> marker, canceling the effect of any following delete.
> In the absence of a column family configured to KEEP_DELETED_CELLS, we can't 
> be sure if a prior delete marker and the effected cells have already been 
> garbage collected.  In this case (column family not configured with 
> KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete 
> operations to avoid creating the appearance of a client contact for undeletes 
> that can't reliably be honored.
> I think there are additional subtleties of the implementation to be worked 
> out, but I'm also interested in a broader discussion of interest in this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11292) Add an "undelete" operation

2014-07-26 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075496#comment-14075496
 ] 

James Taylor commented on HBASE-11292:
--

bq. I think there is no difference in terms of the read because you need to 
read the previous "delete" put as well to construct the "undelete" put while 
the other way need to read&write more data.
[~jeffreyz], the Undelete is issued by the client, the same client that issued 
the Delete. It's done to undo the effects of a failed transaction (i.e. like a 
compensating transaction). You don't need to lookup the prior state of the row. 
You just need to issue an Undelete with the same rowkey and timestamp as the 
Delete.

bq. Yet another way to look at it is: why undelete at all? Ignoring failed 
transactions needs to be implemented anyway, so instead of undo one could just 
continue to ignore the transaction.
[~lhofhansl], I agree, this is a fallback. It puts more pressure on being able 
to "prune" this invalid list. If that can be figured out and done efficiently, 
than the need for an Undelete decreases. Maybe that should be discussed in a 
separate JIRA?

Assuming that an Undelete may only be done for a Delete (i.e. an Undelete can 
not be undeleted), is it feasible then? Besides requiring another set of Bloom 
filters, are their more gotchas? Is it too heavy a burden to have another set 
of Bloom filters? Any other options?

> Add an "undelete" operation
> ---
>
> Key: HBASE-11292
> URL: https://issues.apache.org/jira/browse/HBASE-11292
> Project: HBase
>  Issue Type: New Feature
>  Components: Deletes
>Reporter: Gary Helmling
>  Labels: Phoenix
>
> While column families can be configured to keep deleted cells (allowing time 
> range queries to still retrieve those cells), deletes are still somewhat 
> unique in that they are irreversible operations.  Once a delete has been 
> issued on a cell, the only way to "undelete" it is to rewrite the data with a 
> timestamp newer than the delete.
> The idea here is to add an "undelete" operation, that would make it possible 
> to cancel a previous delete.  An undelete operation will be similar to a 
> delete, in that it will be written as a marker ("tombstone" doesn't seem like 
> the right word).  The undelete marker, however, will sort prior to a delete 
> marker, canceling the effect of any following delete.
> In the absence of a column family configured to KEEP_DELETED_CELLS, we can't 
> be sure if a prior delete marker and the effected cells have already been 
> garbage collected.  In this case (column family not configured with 
> KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete 
> operations to avoid creating the appearance of a client contact for undeletes 
> that can't reliably be honored.
> I think there are additional subtleties of the implementation to be worked 
> out, but I'm also interested in a broader discussion of interest in this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11292) Add an "undelete" operation

2014-07-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075492#comment-14075492
 ] 

Lars Hofhansl commented on HBASE-11292:
---

Can we turn this around and implement undelete with redoing the corresponding 
put and handling the ambiguities with seq ids?
I've been arguing against this, since it would break idempotency of HBase 
actions, but on light of this I might relent :)
I guess it would require to know what exactly the delete affected, which may or 
may not possible.

Yet another way to look at it is: why undelete at all? Ignoring failed 
transactions needs to be implemented anyway, so instead of undo one could just 
continue to ignore the transaction.

> Add an "undelete" operation
> ---
>
> Key: HBASE-11292
> URL: https://issues.apache.org/jira/browse/HBASE-11292
> Project: HBase
>  Issue Type: New Feature
>  Components: Deletes
>Reporter: Gary Helmling
>  Labels: Phoenix
>
> While column families can be configured to keep deleted cells (allowing time 
> range queries to still retrieve those cells), deletes are still somewhat 
> unique in that they are irreversible operations.  Once a delete has been 
> issued on a cell, the only way to "undelete" it is to rewrite the data with a 
> timestamp newer than the delete.
> The idea here is to add an "undelete" operation, that would make it possible 
> to cancel a previous delete.  An undelete operation will be similar to a 
> delete, in that it will be written as a marker ("tombstone" doesn't seem like 
> the right word).  The undelete marker, however, will sort prior to a delete 
> marker, canceling the effect of any following delete.
> In the absence of a column family configured to KEEP_DELETED_CELLS, we can't 
> be sure if a prior delete marker and the effected cells have already been 
> garbage collected.  In this case (column family not configured with 
> KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete 
> operations to avoid creating the appearance of a client contact for undeletes 
> that can't reliably be honored.
> I think there are additional subtleties of the implementation to be worked 
> out, but I'm also interested in a broader discussion of interest in this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11292) Add an "undelete" operation

2014-06-06 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020617#comment-14020617
 ] 

Jeffrey Zhong commented on HBASE-11292:
---

{quote}
 since the client would have to perform a read to find the previous value prior 
to the delete, then issue a new put, instead of simply issuing an undelete with 
the same parameters as the prior delete.
{quote}
I think there is no difference in terms of the read because you need to read 
the previous "delete" put as well to construct the "undelete" put while the 
other way need to read&write more data.

> Add an "undelete" operation
> ---
>
> Key: HBASE-11292
> URL: https://issues.apache.org/jira/browse/HBASE-11292
> Project: HBase
>  Issue Type: New Feature
>  Components: Deletes
>Reporter: Gary Helmling
>  Labels: Phoenix
>
> While column families can be configured to keep deleted cells (allowing time 
> range queries to still retrieve those cells), deletes are still somewhat 
> unique in that they are irreversible operations.  Once a delete has been 
> issued on a cell, the only way to "undelete" it is to rewrite the data with a 
> timestamp newer than the delete.
> The idea here is to add an "undelete" operation, that would make it possible 
> to cancel a previous delete.  An undelete operation will be similar to a 
> delete, in that it will be written as a marker ("tombstone" doesn't seem like 
> the right word).  The undelete marker, however, will sort prior to a delete 
> marker, canceling the effect of any following delete.
> In the absence of a column family configured to KEEP_DELETED_CELLS, we can't 
> be sure if a prior delete marker and the effected cells have already been 
> garbage collected.  In this case (column family not configured with 
> KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete 
> operations to avoid creating the appearance of a client contact for undeletes 
> that can't reliably be honored.
> I think there are additional subtleties of the implementation to be worked 
> out, but I'm also interested in a broader discussion of interest in this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11292) Add an "undelete" operation

2014-06-05 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018962#comment-14018962
 ] 

James Taylor commented on HBASE-11292:
--

bq. the Delete and Undelete would occur at the same timestamp (t5), since the 
Undelete would be rolling back the Delete that was part of the transaction

I think this is fine. I can't think of a use case where you'd need to see the 
Delete at a given timestamp.

> Add an "undelete" operation
> ---
>
> Key: HBASE-11292
> URL: https://issues.apache.org/jira/browse/HBASE-11292
> Project: HBase
>  Issue Type: New Feature
>  Components: Deletes
>Reporter: Gary Helmling
>  Labels: Phoenix
>
> While column families can be configured to keep deleted cells (allowing time 
> range queries to still retrieve those cells), deletes are still somewhat 
> unique in that they are irreversible operations.  Once a delete has been 
> issued on a cell, the only way to "undelete" it is to rewrite the data with a 
> timestamp newer than the delete.
> The idea here is to add an "undelete" operation, that would make it possible 
> to cancel a previous delete.  An undelete operation will be similar to a 
> delete, in that it will be written as a marker ("tombstone" doesn't seem like 
> the right word).  The undelete marker, however, will sort prior to a delete 
> marker, canceling the effect of any following delete.
> In the absence of a column family configured to KEEP_DELETED_CELLS, we can't 
> be sure if a prior delete marker and the effected cells have already been 
> garbage collected.  In this case (column family not configured with 
> KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete 
> operations to avoid creating the appearance of a client contact for undeletes 
> that can't reliably be honored.
> I think there are additional subtleties of the implementation to be worked 
> out, but I'm also interested in a broader discussion of interest in this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11292) Add an "undelete" operation

2014-06-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017894#comment-14017894
 ] 

stack commented on HBASE-11292:
---

bq. There is always going to be something that sorts first, which effectively 
becomes undoable.

Our sorting on type is a mistake (as [~sershe] has noted a few times).  Would 
fixing this help here?

> Add an "undelete" operation
> ---
>
> Key: HBASE-11292
> URL: https://issues.apache.org/jira/browse/HBASE-11292
> Project: HBase
>  Issue Type: New Feature
>  Components: Deletes
>Reporter: Gary Helmling
>  Labels: Phoenix
>
> While column families can be configured to keep deleted cells (allowing time 
> range queries to still retrieve those cells), deletes are still somewhat 
> unique in that they are irreversible operations.  Once a delete has been 
> issued on a cell, the only way to "undelete" it is to rewrite the data with a 
> timestamp newer than the delete.
> The idea here is to add an "undelete" operation, that would make it possible 
> to cancel a previous delete.  An undelete operation will be similar to a 
> delete, in that it will be written as a marker ("tombstone" doesn't seem like 
> the right word).  The undelete marker, however, will sort prior to a delete 
> marker, canceling the effect of any following delete.
> In the absence of a column family configured to KEEP_DELETED_CELLS, we can't 
> be sure if a prior delete marker and the effected cells have already been 
> garbage collected.  In this case (column family not configured with 
> KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete 
> operations to avoid creating the appearance of a client contact for undeletes 
> that can't reliably be honored.
> I think there are additional subtleties of the implementation to be worked 
> out, but I'm also interested in a broader discussion of interest in this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11292) Add an "undelete" operation

2014-06-04 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017885#comment-14017885
 ] 

Gary Helmling commented on HBASE-11292:
---

bq. we have delete bloom filters to avoid seeks to the beginning on the row to 
check for family delete markers. Those would no longer work, or in other words 
we'd need boom filters for the undelete markers.

Hmm, I wasn't familiar with these, but, yes, sounds like we would unfortunately 
need another set of bloom filters for the undeletes.  There would also need to 
be some change to the use of the delete bloom filters if operations were 
ordered by seqid, wouldn't there?

bq. will we every need so to undo an undelete? 

Yes, I thought of this as well.  What undoes the undeletes?  In a sense this 
shifts the irreversible operation to the undelete.  I think this is an inherent 
problem with the approach of sorting operations within the same timestamp by 
operation type.  There is always going to be something that sorts first, which 
effectively becomes undoable.

bq. how does this fit into the discussion about using sequence numbers to order 
operations?

I think the discussion to order operations by seqid is simply an alternate 
approach to the some of the same underlying problems.  For the issue mentioned 
above, sorting by seqid is conceptually simpler -- there is no need for a 
separate undelete operation and undoing a delete is possible by re-issuing a 
new put for the previous value at the same timestamp as the previous delete.  
Of course that means that the final outcome depends on the server observed 
ordering of the operations.  But at the same time, there is no "irreversible" 
operation.

For the use case I'm considering (a single client rolling back it's previously 
persisted changes at a given timestamp), ordering by seqid would also work.  
Since it's a single client rolling back it's own operations, the operations can 
be ordered by the client, so there's no lack of determinism in server side 
ordering.  Rolling back a delete with seqid ordering would be slightly more 
complicated, though, since the client would have to perform a read to find the 
previous value prior to the delete, then issue a new put, instead of simply 
issuing an undelete with the same parameters as the prior delete.

Or you could combine both approaches and order operations by seqid, but add an 
undelete operation as well.  The undelete would then sort prior to the delete 
by virtue of being issued after it.  It would still be a no-read operation.  
And the undelete could be undone by issuing a new delete.  Maybe this 
combination ultimately winds up being best.

Thanks for the comments.  I'm open to however we can most efficiently solve 
this with the least amount of added complexity.

> Add an "undelete" operation
> ---
>
> Key: HBASE-11292
> URL: https://issues.apache.org/jira/browse/HBASE-11292
> Project: HBase
>  Issue Type: New Feature
>  Components: Deletes
>Reporter: Gary Helmling
>  Labels: Phoenix
>
> While column families can be configured to keep deleted cells (allowing time 
> range queries to still retrieve those cells), deletes are still somewhat 
> unique in that they are irreversible operations.  Once a delete has been 
> issued on a cell, the only way to "undelete" it is to rewrite the data with a 
> timestamp newer than the delete.
> The idea here is to add an "undelete" operation, that would make it possible 
> to cancel a previous delete.  An undelete operation will be similar to a 
> delete, in that it will be written as a marker ("tombstone" doesn't seem like 
> the right word).  The undelete marker, however, will sort prior to a delete 
> marker, canceling the effect of any following delete.
> In the absence of a column family configured to KEEP_DELETED_CELLS, we can't 
> be sure if a prior delete marker and the effected cells have already been 
> garbage collected.  In this case (column family not configured with 
> KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete 
> operations to avoid creating the appearance of a client contact for undeletes 
> that can't reliably be honored.
> I think there are additional subtleties of the implementation to be worked 
> out, but I'm also interested in a broader discussion of interest in this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11292) Add an "undelete" operation

2014-06-04 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017841#comment-14017841
 ] 

Gary Helmling commented on HBASE-11292:
---

bq. Would every type of Delete need a corresponding Undelete?

Yes, I think it would be necessary to have complete parity between the two, 
since the goal is to cancel any type of delete operation.

{quote}
When KEEP_DELETED_CELLS is on, assume:
* Put is done at t1
* Delete is done at t5
* Undelete is done at t7
If a client does a read at t6, the Delete would be in effect, while if a client 
does a read at t8, the Delete would not be in effect. Is that how it'd work?
{quote}

Yes, I think this would work exactly as you describe.  In the case of 
supporting timestamp-overriding transactions, though, the Delete and Undelete 
would occur at the same timestamp (t5), since the Undelete would be rolling 
back the Delete that was part of the transaction.

> Add an "undelete" operation
> ---
>
> Key: HBASE-11292
> URL: https://issues.apache.org/jira/browse/HBASE-11292
> Project: HBase
>  Issue Type: New Feature
>  Components: Deletes
>Reporter: Gary Helmling
>  Labels: Phoenix
>
> While column families can be configured to keep deleted cells (allowing time 
> range queries to still retrieve those cells), deletes are still somewhat 
> unique in that they are irreversible operations.  Once a delete has been 
> issued on a cell, the only way to "undelete" it is to rewrite the data with a 
> timestamp newer than the delete.
> The idea here is to add an "undelete" operation, that would make it possible 
> to cancel a previous delete.  An undelete operation will be similar to a 
> delete, in that it will be written as a marker ("tombstone" doesn't seem like 
> the right word).  The undelete marker, however, will sort prior to a delete 
> marker, canceling the effect of any following delete.
> In the absence of a column family configured to KEEP_DELETED_CELLS, we can't 
> be sure if a prior delete marker and the effected cells have already been 
> garbage collected.  In this case (column family not configured with 
> KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete 
> operations to avoid creating the appearance of a client contact for undeletes 
> that can't reliably be honored.
> I think there are additional subtleties of the implementation to be worked 
> out, but I'm also interested in a broader discussion of interest in this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11292) Add an "undelete" operation

2014-06-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017207#comment-14017207
 ] 

Lars Hofhansl commented on HBASE-11292:
---

I fear that the implications are tricky.

# we have delete bloom filters to avoid seeks to the beginning on the row to 
check for family delete markers. Those would no longer work, or in other words 
we'd need boom filters for the undelete markers.
# will we every need so to undo an undelete? :)
# how does this fit into the discussion about using sequence numbers to order 
operations?

Still +1 on doing this if ts semantics are like the ones I introduced for 
KEEP_DELETED_CELLS (as James mentions)

> Add an "undelete" operation
> ---
>
> Key: HBASE-11292
> URL: https://issues.apache.org/jira/browse/HBASE-11292
> Project: HBase
>  Issue Type: New Feature
>  Components: Deletes
>Reporter: Gary Helmling
>  Labels: Phoenix
>
> While column families can be configured to keep deleted cells (allowing time 
> range queries to still retrieve those cells), deletes are still somewhat 
> unique in that they are irreversible operations.  Once a delete has been 
> issued on a cell, the only way to "undelete" it is to rewrite the data with a 
> timestamp newer than the delete.
> The idea here is to add an "undelete" operation, that would make it possible 
> to cancel a previous delete.  An undelete operation will be similar to a 
> delete, in that it will be written as a marker ("tombstone" doesn't seem like 
> the right word).  The undelete marker, however, will sort prior to a delete 
> marker, canceling the effect of any following delete.
> In the absence of a column family configured to KEEP_DELETED_CELLS, we can't 
> be sure if a prior delete marker and the effected cells have already been 
> garbage collected.  In this case (column family not configured with 
> KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete 
> operations to avoid creating the appearance of a client contact for undeletes 
> that can't reliably be honored.
> I think there are additional subtleties of the implementation to be worked 
> out, but I'm also interested in a broader discussion of interest in this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11292) Add an "undelete" operation

2014-06-03 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016905#comment-14016905
 ] 

James Taylor commented on HBASE-11292:
--

+1 for this new operation as it will help in the implementation of 
transactions. Would every *type* of Delete need a corresponding Undelete?

When KEEP_DELETED_CELLS is on, assume:
- Put is done at t1
- Delete is done at t5
- Undelete is done at t7

If a client does a read at t6, the Delete would be in effect, while if a client 
does a read at t8, the Delete would not be in effect. Is that how it'd work?

> Add an "undelete" operation
> ---
>
> Key: HBASE-11292
> URL: https://issues.apache.org/jira/browse/HBASE-11292
> Project: HBase
>  Issue Type: New Feature
>  Components: Deletes
>Reporter: Gary Helmling
>  Labels: Phoenix
>
> While column families can be configured to keep deleted cells (allowing time 
> range queries to still retrieve those cells), deletes are still somewhat 
> unique in that they are irreversible operations.  Once a delete has been 
> issued on a cell, the only way to "undelete" it is to rewrite the data with a 
> timestamp newer than the delete.
> The idea here is to add an "undelete" operation, that would make it possible 
> to cancel a previous delete.  An undelete operation will be similar to a 
> delete, in that it will be written as a marker ("tombstone" doesn't seem like 
> the right word).  The undelete marker, however, will sort prior to a delete 
> marker, canceling the effect of any following delete.
> In the absence of a column family configured to KEEP_DELETED_CELLS, we can't 
> be sure if a prior delete marker and the effected cells have already been 
> garbage collected.  In this case (column family not configured with 
> KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete 
> operations to avoid creating the appearance of a client contact for undeletes 
> that can't reliably be honored.
> I think there are additional subtleties of the implementation to be worked 
> out, but I'm also interested in a broader discussion of interest in this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.2#6252)