[jira] [Comment Edited] (HBASE-15968) MVCC-sensitive semantics of versions

2016-10-18 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585026#comment-15585026
 ] 

Phil Yang edited comment on HBASE-15968 at 10/18/16 10:36 AM:
--

I ran a basic test with a modified PerformanceEvaluation. The test always 
uses Get to read the same row, with a configurable setMaxVersions. I used only 
one client thread, because more client threads increase the total latency and I 
think the proportion of time consumed by the ScanQueryMatcher (SQM) then becomes 
smaller. If the new matcher is slower, the gap should be largest with a single 
thread (correct me if I am wrong).

The client and server are on two separate machines; ping latency between them 
is about 0.055 ms.

First, if the table is empty, the Get returns nothing. The average latencies 
of the old and new semantics are 0.185 ms and 0.189 ms, so the new semantics 
are 2% slower.

If there is one Put in the row, latencies are 0.210 ms and 0.220 ms, 5% slower.

If there are three Puts in the row and Get.setMaxVersions(1), results are 
0.199 ms and 0.220 ms, 10% slower. The old semantics give a better result here 
than in the previous test, which needs more digging. The new semantics give the 
same result as in the previous test, which is expected.

If there are three Puts in the row and Get.setMaxVersions(3), results are 
0.203 ms and 0.220 ms, 8% slower.

If there are 50 Puts and setMaxVersions(50), results are 0.279 ms and 0.325 ms, 
16% slower.
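
The relative slowdown for each scenario can be recomputed from the reported 
average latencies. The following is only a small arithmetic sketch; the class 
and method names are made up for illustration and are not part of 
PerformanceEvaluation. It prints the slowdown to one decimal place, which 
agrees with the rounded percentages above to within about one percentage point.

```java
/** Recomputes the relative slowdown from the latencies reported in this comment. */
class SlowdownTable {
    /** Percentage by which newMs exceeds oldMs. */
    static double slowdownPercent(double oldMs, double newMs) {
        return (newMs - oldMs) / oldMs * 100.0;
    }

    public static void main(String[] args) {
        String[] scenario = {
            "empty table",
            "1 Put",
            "3 Puts, setMaxVersions(1)",
            "3 Puts, setMaxVersions(3)",
            "50 Puts, setMaxVersions(50)"
        };
        // {old semantics, new semantics}, average Get latency in milliseconds.
        double[][] latencyMs = {
            {0.185, 0.189},
            {0.210, 0.220},
            {0.199, 0.220},
            {0.203, 0.220},
            {0.279, 0.325}
        };
        for (int i = 0; i < scenario.length; i++) {
            double oldMs = latencyMs[i][0];
            double newMs = latencyMs[i][1];
            System.out.printf("%-28s old=%.3f ms  new=%.3f ms  slowdown=%.1f%%%n",
                    scenario[i], oldMs, newMs, slowdownPercent(oldMs, newMs));
        }
    }
}
```

These figures include the network round trip, so they describe end-to-end Get 
latency rather than time spent in the matcher alone.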

Next I'll check why the new semantics are a little slower than the old 
semantics (we expect them to be the same), and why the old semantics get faster 
when there are more Puts (maybe the exit path of the SQM can be optimized). 
I'll use JMH to get a more direct result for the SQM.

Then I'll test the results when there are delete markers.



> MVCC-sensitive semantics of versions
> 
>
> Key: HBASE-15968
> URL: https://issues.apache.org/jira/browse/HBASE-15968
> Project: HBase
>  Issue Type: New Feature
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: HBASE-15968-v1.patch, HBASE-15968-v2.patch, 
> HBASE-15968-v3.patch, HBASE-15968-v4.patch
>
>
> In HBase book, we have a section in Versions called "Current Limitations" see 
> http://hbase.apache.org/book.html#_current_limitations
> {quote}
> 28.3. Current Limitations
> 28.3.1. Deletes mask Puts
> Deletes mask puts, even puts that happened after the delete was entered. See 
> HBASE-2256. Remember that a delete writes a tombstone, which only disappears 
> after the next major compaction has run. Suppose you do a delete of 
> everything <= T. After this you do a new put with a timestamp <= T. This put, 
> even if it happened after the delete, will be masked by the delete tombstone. 
> Performing the put will not fail, but when you do a get you will notice the 
> put did have no effect. It will start working again after the major 
> compaction has run. These issues should not be a problem if you use 
> always-increasing versions for new puts to a row. But they can occur even if 
> you do not care about time: just do delete and put immediately after each 
> other, and there is some chance they happen within the same millisecond.
> 28.3.2. Major compactions change query results
> …​create three cell versions at t1, t2 and t3, with a maximum-versions 
> setting of 2. So when getting all versions, only the values at t2 and t3 will 
> be returned. But if you delete the version at t2 or t3, the one at t1 will 
> appear 

[jira] [Comment Edited] (HBASE-15968) MVCC-sensitive semantics of versions

2016-09-13 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15487753#comment-15487753
 ] 

Phil Yang edited comment on HBASE-15968 at 9/13/16 5:04 PM:


[~stack] Thanks for your reply. This patch has some bugs. I have fixed them 
locally but have not yet uploaded a new patch, because it does not support 
visibility labels. If needed, I can upload a new patch with the fixes (although 
visibility labels are not done) and put it on Review Board to help you review.

{quote}
This 'fixed' behavior should be default in 2.0.0
{quote}
A concern is performance. In the current "broken" behavior, we read cells in 
descending timestamp order, and each cell costs O(1) time. But in the new 
behavior, we have to save some info from all cells whose ts is higher than this 
cell's (and from all cells for family-delete markers) and check whether we can 
see this cell according to their ts and mvcc, which is not O(1). I am not very 
sure, but the complexity may be O(N*logM), where N is the number of delete 
markers whose ts is higher but mvcc is lower, and M is the number of Puts with 
a higher timestamp that users cannot see. I implemented the data structure with 
the current design because I think N will not be very high even if we have many 
Puts and Deletes: in most cases we will not have a Cell with a higher mvcc but 
a lower timestamp, and M equals maxversion if there is no Delete.
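
To make the current timestamp-only behavior concrete, here is a minimal sketch 
of a single-column read that walks cells in descending timestamp order and does 
constant work per cell. It is a toy model under stated assumptions, not HBase 
code: the class, the Cell type, and the method names are hypothetical, duplicate 
timestamps among Puts are not handled, and only column-level and version-level 
deletes are modeled.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a timestamp-only read of one column. Hypothetical, not HBase code. */
class TimestampOnlyScan {
    enum Type { PUT, DELETE_VERSION, DELETE_COLUMN }

    static final class Cell {
        final Type type;
        final long ts;
        Cell(Type type, long ts) { this.type = type; this.ts = ts; }
    }

    /** Returns timestamps of visible Puts, newest first. Write order is ignored. */
    static List<Long> get(List<Cell> cells, int maxVersions) {
        List<Cell> sorted = new ArrayList<>(cells);
        // Timestamp descending; at equal timestamps, delete markers sort before Puts.
        sorted.sort((a, b) -> a.ts != b.ts
                ? Long.compare(b.ts, a.ts)
                : Boolean.compare(a.type == Type.PUT, b.type == Type.PUT));

        Long columnDeleteTs = null;   // highest "delete all versions <= ts" marker
        Long versionDeleteTs = null;  // latest "delete exactly this ts" marker seen
        List<Long> visible = new ArrayList<>();
        for (Cell c : sorted) {
            switch (c.type) {
                case DELETE_COLUMN:
                    // Cells arrive newest first, so the first marker has the highest ts.
                    if (columnDeleteTs == null) columnDeleteTs = c.ts;
                    break;
                case DELETE_VERSION:
                    versionDeleteTs = c.ts;
                    break;
                default:
                    // PUT: compared only against the tracked timestamps, O(1) per cell.
                    boolean deleted = (columnDeleteTs != null && c.ts <= columnDeleteTs)
                            || (versionDeleteTs != null && c.ts == versionDeleteTs);
                    if (!deleted && visible.size() < maxVersions) visible.add(c.ts);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Cell> writes = new ArrayList<>();
        writes.add(new Cell(Type.PUT, 2));
        writes.add(new Cell(Type.PUT, 3));
        writes.add(new Cell(Type.DELETE_COLUMN, 3));
        writes.add(new Cell(Type.PUT, 1)); // written after the delete, still masked
        System.out.println(get(writes, 3));
    }
}
```

In this model a Put at t1 written after a delete of t<=3 is still hidden, and 
with maxVersions=2 a version delete of t3 lets t1 reappear next to t2. Those 
are the two limitations from the HBase book that this issue addresses.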


{quote}
Yeah, its an outstanding question as to when it is safe to set sequenceid/mvcc 
== 0.
This is for new tables only?
{quote}
In the patch I disabled this feature; we always save mvcc. So if we alter a 
table to the new behavior, we need to handle Cells whose mvcc is in the HFile's 
header. Many Cells will have the same mvcc, which is not a very difficult 
issue, but we need to prove there is no bug in this situation. And we have to 
define the order of Cells with the same mvcc, just like we define the order of 
Type.

{quote}
mvcc-sensitive is not a good name because the whole system is already mvcc 
sensitive.
{quote}
To be honest, I spent some time on naming this issue, but I have no idea what 
the best name is. Just calling it "fix the bug" would be very exciting for me :)




[jira] [Comment Edited] (HBASE-15968) MVCC-sensitive semantics of versions

2016-09-13 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486867#comment-15486867
 ] 

Phil Yang edited comment on HBASE-15968 at 9/13/16 10:41 AM:
-

After reading the code introduced by HBASE-7663 and HBASE-10885, if I am not 
wrong, visibility labels do not change the logic of versions. If maxVersion is 
3, we can only read the first three versions and then check whether each can be 
seen according to its tags. For a delete, we can set tags on it, and it only 
masks Puts with the same tags (no tag is also a kind of tag). For example, if 
we have Put(b), Put(b), Put(a), Delete(a), Put(b), Put(a), then reader(a) can 
read the latest Put(a), and reader(b) can read two Puts while the oldest one is 
masked, right?

For mvcc-sensitive semantics, we have VERSION_MASKED, which means this Put is 
masked by a sufficient number of versions and can never be read. So consider 
mvcc-sensitive semantics together with visibility labels. If the order is 
Put(b), Put(b), Put(a), Delete(a), Put(b), Put(a), what should the result be 
for reader(b)? The latest two Put(b), because the third Put is deleted so the 
second will not be masked, or only the latest Put(b), because we cannot see the 
Delete(a)?

[~anoop.hbase] [~ram_krish] [~Apache9] What do you think? Thanks.





[jira] [Comment Edited] (HBASE-15968) MVCC-sensitive semantics of versions

2016-09-13 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486867#comment-15486867
 ] 

Phil Yang edited comment on HBASE-15968 at 9/13/16 10:24 AM:
-

After reading the code introduced by HBASE-7663 and HBASE-10885, if I am not 
wrong, visibility labels do not change the logic of versions. If maxVersion is 
3, we can only read the first three versions and then check whether each can be 
seen according to its tags. For a delete, we can set tags on it, and it only 
masks Puts with the same tags (no tag is also a kind of tag).

For mvcc-sensitive semantics, we have VERSION_MASKED, which means this Put is 
masked by a sufficient number of versions and can never be read. So consider 
mvcc-sensitive semantics together with visibility labels. Suppose we have 4 
Puts whose tags are a, a, b, a. The first one will never be read, no matter 
what labels the reader has. But suppose we delete the third Put after we put 
it, so the order is Put(a), Put(a), Put(b), Delete(b), Put(a). What should the 
result be for a reader whose label is a? The latest two Put(a), because we 
cannot see Delete(b), or all three of them, because the Put(b) has been deleted 
although we cannot see them?

[~anoop.hbase] [~ram_krish] [~Apache9] What do you think? Thanks.




[jira] [Comment Edited] (HBASE-15968) MVCC-sensitive semantics of versions

2016-09-06 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15467105#comment-15467105
 ] 

Phil Yang edited comment on HBASE-15968 at 9/6/16 10:52 AM:


First, I changed the name to mvcc-sensitive; I think it may be better than 
before :)

The logic is different from the initial design doc, so I removed the link. Now 
the logic is much simpler. I use a MvccSensitiveTracker that implements both 
ColumnTracker and DeleteTracker to track deletes and versions. In the tracker, 
we judge whether a Put is deleted by a delete marker with a higher mvcc (and, 
of course, the same or a higher timestamp), or "masked" (treated the same as 
deleted) by a sufficient number of Puts with higher timestamps. The logic of 
ScanQueryMatcher is unchanged except for minor compaction. In a minor 
compaction we cannot drop anything because we only see a subset of the cells.
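
The two checks can be illustrated with a small replay model of one column. 
This is only a sketch under stated assumptions and is not the 
MvccSensitiveTracker from the patch: it replays operations in write order 
(that is, increasing mvcc) instead of examining sorted cells at read time, the 
class and method names are hypothetical, and it assumes that a Put removed by a 
delete no longer counts toward masking older versions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Toy replay model of mvcc-sensitive versions for one column. Hypothetical names. */
class VersionReplayModel {
    private final int maxVersions;
    // Timestamps of Puts that are still readable, highest timestamp first.
    private final TreeSet<Long> live = new TreeSet<>(Comparator.reverseOrder());

    VersionReplayModel(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    /** A Put; any version now covered by maxVersions higher timestamps is masked for good. */
    void put(long ts) {
        live.add(ts);
        while (live.size() > maxVersions) {
            live.pollLast(); // last element is the lowest timestamp
        }
    }

    /** Like Delete.addColumns: removes earlier Puts whose ts is not greater than ts. */
    void deleteUpTo(long ts) {
        live.removeIf(t -> t <= ts);
    }

    /** Like Delete.addColumn: removes the earlier Put whose timestamp equals ts. */
    void deleteVersion(long ts) {
        live.remove(ts);
    }

    /** Readable timestamps, newest first. */
    List<Long> get() {
        return new ArrayList<>(live);
    }

    public static void main(String[] args) {
        // Case 1 from the issue description: put t2 -> put t3 -> delete t<=3 -> put t1.
        VersionReplayModel case1 = new VersionReplayModel(3);
        case1.put(2);
        case1.put(3);
        case1.deleteUpTo(3);
        case1.put(1);
        System.out.println("case 1: " + case1.get());

        // Case 2: maxversion=2, put t1 -> put t2 -> put t3 -> delete t3.
        VersionReplayModel case2 = new VersionReplayModel(2);
        case2.put(1);
        case2.put(2);
        case2.put(3);
        case2.deleteVersion(3);
        System.out.println("case 2: " + case2.get());
    }
}
```

Because a delete is applied only to Puts replayed before it, the later Put at 
t1 in case 1 survives. In case 2, t1 is dropped as soon as t3 is written, so 
deleting t3 afterwards leaves only t2.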

And we cannot set mvcc to 0 while compacting.

Users can set MVCC_SENSITIVE to "true" in the CF's configuration to enable this 
logic. REPLICATION_SCOPE must be set to 2 if the data needs to be pushed to a 
slave peer that has this feature on (see HBASE-9465), because the order of 
writes is meaningful.

Any comments are welcome, thanks!



> MVCC-sensitive semantics of versions
> 
>
> Key: HBASE-15968
> URL: https://issues.apache.org/jira/browse/HBASE-15968
> Project: HBase
>  Issue Type: New Feature
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: HBASE-15968-v1.patch
>
>
> In HBase book, we have a section in Versions called "Current Limitations" see 
> http://hbase.apache.org/book.html#_current_limitations
> {quote}
> 28.3. Current Limitations
> 28.3.1. Deletes mask Puts
> Deletes mask puts, even puts that happened after the delete was entered. See 
> HBASE-2256. Remember that a delete writes a tombstone, which only disappears 
> after the next major compaction has run. Suppose you do a delete of 
> everything <= T. After this you do a new put with a timestamp <= T. This put, 
> even if it happened after the delete, will be masked by the delete tombstone. 
> Performing the put will not fail, but when you do a get you will notice the 
> put did have no effect. It will start working again after the major 
> compaction has run. These issues should not be a problem if you use 
> always-increasing versions for new puts to a row. But they can occur even if 
> you do not care about time: just do delete and put immediately after each 
> other, and there is some chance they happen within the same millisecond.
> 28.3.2. Major compactions change query results
> …​create three cell versions at t1, t2 and t3, with a maximum-versions 
> setting of 2. So when getting all versions, only the values at t2 and t3 will 
> be returned. But if you delete the version at t2 or t3, the one at t1 will 
> appear again. Obviously, once a major compaction has run, such behavior will 
> not be the case anymore…​ (See Garbage Collection in Bending time in HBase.)
> {quote}
> These limitations result from the current implementation of multiple 
> versions: we only consider the timestamp, no matter when a cell arrives, and 
> we do not remove an old version immediately when there are enough newer 
> versions. 
> So we can get stronger version semantics with two guarantees:
> 1. A Delete will not mask a Put that comes after it.
> 2. If a version is masked by enough higher versions (VERSIONS in the 
> cf's conf), it will never be seen again.
> Some examples for understanding:
> (delete t<=3 means using Delete.addColumns to delete all versions whose ts is 
> not greater than 3, and delete t3 means using Delete.addColumn to delete the 
> version whose ts=3)
> case 1: put t2 -> put t3 -> delete t<=3 -> put t1, and we will get t1 because 
> the put is after the delete.
> case 2: maxversion=2, put t1 -> put t2 -> put t3 -> delete t3, and we will 
> always get t2 no