[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2021-08-02 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391685#comment-17391685
 ] 

Andrew Kyle Purtell commented on HBASE-24440:
-

I have HBASE-25975 working reasonably well in a test environment and unit tests 
are passing. Will collect macro benchmark results and report back. 

> Prevent temporal misordering on timescales smaller than one clock tick
> --
>
> Key: HBASE-24440
> URL: https://issues.apache.org/jira/browse/HBASE-24440
> Project: HBase
> Issue Type: Brainstorming
> Reporter: Andrew Kyle Purtell
> Assignee: Andrew Kyle Purtell
> Priority: Major
>
> When mutations are sent to the servers without a timestamp explicitly 
> assigned by the client the server will substitute the current wall clock 
> time. There are edge cases where it is at least theoretically possible for 
> more than one mutation to be committed to a given row within the same clock 
> tick. When this happens we have to track and preserve the ordering of these 
> mutations in some other way besides the timestamp component of the key. Let 
> me bypass most discussion here by noting that whether we do this or not, we 
> do not pass such ordering information in the cross cluster replication 
> protocol. We also have interesting edge cases regarding key type precedence 
> when mutations arrive "simultaneously": we sort deletes ahead of puts. This, 
> especially in the presence of replication, can lead to visible anomalies for 
> clients able to interact with both source and sink. 
> There is a simple solution that removes the possibility that these edge cases 
> can occur: 
> We can detect, when we are about to commit a mutation to a row, if we have 
> already committed a mutation to this same row in the current clock tick. 
> Occurrences of this condition will be rare. We are already tracking current 
> time. We have to know this in order to assign the timestamp. Where this 
> becomes interesting is how we might track the last commit time per row. 
> Making the detection of this case efficient for the normal code path is the 
> bulk of the challenge. One option is to keep track of the last locked time 
> for row locks. (Todo: How would we track and garbage collect this efficiently 
> and correctly. Not the ideal option.) We might also do this tracking somehow 
> via the memstore. (At least in this case the lifetime and distribution of in 
> memory row state, including the proposed timestamps, would align.) Assuming 
> we can efficiently know if we are about to commit twice to the same row 
> within a single clock tick, we would simply sleep/yield the current thread 
> until the clock ticks over, and then proceed. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2021-06-07 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358964#comment-17358964
 ] 

Andrew Kyle Purtell commented on HBASE-24440:
-

{quote}
This is an interesting idea, trying to think about it a bit more. IIUC the idea 
here is to maximize the commit throughput as much as possible in a single tick 
by picking from the available pool of disjoint row keys that can share the same 
tick. Let me take a look at the patch..
{quote}
This is the HBASE-25975 subtask. I still need to add a test that proves it even 
works. :-) You might want to come back on that subtask when I post an update 
there. 

> Prevent temporal misordering on timescales smaller than one clock tick
> --
>
> Key: HBASE-24440
> URL: https://issues.apache.org/jira/browse/HBASE-24440
> Project: HBase
> Issue Type: Brainstorming
> Reporter: Andrew Kyle Purtell
> Assignee: Andrew Kyle Purtell
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>


[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2021-06-07 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358962#comment-17358962
 ] 

Andrew Kyle Purtell commented on HBASE-24440:
-

{quote}
bq. These implementations were microbenchmarked to identify the superior 
option, which seems to be BoundedIncrementYieldAdvancingClock.
Kind of surprised. I guessed it would be IncrementAdvancingClock theoretically 
(since there is no overhead to maintain boundedness), still trying to wrap my 
head around the subtle details in the patch.
{quote}
My mistake, that's a typo. Let me fix it. 



[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2021-06-07 Thread Bharath Vissapragada (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358951#comment-17358951
 ] 

Bharath Vissapragada commented on HBASE-24440:
--

Thanks for the detailed results.

> These implementations were microbenchmarked to identify the superior option, 
> which seems to be BoundedIncrementYieldAdvancingClock.

Kind of surprised. I guessed it would be IncrementAdvancingClock theoretically 
(since there is no overhead to maintain boundedness), still trying to wrap my 
head around the subtle details in the patch.

> We are tracking whether or not a row has already been committed to in the 
> current clock tick. Or, we are tracking per region a set of rows per clock 
> tick. Why not track that explicitly?

This is an interesting idea, trying to think about it a bit more. IIUC the idea 
here is to maximize the commit throughput as much as possible in a single tick 
by picking from the available pool of disjoint row keys that can share the same 
tick. Let me take a look at the patch..



[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2021-06-06 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358203#comment-17358203
 ] 

Andrew Kyle Purtell commented on HBASE-24440:
-

In HBASE-25913 I considered serialization of commits within a given clock tick 
using the system clock, by extending the interface EnvironmentEdge, which 
already dealt with time. Consider an API for getting a Clock, by name, from a 
concurrent map of such things, and then using this clock to get _advancing 
time_, which is the current system time, like System.currentTimeMillis, but 
with an additional semantic: When you call getCurrentAdvancing() you will 
always get a value that has advanced forward from the last time someone looked 
at this particular clock's time. The particular implementation of Clock is free 
to maintain that invariant with different strategies, and I explored several 
different implementations:
- IncrementAdvancingClock: like a poor man's hybrid clock, if the system time 
hasn't advanced, manually advance it, and ensure other API calls to the same 
clock don't give a different, earlier time
- SpinAdvancingClock: if the system time hasn't advanced, spin until it does
- YieldAdvancingClock: if the system time hasn't advanced, yield the thread 
with a small sleep until it does
- BoundedIncrementSpinAdvancingClock: advance ahead of system time up to some 
bound (like 1 second) but then spin until the system time catches up
- BoundedIncrementYieldAdvancingClock: advance ahead of system time up to some 
bound (like 1 second) but then yield with a small sleep until the system time 
catches up
And these implementations were microbenchmarked to identify the superior 
option, which seems to be BoundedIncrementYieldAdvancingClock. 
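To make the list above concrete, here is a minimal sketch of two of the strategies, assuming a hypothetical AdvancingClock interface with a single currentTimeAdvancing() method. These names, the constructor parameters, and the use of LockSupport.parkNanos for the "small sleep" are illustrative assumptions and are not taken from the HBASE-25913 patch.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;
import java.util.function.LongSupplier;

/** Hypothetical interface: every call returns a value greater than the previous one. */
interface AdvancingClock {
  long currentTimeAdvancing();
}

/** If system time has not moved past the last value handed out, increment instead. */
final class IncrementAdvancingClock implements AdvancingClock {
  private final LongSupplier systemTime;
  private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

  IncrementAdvancingClock(LongSupplier systemTime) {
    this.systemTime = systemTime;
  }

  @Override
  public long currentTimeAdvancing() {
    long now = systemTime.getAsLong();
    // Hand out max(now, last + 1) atomically.
    return last.updateAndGet(prev -> Math.max(now, prev + 1));
  }
}

/** Increment ahead of system time up to a bound, then sleep briefly until it catches up. */
final class BoundedIncrementYieldAdvancingClock implements AdvancingClock {
  private final LongSupplier systemTime;
  private final long maxAdvanceMs;
  private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

  BoundedIncrementYieldAdvancingClock(LongSupplier systemTime, long maxAdvanceMs) {
    this.systemTime = systemTime;
    this.maxAdvanceMs = maxAdvanceMs;
  }

  @Override
  public long currentTimeAdvancing() {
    while (true) {
      long now = systemTime.getAsLong();
      long prev = last.get();
      long next = Math.max(now, prev + 1);
      if (next - now > maxAdvanceMs) {
        // Too far ahead of the system clock: give up the CPU for about 1 ms and retry.
        LockSupport.parkNanos(1_000_000L);
        continue;
      }
      if (last.compareAndSet(prev, next)) {
        return next;
      }
      // Lost a race with another caller; retry with fresh values.
    }
  }
}

final class AdvancingClockDemo {
  public static void main(String[] args) {
    AdvancingClock clock =
        new BoundedIncrementYieldAdvancingClock(System::currentTimeMillis, 1000L);
    long a = clock.currentTimeAdvancing();
    long b = clock.currentTimeAdvancing();
    System.out.println(b > a); // prints true: the second read is always later
  }
}
```

The spin variants would differ only in how they wait, so they are omitted here.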

The main issue with HBASE-25913 is that serializing commits with the act of 
getting the time requires that everything that might mutate the resource you 
are trying to protect be within that scope. For a single row update the scope 
would be the row, which is highly granular, so performance will not be affected 
that much. For batch mutations, which are typical, the scope is the region, and 
serializing commits to a region is expensive. With 
BoundedIncrementYieldAdvancingClock that means the region can take a burst of 
commits within a single clock tick up to the limit, and then there is a 
performance discontinuity while we yield until the system time catches up 
(which can be up to 1 second). With YieldAdvancingClock, the region can take 
only one commit for an entire clock tick, which is way worse. 

This should be managed at the row scope, not the region scope, but then what to 
do about batch mutations? I struggled for a while with various forms of 
hierarchical clocks where one might get a clock for the region, then use that 
clock to get child clocks for a row, and was not successful because they always 
ended up serializing _something_ at region scope. Then I reconsidered: what is 
it we are really tracking? We are tracking whether or not a row has already 
been committed to in the current clock tick. Or, _we are tracking per region a 
set of rows per clock tick_. Why not track that explicitly? This idea is being 
explored with HBASE-25975, which does that with a region level data structure 
based on atomics, because access to it will be highly multithreaded. In 
retrospect it is in some ways much simpler, but it may become more complex when 
considering optimizations. 
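As a rough illustration of "tracking per region a set of rows per clock tick", the sketch below keeps one concurrent set of row keys for the current tick and swaps in a fresh set when the clock moves on. The class name RowsPerTick, the methods tryClaim and commitTime, and the use of String row keys (HBase rows are byte arrays) are all assumptions made for this example. It is not the data structure from the HBASE-25975 patch.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;
import java.util.function.LongSupplier;

/** Hypothetical per-region tracker of rows already committed to in the current tick. */
final class RowsPerTick {
  private static final class Tick {
    final long time;
    final Set<String> rows = ConcurrentHashMap.newKeySet();

    Tick(long time) {
      this.time = time;
    }
  }

  private final AtomicReference<Tick> current =
      new AtomicReference<>(new Tick(Long.MIN_VALUE));

  /** Returns true if the row may commit with timestamp now, false if it must wait. */
  boolean tryClaim(String row, long now) {
    while (true) {
      Tick tick = current.get();
      if (now > tick.time) {
        // The clock ticked over: install an empty set for the new tick and retry.
        current.compareAndSet(tick, new Tick(now));
        continue;
      }
      if (now < tick.time) {
        return false; // Caller's time is stale; it should re-read the clock.
      }
      return tick.rows.add(row); // false if this row was already claimed in this tick
    }
  }

  /** Waits (with a short park) until the row can take a timestamp of its own. */
  long commitTime(String row, LongSupplier clock) {
    while (true) {
      long now = clock.getAsLong();
      if (tryClaim(row, now)) {
        return now;
      }
      LockSupport.parkNanos(100_000L);
    }
  }
}

final class RowsPerTickDemo {
  public static void main(String[] args) {
    RowsPerTick region = new RowsPerTick();
    long first = region.commitTime("row-1", System::currentTimeMillis);
    long second = region.commitTime("row-1", System::currentTimeMillis);
    System.out.println(second > first); // prints true: same row never shares a tick
  }
}
```

Disjoint rows can share a tick without waiting, which is the property a region-scoped clock cannot offer. A batch would need to claim all of its rows for one timestamp, which this sketch does not attempt.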


[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2021-05-24 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350761#comment-17350761
 ] 

Andrew Kyle Purtell commented on HBASE-24440:
-

One early A/B test worth doing is A=spin wait, B=thread yield(). 



[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2021-05-24 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350759#comment-17350759
 ] 

Andrew Kyle Purtell commented on HBASE-24440:
-

That worst case could be possible but it’s more likely there will be an 
interleaving of handlers as they yield. Yielding may not be necessary at all if 
the clock has already ticked over, which should be pretty likely on a typical 
Linux server. I say we start simple and consider something more clever if and 
only if the performance data reveals a real issue. 



[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2021-05-24 Thread Bharath Vissapragada (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350742#comment-17350742
 ] 

Bharath Vissapragada commented on HBASE-24440:
--

Ok, looking forward to the performance analysis. The reason I thought it could 
be a concern is that if 16 RPC threads all call Region#append() within the same 
tick, in the worst case the last append can wait for 16 ticks (unless I 
misunderstood something). Anyway, agreed that this can wait until we have 
actual data and the patch.



[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2021-05-24 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350736#comment-17350736
 ] 

Andrew Kyle Purtell commented on HBASE-24440:
-

Also, as I said, for batches the advancing time is taken only once. If your 
metrics ingesting application is fine with approximate timestamps, then it is 
obviously fine with submitting mutations in batch, where all the cells in the 
batch will get the same stamp (this is covered in the issue description). There 
will be exactly one call to getTimeAdvancing for the entire batch, so the 
difference should be so small that I wonder if it can be reliably measured. 



[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2021-05-24 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350729#comment-17350729
 ] 

Andrew Kyle Purtell commented on HBASE-24440:
-

No, this will not be pluggable. It is a new invariant, as proposed. We have a 
ton of configuration settings as it is, and for something as fundamental as 
this the confusion factor would be amplified. 

Because everyone will be using EnvironmentEdge to get current time it is 
trivial to track last retrieved time. We will spin if and only if 
getTimeAdvancing is called. That will be called only one time per mutation 
request handling. The overhead is expected to be hard to measure. (But we will 
measure it.)
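A minimal sketch of that idea, assuming a stand-in class rather than the real EnvironmentEdge interface: ordinary reads only record the last retrieved time, and only getTimeAdvancing spins. The class name TrackingEdge and its fields are hypothetical, and Thread.onSpinWait() requires Java 9 or later.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for an edge that tracks the last time anyone retrieved. */
final class TrackingEdge {
  private final LongSupplier systemTime;
  private final AtomicLong lastRetrieved = new AtomicLong(Long.MIN_VALUE);

  TrackingEdge(LongSupplier systemTime) {
    this.systemTime = systemTime;
  }

  /** Ordinary read: never waits, only remembers the latest value seen. */
  long currentTime() {
    long now = systemTime.getAsLong();
    lastRetrieved.accumulateAndGet(now, Math::max);
    return now;
  }

  /** Advancing read: spins until system time is past the last retrieved value. */
  long getTimeAdvancing() {
    while (true) {
      long prev = lastRetrieved.get();
      long now = systemTime.getAsLong();
      if (now > prev && lastRetrieved.compareAndSet(prev, now)) {
        return now;
      }
      Thread.onSpinWait(); // hint to the CPU that this is a busy-wait loop
    }
  }
}

final class TrackingEdgeDemo {
  public static void main(String[] args) {
    TrackingEdge edge = new TrackingEdge(System::currentTimeMillis);
    long read = edge.currentTime();
    long commit = edge.getTimeAdvancing(); // spins for at most about one tick
    System.out.println(commit > read); // prints true
  }
}
```

If the clock has already ticked over since the last retrieval, the loop returns on its first pass and no spinning happens.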

If the overhead turns out to be larger than expected then we can have a 
discussion about optimization. Otherwise that discussion is premature. 

> Prevent temporal misordering on timescales smaller than one clock tick
> --
>
> Key: HBASE-24440
> URL: https://issues.apache.org/jira/browse/HBASE-24440
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> When mutations are sent to the servers without a timestamp explicitly 
> assigned by the client the server will substitute the current wall clock 
> time. There are edge cases where it is at least theoretically possible for 
> more than one mutation to be committed to a given row within the same clock 
> tick. When this happens we have to track and preserve the ordering of these 
> mutations in some other way besides the timestamp component of the key. Let 
> me bypass most discussion here by noting that whether we do this or not, we 
> do not pass such ordering information in the cross cluster replication 
> protocol. We also have interesting edge cases regarding key type precedence 
> when mutations arrive "simultaneously": we sort deletes ahead of puts. This, 
> especially in the presence of replication, can lead to visible anomalies for 
> clients able to interact with both source and sink. 
> There is a simple solution that removes the possibility that these edge cases 
> can occur: 
> We can detect, when we are about to commit a mutation to a row, if we have 
> already committed a mutation to this same row in the current clock tick. 
> Occurrences of this condition will be rare. We are already tracking current 
> time. We have to know this in order to assign the timestamp. Where this 
> becomes interesting is how we might track the last commit time per row. 
> Making the detection of this case efficient for the normal code path is the 
> bulk of the challenge. One option is to keep track of the last locked time 
> for row locks. (Todo: How would we track and garbage collect this efficiently 
> and correctly. Not the ideal option.) We might also do this tracking somehow 
> via the memstore. (At least in this case the lifetime and distribution of in 
> memory row state, including the proposed timestamps, would align.) Assuming 
> we can efficiently know if we are about to commit twice to the same row 
> within a single clock tick, we would simply sleep/yield the current thread 
> until the clock ticks over, and then proceed. 





[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2021-05-24 Thread Bharath Vissapragada (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350726#comment-17350726
 ] 

Bharath Vissapragada commented on HBASE-24440:
--

> new EnvironmentEdge#currentTimeAdvancing which ensures that when the current 
> time is returned, it is the current time in a different clock tick from the 
> last time the EnvironmentEdge was used to get the current time.

Curious how you plan to achieve this. Is the plan to sleep for a clock tick 
between two consecutive invocations (if current_ts == last_read_ts), or 
something fancier?

Is the plan to make this pluggable at a table/namespace scope, or for the whole 
service? I ask because the performance penalty of waiting on clock ticks may 
not be acceptable for throughput-favoring applications, like metrics 
ingestion, that are fine with approximate timestamps. Also, per your design, it 
appears the scope of uniqueness is the RS, so we have a total order of all 
mutations for a given RS (across regions). In most cases we are OK with a 
partial order (within a region), since that is where the order of mutations 
matters, so we can have optimizations like per-region clock instances that 
guarantee this partial order and also achieve the pluggability part (thinking 
out loud). WDYT?
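
The per-region clock idea floated above could look roughly like the following. 
This is only a sketch of the suggestion, not something the thread adopted: the 
class PerRegionClocks, the use of a region name string as the key, and the 
injected time source are all assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical illustration: one "last handed out" value per region. */
final class PerRegionClocks {
  private final LongSupplier source;
  private final ConcurrentHashMap<String, AtomicLong> lastByRegion =
      new ConcurrentHashMap<>();

  PerRegionClocks(LongSupplier source) {
    this.source = source;
  }

  /** Strictly increasing within a region; different regions may share a tick. */
  long currentTimeAdvancing(String regionName) {
    AtomicLong last =
        lastByRegion.computeIfAbsent(regionName, k -> new AtomicLong(Long.MIN_VALUE));
    while (true) {
      long prev = last.get();
      long now = source.getAsLong();
      if (now > prev && last.compareAndSet(prev, now)) {
        return now;
      }
      Thread.yield();
    }
  }
}
```

The trade-off is the one raised in the comment: writers to different regions 
never wait on each other, at the price of only a partial order across the 
region server.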



[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2021-05-24 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350711#comment-17350711
 ] 

Andrew Kyle Purtell commented on HBASE-24440:
-

I'm considering something simple:
 * Ensure our discipline in using {{EnvironmentEdge#currentTime}} instead of 
{{System#currentTimeMillis}} (HBASE-25912)
 * Fix current issues (HBASE-25911)
 * Introduce new {{EnvironmentEdge#currentTimeAdvancing}} which ensures that 
when the current time is returned, it is the current time in a different clock 
tick from the last time the {{EnvironmentEdge}} was used to get the current 
time. 
 * Use {{EnvironmentEdge#currentTimeAdvancing}} wherever we go to substitute a 
{{Long.MAX_VALUE}} timestamp placeholder with a real timestamp. When 
processing a batch mutation we will call {{currentTimeAdvancing}} only once. 
This means the client cannot bundle cells with wildcard timestamps into a batch 
where those cells must be committed with different timestamps. Clients must 
simply not submit mutations that must be committed with guaranteed distinct 
timestamps in the same batch. Easy to understand, easy to document, and it 
aligns with our design philosophy of the client knows best. It will be fine to 
continue to use {{EnvironmentEdge#currentTime}} everywhere else. In this way we 
will only potentially spin wait where it matters, and won't suffer serious 
overheads during batch processing.
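
To make the batch rule in the last bullet concrete, here is a small sketch. 
It assumes cells are reduced to bare timestamps and that the advancing clock 
is passed in as a LongSupplier; the class BatchTimestamps, the constant name 
PLACEHOLDER, and the method resolve are hypothetical and exist only for this 
illustration.

```java
import java.util.function.LongSupplier;

/** Hypothetical illustration of "call the advancing clock once per batch". */
final class BatchTimestamps {
  // Placeholder meaning "server, please assign the time" (Long.MAX_VALUE per the comment).
  static final long PLACEHOLDER = Long.MAX_VALUE;

  /**
   * Returns a copy in which every placeholder is replaced by one shared timestamp.
   * The advancing clock is consulted at most once, and only if a placeholder exists.
   */
  static long[] resolve(long[] cellTimestamps, LongSupplier advancingClock) {
    long[] out = cellTimestamps.clone();
    boolean fetched = false;
    long batchTime = 0L;
    for (int i = 0; i < out.length; i++) {
      if (out[i] == PLACEHOLDER) {
        if (!fetched) {
          batchTime = advancingClock.getAsLong();
          fetched = true;
        }
        out[i] = batchTime;
      }
    }
    return out;
  }
}
```

All placeholder cells in one batch end up with the same timestamp, which is 
why clients that need distinct timestamps must send separate requests; 
explicitly set timestamps pass through untouched.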



[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2020-06-01 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17121293#comment-17121293
 ] 

Andrew Kyle Purtell commented on HBASE-24440:
-

I am aware. If we do this I don’t think we will need it at all, configurable or 
not. But that is out of scope for this issue. 



[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2020-06-01 Thread Geoffrey Jacoby (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17121247#comment-17121247
 ] 

Geoffrey Jacoby commented on HBASE-24440:
-

[~apurtell] - in HBase 2.x and above, the sort-delete-before-put rule is 
configurable. (see 29.3 in the HBase book). It can be disabled at the cost of 
some CPU perf on read. 



[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2020-06-01 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17121223#comment-17121223
 ] 

Andrew Kyle Purtell commented on HBASE-24440:
-

Correct, [~anoop.hbase]: two versions with two distinct timestamps, instead 
of duplicate row keys with only something like an internal-only seqno (which 
is not replicated) to differentiate them.

We can also consider removing the implicit sort-delete-before-put rule that can 
cause temporal anomalies under some conditions, but that is out of scope for 
this proposal.



[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2020-05-31 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120737#comment-17120737
 ] 

Anoop Sam John commented on HBASE-24440:


When two such mutate requests come in and are applied at the same TS, we need 
to keep both of them as two versions. That is the need, right? Right now they 
will be treated as a duplicate entry and the one with the higher seqNo will 
come out of the scan.
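
The behavior described above can be illustrated with a toy model. This is a 
deliberately simplified sketch and not HBase's comparator or scanner: a cell 
is reduced to row, timestamp, seqNo and value (no family, qualifier, or key 
type), and the names SameTickScan, Cell, and visibleVersions are 
hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model: cells with equal (row, ts) collapse to the one with the highest seqNo. */
final class SameTickScan {
  static final class Cell {
    final String row;
    final long ts;
    final long seqNo;
    final String value;

    Cell(String row, long ts, long seqNo, String value) {
      this.row = row;
      this.ts = ts;
      this.seqNo = seqNo;
      this.value = value;
    }
  }

  /** Row ascending, then newer timestamp first, then higher seqNo first. */
  static final Comparator<Cell> ORDER = (a, b) -> {
    int c = a.row.compareTo(b.row);
    if (c != 0) return c;
    c = Long.compare(b.ts, a.ts);
    if (c != 0) return c;
    return Long.compare(b.seqNo, a.seqNo);
  };

  /** Keeps only the first cell of each (row, ts) group after sorting. */
  static List<Cell> visibleVersions(List<Cell> cells) {
    List<Cell> sorted = new ArrayList<>(cells);
    sorted.sort(ORDER);
    List<Cell> out = new ArrayList<>();
    Cell prev = null;
    for (Cell c : sorted) {
      if (prev == null || !prev.row.equals(c.row) || prev.ts != c.ts) {
        out.add(c);
      }
      prev = c;
    }
    return out;
  }
}
```

In this model, two writes to the same row in the same tick yield one visible 
version, while the same two writes in different ticks yield two.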



[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2020-05-26 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116947#comment-17116947
 ] 

Andrew Kyle Purtell commented on HBASE-24440:
-

bq. There are edge cases where it is at least theoretically possible for more 
than one mutation to be committed to a given row within the same clock tick.

Clients might manually assign timestamps to cause the same type of anomalies. 
This proposal excludes such cases. In theory we could rewrite the 
client-supplied timestamp, perhaps with a policy of incrementing by one for 
each mutation in the batch, but I think this goes a step too far. Unless the client 
specifically tells us to change the timestamp, we should not. On the other 
hand, when we detect this condition, we can and should warn about it in the 
logs. 
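
As a sketch of the "detect and warn" part, the check below reports rows that 
appear more than once in a batch with the same client-supplied timestamp. It 
is an assumption-laden illustration: rows are plain strings rather than byte 
arrays, the result is returned to the caller instead of being written to a 
log, and the names DuplicateTimestampCheck and findCollisions are 
hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration: find (row, timestamp) pairs that repeat within one batch. */
final class DuplicateTimestampCheck {
  /**
   * rows[i] and timestamps[i] describe the i-th mutation of the batch.
   * Returns each colliding pair once, formatted as "row@timestamp".
   */
  static List<String> findCollisions(String[] rows, long[] timestamps) {
    Set<String> seen = new HashSet<>();
    List<String> collisions = new ArrayList<>();
    for (int i = 0; i < rows.length; i++) {
      // Simplified key; assumes row names do not themselves contain '@'.
      String key = rows[i] + "@" + timestamps[i];
      if (!seen.add(key) && !collisions.contains(key)) {
        collisions.add(key);
      }
    }
    return collisions;
  }
}
```

A server would log one warning per returned entry and otherwise leave the 
client's timestamps untouched, as argued above.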

> Prevent temporal misordering on timescales smaller than one clock tick
> --
>
> Key: HBASE-24440
> URL: https://issues.apache.org/jira/browse/HBASE-24440
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> When mutations are sent to the servers without a timestamp explicitly 
> assigned by the client the server will substitute the current wall clock 
> time. There are edge cases where it is at least theoretically possible for 
> more than one mutation to be committed to a given row within the same clock 
> tick. When this happens we have to track and preserve the ordering of these 
> mutations in some other way besides the timestamp component of the key. Let 
> me bypass most discussion here by noting that whether we do this or not, we 
> do not pass such ordering information in the cross cluster replication 
> protocol. We also have interesting edge cases regarding key type precedence 
> when mutations arrive "simultaneously": we sort deletes ahead of puts. This, 
> especially in the presence of replication, can lead to visible anomalies for 
> clients able to interact with both source and sink. 
> There is a simple solution that removes the possibility that these edge cases 
> can occur: 
> We can detect, when we are about to commit a mutation to a row, if we have 
> already committed a mutation to this same row in the current clock tick. 
> Occurrences of this condition will be rare. We are already tracking current 
> time. We have to know this in order to assign the timestamp. Where this 
> becomes interesting is how we might track the last commit time per row. 
> Making the detection of this case efficient for the normal code path is the 
> bulk of the challenge. We would do this somehow via the memstore. Assuming we 
> can efficiently know if we are about to commit twice to the same row within a 
> single clock tick, we would simply sleep/yield the current thread until the 
> clock ticks over, and then proceed. 





[jira] [Commented] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2020-05-26 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116934#comment-17116934
 ] 

Andrew Kyle Purtell commented on HBASE-24440:
-

Note this is a trick Apache Phoenix uses to ensure uniqueness of timestamps for 
indexes.
