[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2014-01-06 Thread Enis Soztutar (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863531#comment-13863531 ]

Enis Soztutar commented on HBASE-10241:
---

We need to fix this for a couple of different reasons: 
 - Fixing scanner consistency with multi-row transactions (see HBASE-9797)
 - Adding cell-based scanners, and streaming scans
 - Adding single-row scanners
 - Consistent scanners with region replicas, in case replicas are mostly up to date (HBASE-10070)

What is the plan here? I think we should do subtasks 1 and 3 regardless of 
HBASE-8763. But it seems that if we do HBASE-8763 first, it will be much 
cleaner and we won't need subtask 2 at all. 


 implement mvcc-consistent scanners (across recovery)
 

             Key: HBASE-10241
             URL: https://issues.apache.org/jira/browse/HBASE-10241
         Project: HBase
      Issue Type: New Feature
      Components: HFile, regionserver, Scanners
Affects Versions: 0.99.0
        Reporter: Sergey Shelukhin
        Assignee: Sergey Shelukhin
     Attachments: Consistent scanners.pdf


 Scanners currently use mvcc for consistency. However, mvcc is lost on server 
 restart, or even on a region move. This JIRA is to enable scanners to 
 transfer mvcc (or seqId, or some other number, see HBASE-8763) between 
 servers. First, the client scanner needs to get and store the read point. Second, 
 mvcc needs to be preserved in the WAL. Third, mvcc needs to be stored per KV in 
 store files and discarded when no longer needed.
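A minimal sketch of the three steps above, written in Java since HBase is. It assumes a cell is visible to a scanner when its mvcc is at or below the scanner's read point, and that the mvcc survives the region move (steps 2 and 3). `MvccCell`, `HostedRegion`, and `ConsistentClientScanner` are hypothetical names, not HBase APIs, and the scan ignores row-by-row positioning for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** A cell tagged with the mvcc (write number) of the write that produced it. */
final class MvccCell {
    final String row;
    final String value;
    final long mvcc;

    MvccCell(String row, String value, long mvcc) {
        this.row = row;
        this.value = value;
        this.mvcc = mvcc;
    }
}

/** Hypothetical stand-in for a region on whichever server currently hosts it. */
final class HostedRegion {
    private final List<MvccCell> cells = new ArrayList<>();
    private long readPoint = 0; // highest mvcc whose write has completed

    void write(String row, String value) {
        readPoint++;
        cells.add(new MvccCell(row, value, readPoint));
    }

    long currentReadPoint() {
        return readPoint;
    }

    /** Only cells with mvcc <= scannerReadPoint are visible to the scanner. */
    List<String> scan(long scannerReadPoint) {
        List<String> visible = new ArrayList<>();
        for (MvccCell c : cells) {
            if (c.mvcc <= scannerReadPoint) {
                visible.add(c.row + "=" + c.value);
            }
        }
        return visible;
    }

    /** Simulates re-opening the region on another server with mvcc preserved (steps 2 and 3). */
    HostedRegion moveToNewServer() {
        HostedRegion moved = new HostedRegion();
        moved.cells.addAll(cells);
        moved.readPoint = readPoint;
        return moved;
    }
}

/** Hypothetical client scanner that keeps the read point it was first given (step 1). */
final class ConsistentClientScanner {
    private Long readPoint; // null until the first server responds

    List<String> next(HostedRegion region) {
        if (readPoint == null) {
            readPoint = region.currentReadPoint();
        }
        // Send the stored read point to every server, including after a move.
        return region.scan(readPoint);
    }
}
```

If a row is written after the scanner starts and the region then moves, the scanner still sees only the rows visible at its original read point; a scanner that simply picked up the new server's current read point would also see the late row.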



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2014-01-06 Thread Sergey Shelukhin (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863546#comment-13863546 ]

Sergey Shelukhin commented on HBASE-10241:
--

Subtask 2 is trivial; it is not done here because it is being done 
elsewhere (that's why it is resolved as a dup). Since it's not blocking us here 
and now, it doesn't make sense to do double work. I am not working on this JIRA 
right now (I hope to get back to it; there's a patch out in the client 
subtask), but the plan was that I will do 1 and 3, and then take 2 if the other 
JIRA that does 2 is not done by then. 
HBASE-8763 does not need to block this, it's probably bigger than this entire 
JIRA. If it's done before this due to delays, good; if not, also good :)


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2014-01-06 Thread Sergey Shelukhin (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863552#comment-13863552 ]

Sergey Shelukhin commented on HBASE-10241:
--

Thanks for the HBASE-9797 reference; yes, this is good for that!


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2014-01-06 Thread Enis Soztutar (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863590#comment-13863590 ]

Enis Soztutar commented on HBASE-10241:
---

bq. but the plan was that I will do 1 and 3, and then take 2 if the other JIRA 
that does 2 is not done by then. 
Sounds good. I thought HBASE-8721 was resolved as Won't Fix.
bq. HBASE-8763 does not need to block this, it's probably bigger than this 
entire JIRA
Indeed. But it would be a shame if we added mvccs to the WAL only to remove them 
again after HBASE-8763.

BTW, I think we also have to handle mvcc / seqId as part of the serialization 
in the KV byte array. Do we have any open issues for that?


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2014-01-06 Thread Sergey Shelukhin (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863615#comment-13863615 ]

Sergey Shelukhin commented on HBASE-10241:
--

There's another issue, HBASE-10227, for the WAL stuff.
Mvcc can already be serialized with KV in HFile. Comment in KeyValue.java is a 
lie :)


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2014-01-06 Thread Enis Soztutar (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863651#comment-13863651 ]

Enis Soztutar commented on HBASE-10241:
---

bq. Mvcc can already be serialized with KV in HFile. Comment in KeyValue.java 
is a lie
Sorry, I wasn't talking about mvcc serialization in HFile. I was talking about 
making the mvcc number a part of the byte[] in the KV.


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2014-01-06 Thread Sergey Shelukhin (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863654#comment-13863654 ]

Sergey Shelukhin commented on HBASE-10241:
--

That would be a PITA from a backward-compat perspective: we'd both add a field, 
requiring HFile format v4 (I don't really want tag overhead for this), and 
presumably (tags or not) remove the old magic mechanism.


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2014-01-06 Thread Sergey Shelukhin (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863825#comment-13863825 ]

Sergey Shelukhin commented on HBASE-10241:
--

I will make a separate, unrelated JIRA for this.


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2014-01-06 Thread Sergey Shelukhin (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863829#comment-13863829 ]

Sergey Shelukhin commented on HBASE-10241:
--

 HBASE-10288


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2014-01-02 Thread Sergey Shelukhin (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860996#comment-13860996 ]

Sergey Shelukhin commented on HBASE-10241:
--

The final client-side patch (which works across region re-open, split, and merge, as 
long as you trick the existing server into preserving MVCC) is attached to 
HBASE-10242.


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2013-12-28 Thread Feng Honghua (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858234#comment-13858234 ]

Feng Honghua commented on HBASE-10241:
--

I also encountered this issue when implementing HBASE-8721, the fix for "deletes 
can mask puts that happen after the delete". That patch already persists mvcc in the 
WAL, so it can be used to recover the region's correct mvcc during re-opening. 
Anyone who is interested can refer to it :-)
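A minimal sketch of recovering a region's mvcc on re-open, assuming (as the comment describes for the HBASE-8721 work) that every WAL entry records the mvcc of its edit. `WalEntry`, `RegionReopen`, and the store-file maximum parameter are hypothetical names; the point is only that the re-opened region must never resume below a read point a client may already hold.

```java
import java.util.List;

/** Hypothetical WAL entry that carries the mvcc of the edit alongside its sequence id. */
final class WalEntry {
    final long seqId;
    final long mvcc;

    WalEntry(long seqId, long mvcc) {
        this.seqId = seqId;
        this.mvcc = mvcc;
    }
}

final class RegionReopen {
    /**
     * Restores the region's read point after replay: the highest mvcc found either in
     * the flushed store files or in the replayed WAL edits.
     */
    static long recoverReadPoint(long maxMvccInStoreFiles, List<WalEntry> replayedEdits) {
        long readPoint = maxMvccInStoreFiles;
        for (WalEntry e : replayedEdits) {
            readPoint = Math.max(readPoint, e.mvcc);
        }
        return readPoint;
    }

    /** New writes after re-open continue strictly above the recovered read point. */
    static long firstWriteNumberAfterReopen(long recoveredReadPoint) {
        return recoveredReadPoint + 1;
    }
}
```

Taking the maximum over both sources matters: edits already flushed to HFiles are no longer in the WAL, and edits still in the WAL are not yet in any HFile.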

It seems the (arguably minor) performance benefit of setting mvcc to zero (per HFile) 
can't offset the correctness penalty it brings. Persisting mvcc and having it 
survive across regionservers is a matter of semantic correctness; it seems most 
related issues can be resolved by getting this right, and combining mvcc and 
seqid is not as critical as this correction.
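The correctness penalty can be illustrated with a small sketch. It assumes the optimization works roughly like this: when a flush or compaction writes an HFile, a cell's mvcc is replaced by 0 once it is at or below the smallest read point of the scanners the server knows about. A scanner that arrives from another server with an older read point is not in that set, so a zeroed cell it should not see becomes visible. The class and method names are illustrative, not HBase code.

```java
final class MvccZeroingSketch {
    /** mvcc a cell keeps when the file is written; 0 means "visible to every read point". */
    static long mvccAfterWrite(long cellMvcc, long smallestKnownReadPoint) {
        return cellMvcc <= smallestKnownReadPoint ? 0L : cellMvcc;
    }

    /** mvcc visibility rule: a cell is visible if its mvcc <= the scanner's read point. */
    static boolean visible(long cellMvcc, long scannerReadPoint) {
        return cellMvcc <= scannerReadPoint;
    }

    public static void main(String[] args) {
        long cellMvcc = 7;        // written after the migrating scanner started
        long migratedScanner = 5; // read point carried over from the old server
        long localSmallest = 10;  // the new server only knows about its own scanners

        // Unsafe: the cell's mvcc is zeroed, so the old scan now sees it.
        System.out.println(visible(mvccAfterWrite(cellMvcc, localSmallest), migratedScanner)); // prints true
        // Safe: include the migrated scanner's read point, so the mvcc is kept.
        long smallest = Math.min(localSmallest, migratedScanner);
        System.out.println(visible(mvccAfterWrite(cellMvcc, smallest), migratedScanner)); // prints false
    }
}
```

This is also why a compaction running during the window in which old scanners may reconnect would need to keep mvcc for cells newer than the oldest read point still in use anywhere, not just on the local server.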


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2013-12-27 Thread stack (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857563#comment-13857563 ]

stack commented on HBASE-10241:
---

[~sershe] Nice writeup.

sequenceid is not scoped to the region. Ergo mvcc should be too?

bq. Client will have a configurable setting to enable consistent scanners.

IMO, just say no to more configs.  When would someone want 'inconsistent 
scanners'?  (In my experience, users do not play with configs in 98.1745% of 
cases -- smile).

bq. unless mvcc and seqId are merged)

Would this issue be better if above work is done first?

bq. ...Recovery will have to make use of mvcc when replaying the edit to new 
server.

This work is ongoing over in @jeffrey zhong replay wal effort?

mvcc is about giving you a consistent view on a row only.  This work is to deal 
with the case where you have a wide row and you have already passed the client 
the first half of a row, a crash happens, and you need to return to the client 
the second half of the row?

A section in doc. on implications of not having this change fixed and then the 
'cost' of this fix going in would help.

Thanks for working on this tough one [~sershe]

[~lhofhansl] One for you to peruse boss...


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2013-12-27 Thread Sergey Shelukhin (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857600#comment-13857600 ]

Sergey Shelukhin commented on HBASE-10241:
--

bq. sequenceid is not scoped to the region. Ergo mvcc should be too?
That is not necessary, strictly speaking.
bq. IMO, just say no to more configs. When would someone want 'inconsistent 
scanners'? (In my experience, users do not play with configs in 98.1745% of 
cases – smile).
That is OK, but in the 1.8265% of cases when you do need a config, you normally really 
need it :)
It can be on by default, but at least for the first version the new 
functionality should be easy to disable if needed.

{quote}
bq. unless mvcc and seqId are merged)
Would this issue be better if above work is done first?
{quote}
Not much. After examining things, I'd say it almost doesn't matter, except maybe 
for some extra work adding mvcc to the WAL, but that's not much.

bq. ...Recovery will have to make use of mvcc when replaying the edit to new 
server.

bq. This work is ongoing over in @jeffrey zhong replay wal effort?
I guess it will have to go into both that and the old replay, similar to nonces. 
There's also HBASE-10227, which is related.

bq. mvcc is about giving you a consistent view on a row only. This work is to 
deal with the case where you have a wide row and you have already passed the 
client the first half of a row, a crash happens, and you need to return to the 
client the second half of the row?
Actually, mvcc currently gives you a consistent view of the region, in theory 
(unless there's some glitch in the code). All updates in the region are 
mvcc-ordered.
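A simplified, single-threaded sketch of what "all updates in the region are mvcc-ordered" means: each write takes the next region-wide write number, and the read point only moves past a write once that write and every earlier one have completed. A scanner opened at read point N therefore sees every region write up to N and none after, regardless of which rows they touched. `RegionMvcc` is a hypothetical name; HBase's own mvcc class also deals with concurrency and waiting, which this omits.

```java
import java.util.TreeSet;

/** Hypothetical, single-threaded model of region-wide mvcc ordering. */
final class RegionMvcc {
    private long nextWriteNumber = 1;
    private long readPoint = 0;
    // Completed writes that cannot be exposed yet because an earlier write is still running.
    private final TreeSet<Long> completedAhead = new TreeSet<>();

    /** Starts a write anywhere in the region and returns its write number. */
    long begin() {
        return nextWriteNumber++;
    }

    /** Marks a write complete and advances the read point over any contiguous completed run. */
    void complete(long writeNumber) {
        completedAhead.add(writeNumber);
        while (completedAhead.contains(readPoint + 1)) {
            completedAhead.remove(readPoint + 1);
            readPoint++;
        }
    }

    /** Read point a newly opened scanner would use. */
    long readPoint() {
        return readPoint;
    }
}
```

For example, completing write 2 before write 1 leaves the read point at 0; completing write 1 afterwards advances it to 2 in one step.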

bq. A section in doc. on implications of not having this change fixed and then 
the 'cost' of this fix going in would help.
Will add. What do you mean by cost? 


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2013-12-27 Thread stack (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857607#comment-13857607 ]

stack commented on HBASE-10241:
---

+1 on it being on by default, with the config only to turn it off if problematic.

On mvcc giving consistent view on region, that is unnecessary, right -- when 
would we ever care about a consistent view across a region rather than just 
across a row (other than the fact that row boundaries are only known after the 
fact, after you have passed them out) -- so if that is the case, we should make do 
w/ the lesser scope if we can.

When I say cost, I mean the amount of extra checks and work the server will 
have to do to enforce this consistency: the fancy dancing necessary to enforce 
the period during which a scanner may come in after recovery of a server. Will 
we suspend compactions during this time, or will compactions have a special flag 
that says "do not drop mvcc", so no full compactions during this period? 
And so on.


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2013-12-27 Thread Sergey Shelukhin (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857617#comment-13857617 ]

Sergey Shelukhin commented on HBASE-10241:
--

bq. On mvcc giving consistent view on region, that is unnecessary, right – when 
would we ever care about a consistent view across a region rather than just 
across a row (other than the fact that row boundaries are only known after the 
fact, after you have passed them out)
It can actually be pretty important... if recovery takes a while and the scanner 
bounces, the data read can be several minutes apart. For certain use cases it's 
much better to have consistent data for nearby rows (esp. if some sharded data 
is stored). Also, if secondary reads are implemented, the divergence between 
scanners can be even greater, so the negative effects of the scanner jumping will 
be even more visible. Then, as suggested above, by querying mvcc from all 
requisite regions before the scanner runs, we can make it even more consistent. 
That is as close as you can get to a consistent view of the data without 
implementing something like Percolator, with external timestamps, which is 
pretty neat :)


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2013-12-27 Thread Lars Hofhansl (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857696#comment-13857696 ]

Lars Hofhansl commented on HBASE-10241:
---

Nice writeup. It would be even better if it stated the failure scenario 
we're guarding against.

I think we have to be careful not to add any half guarantees. Currently HBase 
does MVCC-based snapshot isolation (SI) for rows (and serializable views for 
increment/append). If I understand this correctly, this is not needed for row 
consistency, right?
While recovery is in progress we cannot scan, so we'd never see partial rows, 
right?

Then what exactly are we guarding against? (As usual, I might just be missing an 
important point :) )


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2013-12-27 Thread Sergey Shelukhin (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857745#comment-13857745 ]

Sergey Shelukhin commented on HBASE-10241:
--

Region-level inconsistency, essentially.


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2013-12-27 Thread Sergey Shelukhin (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857911#comment-13857911 ]

Sergey Shelukhin commented on HBASE-10241:
--

WIP patch attached to the child JIRA.


[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)

2013-12-26 Thread Sergey Shelukhin (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857348#comment-13857348 ]

Sergey Shelukhin commented on HBASE-10241:
--

The one-pager doesn't cover merges and splits. It looks like mvccs on the client will 
have to be tracked per key range, rather than per region.
For an even more consistent scanner, mvccs might even optionally be pre-fetched 
from all regions, but that is for later.
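A minimal sketch of per-range tracking on the client, assuming the client records each read point against the key range it was obtained for rather than against a region name. After a split, both daughters' rows still fall inside the recorded parent range; after a merge, each half keeps the read point it was given. `RangeReadPoints` is hypothetical, and row keys are `String`s here rather than HBase's `byte[]`.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.TreeMap;

/** Hypothetical client-side map from key ranges to the read point obtained for them. */
final class RangeReadPoints {
    private static final class Range {
        final String endKey;  // exclusive; "" means the end of the table
        final long readPoint;

        Range(String endKey, long readPoint) {
            this.endKey = endKey;
            this.readPoint = readPoint;
        }
    }

    private final TreeMap<String, Range> byStartKey = new TreeMap<>();

    /** Records the read point a server returned for rows in [startKey, endKey). */
    void record(String startKey, String endKey, long readPoint) {
        byStartKey.put(startKey, new Range(endKey, readPoint));
    }

    /** Read point to send when (re)opening a scanner at this row, if one was recorded. */
    OptionalLong readPointFor(String row) {
        Map.Entry<String, Range> e = byStartKey.floorEntry(row);
        if (e == null) {
            return OptionalLong.empty();
        }
        Range r = e.getValue();
        boolean inRange = r.endKey.isEmpty() || row.compareTo(r.endKey) < 0;
        return inRange ? OptionalLong.of(r.readPoint) : OptionalLong.empty();
    }
}
```

For example, after `record("a", "m", 42)`, a scanner resuming at row `"h"` gets 42 whether `"h"` is now served by the original region or by a daughter created by a split, because the lookup depends only on the row key.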
