[jira] [Commented] (HBASE-16820) BulkLoad mvcc visibility only works accidentally

2016-12-12 Thread Enis Soztutar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15743390#comment-15743390
 ] 

Enis Soztutar commented on HBASE-16820:
---

Hey Nick, sorry: the problem has already been fixed for 1.1 via an addendum to 
HBASE-16721. See my comment on 
https://issues.apache.org/jira/browse/HBASE-16721?focusedCommentId=15570346=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15570346.

This issue is for a longer-term fix and should not be a blocker for the release.

> BulkLoad mvcc visibility only works accidentally 
> -
>
> Key: HBASE-16820
> URL: https://issues.apache.org/jira/browse/HBASE-16820
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.1.8
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Priority: Blocker
> Fix For: 1.1.8
>
> Attachments: HBASE-16820-branch-1.1-v0.patch
>
>
> [~sergey.soldatov] has been debugging an issue with a 1.1 code base where the
> commit for HBASE-16721 broke bulk load visibility. After a bulk load, the
> bulk-loaded files are not visible because the sequence id assigned to the bulk
> load is not advanced in mvcc.
> Debugging further, we noticed that the bulk load behavior is wrong, but it
> works "accidentally" in all code bases (and is broken in 1.1 after HBASE-16721).
> Let me explain:
>  - A BL request can optionally request a flush beforehand (this should be the
> default), which causes the flush to happen with some sequenceId. The flush
> sequence id is one past all the cells' sequence ids. This flush sequence id is
> returned as the result of the flush operation.
>  - BL then uses this particular sequenceId to mark the files, but does
> not get a new sequence id of its own or advance the mvcc number.
>  - BL completes WITHOUT making sure that the sequence id is visible.
>  - BL does, however, write entries to the WAL for the BL event, which in 1.2
> code bases go through the whole mvcc + seqId path, which makes sure that
> earlier sequenceIds (the flush sequenceId) are visible via mvcc.
> The problem with 1.1 is that the WAL entries only get sequence ids but do
> not touch mvcc. With the patch for HBASE-16721, we made it so that the
> flushedSequenceId is not used in mvcc as the highest read point (although all
> the data is still visible).
> BL relying on the flush sequence id is wrong for two reasons:
>  - BL files are loaded with the flush sequence id from the memstore. This
> particular sequence id is used twice for two different things and ends up
> being the sequence id for the flushed file as well as the BL'ed files.
>  - BL should make sure that it gets a new sequence id and that this sequence
> id is visible before returning the results.
> [~ndimiduk] FYI. 
>  
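
The visibility problem described in the quoted issue can be illustrated with a toy model. This is only a sketch under stated assumptions, not HBase code: ToyRegion, its read point, and both bulk load methods are hypothetical names made up for illustration. A file is treated as visible only when its sequence id is at or below the read point.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not HBase code): a read point plus files tagged with sequence ids. */
class ToyRegion {
    private long nextSeqId = 1;   // next sequence id to hand out
    private long readPoint = 0;   // highest sequence id visible to readers
    private final List<Long> fileSeqIds = new ArrayList<>();

    /** Hand out a fresh sequence id without making it visible. */
    long newSeqId() { return nextSeqId++; }

    /** Advance the read point so everything up to seqId is visible. */
    void advanceTo(long seqId) { readPoint = Math.max(readPoint, seqId); }

    /** Register a store file tagged with the given sequence id. */
    void addFile(long seqId) { fileSeqIds.add(seqId); }

    /** Count the files a reader at the current read point can see. */
    long visibleFiles() {
        return fileSeqIds.stream().filter(id -> id <= readPoint).count();
    }

    /** Behavior described in the issue: reuse the flush id, never touch the read point. */
    void bulkLoadReusingFlushId(long flushSeqId) { addFile(flushSeqId); }

    /** Behavior the issue asks for: take a new id and make it visible before returning. */
    void bulkLoadWithOwnId() {
        long id = newSeqId();
        addFile(id);
        advanceTo(id);
    }
}

class ToyRegionDemo {
    public static void main(String[] args) {
        ToyRegion broken = new ToyRegion();
        long flushId = broken.newSeqId();        // flush id, never published to the read point
        broken.bulkLoadReusingFlushId(flushId);
        System.out.println(broken.visibleFiles()); // prints 0: the loaded file stays hidden

        ToyRegion fixed = new ToyRegion();
        fixed.newSeqId();                        // flush id
        fixed.bulkLoadWithOwnId();
        System.out.println(fixed.visibleFiles());  // prints 1: the loaded file is visible
    }
}
```

In the first case nothing else advances the read point, which models the 1.1 behavior where the WAL entry for the BL event does not touch mvcc.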



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-16820) BulkLoad mvcc visibility only works accidentally

2016-12-10 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15738785#comment-15738785
 ] 

Nick Dimiduk commented on HBASE-16820:
--

Bump here again. I'm not comfortable shipping a 1.1.8 on which bulkload does 
not work. Either we get a fix here or revert HBASE-16721.

[~enis] [~sergey.soldatov], what say you?


[jira] [Commented] (HBASE-16820) BulkLoad mvcc visibility only works accidentally

2016-12-04 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15720526#comment-15720526
 ] 

Nick Dimiduk commented on HBASE-16820:
--

Bump. Any eyes here?


[jira] [Commented] (HBASE-16820) BulkLoad mvcc visibility only works accidentally

2016-11-17 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675968#comment-15675968
 ] 

Hadoop QA commented on HBASE-16820:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 27s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 8m 2s | branch-1.1 passed |
| +1 | compile | 0m 33s | branch-1.1 passed with JDK v1.8.0_111 |
| +1 | compile | 0m 37s | branch-1.1 passed with JDK v1.7.0_80 |
| +1 | checkstyle | 0m 32s | branch-1.1 passed |
| +1 | mvneclipse | 0m 23s | branch-1.1 passed |
| -1 | findbugs | 1m 48s | hbase-server in branch-1.1 has 80 extant Findbugs warnings. |
| -1 | javadoc | 0m 33s | hbase-server in branch-1.1 failed with JDK v1.8.0_111. |
| +1 | javadoc | 0m 31s | branch-1.1 passed with JDK v1.7.0_80 |
| +1 | mvninstall | 0m 40s | the patch passed |
| +1 | compile | 0m 30s | the patch passed with JDK v1.8.0_111 |
| +1 | javac | 0m 30s | the patch passed |
| +1 | compile | 0m 34s | the patch passed with JDK v1.7.0_80 |
| +1 | javac | 0m 34s | the patch passed |
| +1 | checkstyle | 0m 21s | the patch passed |
| +1 | mvneclipse | 0m 15s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 11m 52s | The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | hbaseprotoc | 0m 14s | the patch passed |
| +1 | findbugs | 2m 6s | the patch passed |
| -1 | javadoc | 0m 24s | hbase-server in the patch failed with JDK v1.8.0_111. |
| +1 | javadoc | 0m 32s | the patch passed with JDK v1.7.0_80 |
| -1 | unit | 88m 37s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 31s | The patch does not generate ASF License warnings. |
| | | 120m 33s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestSplitWalDataLoss |

|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8012383 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12833147/HBASE-16820-branch-1.1-v0.patch |
| JIRA Issue | HBASE-16820 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux a2c91f751adf 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05

[jira] [Commented] (HBASE-16820) BulkLoad mvcc visibility only works accidentally

2016-11-17 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675750#comment-15675750
 ] 

Nick Dimiduk commented on HBASE-16820:
--

Marking as blocker for 1.1.8.


[jira] [Commented] (HBASE-16820) BulkLoad mvcc visibility only works accidentally

2016-11-17 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675743#comment-15675743
 ] 

Nick Dimiduk commented on HBASE-16820:
--

Thanks for the explanation. Where are you guys with this one? We can at least 
see what QA says about the patch. Do we have a mixed online + bulkload ITBLL 
variant we can use to test it?


[jira] [Commented] (HBASE-16820) BulkLoad mvcc visibility only works accidentally

2016-10-13 Thread Enis Soztutar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572878#comment-15572878
 ] 

Enis Soztutar commented on HBASE-16820:
---

bq. I think it is not wrong to use the flushseqid as the seqid of the edits in 
the bulk loaded file. 
See the reasoning in the description. We should not use the same sequence id 
for two different purposes. 

bq. if I remember from the original implementation, we used the flush 
sequenceid because that was guaranteed 'safe'; going beyond this id there were 
concerns some other edit could sneak in between the flush and bulk load. 
I checked the code. It seems that we do not acquire the updatesLock for BL, 
which means that edits can come in between the flush sequence id and the BL 
sequence id. We can make it so that we acquire the updatesLock first, before 
the flush, and then get the BL sequence id after the flush. Since the updates 
lock is reentrant, this should work as long as the same thread does the flush.
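
The lock ordering suggested above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the actual HRegion code: the class LockedBulkLoad and its methods are hypothetical, and the updates lock is modeled as a java.util.concurrent ReentrantReadWriteLock where ordinary edits take the read lock and the flush takes the write lock.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy sketch (hypothetical, not HBase code): hold the updates lock across flush and BL id assignment. */
class LockedBulkLoad {
    // Stand-in for the region's updates lock.
    private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
    private final AtomicLong sequenceId = new AtomicLong();

    /** Ordinary edit: blocks while another thread holds the write lock for a bulk load. */
    long put() {
        updatesLock.readLock().lock();
        try {
            return sequenceId.incrementAndGet();
        } finally {
            updatesLock.readLock().unlock();
        }
    }

    /** Flush takes the write lock; re-acquiring it in the same thread works because it is reentrant. */
    long flush() {
        updatesLock.writeLock().lock();
        try {
            return sequenceId.incrementAndGet(); // one past all the cells' sequence ids
        } finally {
            updatesLock.writeLock().unlock();
        }
    }

    /** Returns {flushSeqId, bulkLoadSeqId}; no put can land between the two ids. */
    long[] flushThenBulkLoad() {
        updatesLock.writeLock().lock();
        try {
            long flushSeqId = flush();                          // same thread, reentrant acquire
            long bulkLoadSeqId = sequenceId.incrementAndGet();  // BL gets its own id
            return new long[] { flushSeqId, bulkLoadSeqId };
        } finally {
            updatesLock.writeLock().unlock();
        }
    }
}
```

If the flush ran in a different thread from the one holding the write lock, the inner acquire would block, which is the caveat about the same thread doing the flush.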



[jira] [Commented] (HBASE-16820) BulkLoad mvcc visibility only works accidentally

2016-10-13 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572549#comment-15572549
 ] 

stack commented on HBASE-16820:
---

Thanks for the reasoning. Interesting from your recounting is the fact that 
flushing first is optional (I'd thought it was non-optional), and that when we 
unified sequenceid and mvcc, we missed this special case. The patch 
looks like it will 'work', but I like the suggestion that bulk load get its own 
sequenceid, separate from that of the flush. If I remember the original 
implementation correctly, we used the flush sequenceid because that was 
guaranteed 'safe'; going beyond this id, there were concerns that some other 
edit could sneak in between the flush and the bulk load. I think we now have a 
mechanism to get a sequenceid with guarantees that it immediately follows the 
flush id, with no edits able to sneak in behind. Nice work lads.

