[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-09-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117302#comment-14117302
 ] 

Hudson commented on HBASE-11591:


FAILURE: Integrated in HBase-1.0 #142 (See 
[https://builds.apache.org/job/HBase-1.0/142/])
HBASE-11591 Scanner fails to retrieve KV from bulk loaded file with 
(ramkrishna: rev 844f3dfb6a9b2267b7e06ee2a176c76ae89ff7bf)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestScannerWithBulkload.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java


 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0, 2.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0, 2.0.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 HBASE-11591_6.patch, HBASE-11591_6.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch


 See discussion in HBASE-11339.
 Consider a case where the same KVs exist in two files: one produced by a 
 flush/compaction and the other through a bulk load.
 Both files contain some identical KVs that match even in timestamp.
 Steps:
 Add some rows with a specific timestamp and flush them.
 Bulk load a file with the same data. Ensure that the assign seqnum property is 
 set.
 The bulk load should use HFileOutputFormat2 (or ensure that we write the 
 bulk_time_output key).
 This ensures that the bulk loaded file has the highest seq num.
 Assume the cell in the flushed/compacted store file is 
 row1,cf,cq,ts1,value1 and the cell in the bulk loaded file is 
 row1,cf,cq,ts1,value2 
 (there are no parallel scans).
 Issue a scan on the table in 0.96. The retrieved cell is 
 row1,cf,cq,ts1,value2.
 But the same scan in 0.98 retrieves row1,cf,cq,ts1,value1.
 This is a behaviour change, caused by the following code:
 {code}
 public int compare(KeyValueScanner left, KeyValueScanner right) {
   int comparison = compare(left.peek(), right.peek());
   if (comparison != 0) {
     return comparison;
   } else {
     // Since both the keys are exactly the same, we break the tie in favor
     // of the key which came latest.
     long leftSequenceID = left.getSequenceID();
     long rightSequenceID = right.getSequenceID();
     if (leftSequenceID > rightSequenceID) {
       return -1;
     } else if (leftSequenceID < rightSequenceID) {
       return 1;
     } else {
       return 0;
     }
   }
 }
 {code}
 In the 0.96 case the mvcc of the cell in both files is 0, so the keys compare 
 as equal and the tie is broken in the else branch. There the seq id of the 
 bulk loaded file is greater, so it sorts first and the scan reads the cell 
 from the bulk loaded file.
 In 0.98+, because we retain mvcc+seqid, the mvcc is no longer reset to 0 (it 
 remains a non-zero positive value). Since otherwise identical keys are 
 ordered by descending mvcc, compare() sorts the cell in the flushed/compacted 
 file ahead of the bulk loaded cell and never reaches the seq id tie-break. 
 This means that even though we know the latest file is the bulk loaded file, 
 we don't read its data.
 This seems to be a behaviour change. We will check other corner cases too, 
 but we are trying to understand the behaviour of bulk load because we are 
 evaluating whether it can be used for the MOB design.
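 The ordering described above can be modelled with a small, self-contained Java program. It is a simplified sketch, not HBase code: FakeScanner, compareCells, SCANNER_ORDER and winner are hypothetical names, keys are flattened into one string, and the mvcc and seq id values (5, 10, 20) are made up for illustration. It assumes, as the description says, that identical keys are ordered by descending mvcc before the scanner-level seq id tie-break. The third case models one possible remedy (treating a bulk loaded cell's mvcc as its file's seq id) purely as an illustration.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class TieBreakDemo {
  /** Hypothetical stand-in for a store file scanner positioned on one cell. */
  static final class FakeScanner {
    final String name;
    final String key;      // row/family/qualifier/timestamp flattened into one string
    final long mvcc;       // mvcc carried by the current cell
    final long sequenceId; // sequence id of the file the scanner reads from

    FakeScanner(String name, String key, long mvcc, long sequenceId) {
      this.name = name;
      this.key = key;
      this.mvcc = mvcc;
      this.sequenceId = sequenceId;
    }
  }

  // Cell-level order: key first, then mvcc descending (later edits sort first).
  static int compareCells(FakeScanner l, FakeScanner r) {
    int c = l.key.compareTo(r.key);
    if (c != 0) return c;
    return Long.compare(r.mvcc, l.mvcc);
  }

  // Scanner-level order: only on a full cell tie, prefer the higher sequence id.
  static final Comparator<FakeScanner> SCANNER_ORDER = (l, r) -> {
    int c = compareCells(l, r);
    if (c != 0) return c;
    return Long.compare(r.sequenceId, l.sequenceId);
  };

  // Returns the name of the scanner a heap of the two scanners would read from first.
  static String winner(FakeScanner flushed, FakeScanner bulk) {
    PriorityQueue<FakeScanner> heap = new PriorityQueue<>(SCANNER_ORDER);
    heap.add(flushed);
    heap.add(bulk);
    return heap.peek().name;
  }

  public static void main(String[] args) {
    String key = "row1/cf:cq/ts1";
    // 0.96-like: mvcc is 0 in both files, so the seq id tie-break picks the bulk loaded file.
    System.out.println("0.96: " + winner(
        new FakeScanner("flushed(value1)", key, 0, 10),
        new FakeScanner("bulk(value2)", key, 0, 20)));
    // 0.98-like: the flushed cell keeps a non-zero mvcc and wins before seq id is consulted.
    System.out.println("0.98: " + winner(
        new FakeScanner("flushed(value1)", key, 5, 10),
        new FakeScanner("bulk(value2)", key, 0, 20)));
    // Hypothetical remedy: bulk loaded cells take the file's seq id as their mvcc.
    System.out.println("remedy: " + winner(
        new FakeScanner("flushed(value1)", key, 5, 10),
        new FakeScanner("bulk(value2)", key, 20, 20)));
  }
}
```

 Running main prints "0.96: bulk(value2)", "0.98: flushed(value1)" and "remedy: bulk(value2)", which mirrors the behaviour change reported here.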



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-09-01 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117341#comment-14117341
 ] 

Anoop Sam John commented on HBASE-11591:


+1 for addendum for branch-1.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0, 2.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0, 2.0.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 HBASE-11591_6.patch, HBASE-11591_6.patch, 
 HBASE-11591_branch-1-addendum.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-29 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116217#comment-14116217
 ] 

Anoop Sam John commented on HBASE-11591:


[~ram_krish] The affected version and fix version are given as 0.99, but the 
patch was pushed only to master?

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 HBASE-11591_6.patch, HBASE-11591_6.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110344#comment-14110344
 ] 

Hadoop QA commented on HBASE-11591:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12664316/HBASE-11591_6.patch
  against trunk revision .
  ATTACHMENT ID: 12664316

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 8 
warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10578//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10578//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10578//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10578//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10578//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10578//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10578//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10578//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10578//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10578//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10578//console

This message is automatically generated.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 HBASE-11591_6.patch, TestBulkload.java, hbase-11591-03-02.patch, 
 hbase-11591-03-jeff.patch


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-26 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110403#comment-14110403
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


The javadoc warning is not from this patch.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 HBASE-11591_6.patch, TestBulkload.java, hbase-11591-03-02.patch, 
 hbase-11591-03-jeff.patch


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110573#comment-14110573
 ] 

Hadoop QA commented on HBASE-11591:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12664353/HBASE-11591_6.patch
  against trunk revision .
  ATTACHMENT ID: 12664353

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 8 
warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 2 zombie test(s):   
at org.apache.hadoop.hbase.client.TestHCM.testClusterStatus(TestHCM.java:250)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10585//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10585//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10585//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10585//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10585//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10585//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10585//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10585//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10585//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10585//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10585//console

This message is automatically generated.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 HBASE-11591_6.patch, HBASE-11591_6.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110744#comment-14110744
 ] 

Hudson commented on HBASE-11591:


FAILURE: Integrated in HBase-TRUNK #5431 (See 
[https://builds.apache.org/job/HBase-TRUNK/5431/])
HBASE-11591 Scanner fails to retrieve KV from bulk loaded file with 
(ramkrishna: rev dea6480023e78a3facdaf1cfc00ad6cc35ecb3ea)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestScannerWithBulkload.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 HBASE-11591_6.patch, HBASE-11591_6.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-25 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109021#comment-14109021
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


[~jeffreyz]
Is the latest patch good for commit?
[~anoop.hbase], [~saint@gmail.com]
What do you guys think?

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 TestBulkload.java, hbase-11591-03-02.patch, hbase-11591-03-jeff.patch


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-25 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110032#comment-14110032
 ] 

Jeffrey Zhong commented on HBASE-11591:
---

The patch looks good to me (+1), with one minor comment:
{noformat}
+  w = mvcc.beginMemstoreInsert();
+  long flushSeqId = getNextSequenceId(wal);
+  FlushResult flushResult = new FlushResult(
+      FlushResult.Result.CANNOT_FLUSH_MEMSTORE_EMPTY, flushSeqId, "Nothing to flush");
+  w.setWriteNumber(flushSeqId);
+  mvcc.waitForPreviousTransactionsComplete(w);
+  return flushResult;
{noformat}
You can set w=null after mvcc.waitForPreviousTransactionsComplete(w); so 
mvcc.advanceMemstore in finally block can be skipped.
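The suggestion relies on the usual begin/complete pattern around an MVCC write entry. The sketch below models it with a toy class; `ToyMvcc`, `EmptyFlushSketch` and their counters are hypothetical, and the sketch assumes (as the review comment implies) that `waitForPreviousTransactionsComplete(w)` already completes `w`, so nulling `w` keeps the `finally` block from advancing it a second time.

```java
// Toy stand-in for the region's MVCC; the method names mirror the snippet above,
// but this is not HBase's MultiVersionConsistencyControl.
final class ToyMvcc {
  static final class WriteEntry {
    long writeNumber;
  }

  int completed = 0;     // entries completed via waitForPreviousTransactionsComplete
  int advanceCalls = 0;  // extra advanceMemstore calls made from the finally block

  WriteEntry beginMemstoreInsert() {
    return new WriteEntry();
  }

  void waitForPreviousTransactionsComplete(WriteEntry w) {
    completed++;  // toy: completes w and "waits" for earlier entries
  }

  void advanceMemstore(WriteEntry w) {
    advanceCalls++;
  }
}

final class EmptyFlushSketch {
  // Models the empty-memstore branch: reserve an entry, stamp it with the flush
  // seqId, wait for earlier writes, and return the seqId.
  static long flushEmptyMemstore(ToyMvcc mvcc, long flushSeqId) {
    ToyMvcc.WriteEntry w = null;
    try {
      w = mvcc.beginMemstoreInsert();
      w.writeNumber = flushSeqId;
      mvcc.waitForPreviousTransactionsComplete(w);
      w = null;  // already completed, so the finally block has nothing to do
      return flushSeqId;
    } finally {
      if (w != null) {
        // Only reached if something failed before w was completed.
        mvcc.advanceMemstore(w);
      }
    }
  }
}
```

On the normal path the entry is completed exactly once and `advanceMemstore` in `finally` is skipped; it still runs if an exception escapes before `w` is nulled.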



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-25 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110275#comment-14110275
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


bq.You can set w=null after mvcc.waitForPreviousTransactionsComplete(w); so 
mvcc.advanceMemstore in finally block can be skipped.
Thanks for the review.  Will update and commit the patch.



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-21 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105185#comment-14105185
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


[~jeffreyz]
First of all, thanks a lot for taking a look at this issue and providing a patch.
I debugged this issue in 2 cases, with my patch and with Jeffrey's patch, and observed the following:
- The testcase that was added as part of this issue covers identical KVs in the store file and the bulk loaded store file, and it was specific to that issue.
- After Jeffrey's first patch a few testcases failed, and those did not involve identical KVs; all of them had different row keys. Things were working fine (even before the patch) because ordering was decided purely by row key comparison and mvcc never came into it. I think the failures exposed a bug that was already there.

- Another important observation is that when we scan the KVs in the bulk loaded file (at least those created in the LoadIncrementalHFiles cases), no mvcc info is added to the file metadata either. So
{code}
return new StoreFileScanner(this,
 getScanner(cacheBlocks, pread, isCompaction),
 !isCompaction, reader.hasMVCCInfo(), readPt);
{code}
will pass hasMVCCInfo as false, and hence skipKVsNewerThanReadpoint() would never be called because
{code}
if (hasMVCCInfo)
  skipKVsNewerThanReadpoint();
{code}
So even before the patch, in the failing test case TestWALReplay.testCompactedBulkLoadedFiles(), although the seqId of the bulk loaded files was 5 and the read point of all the scanners created in the test was 4, we were still reading the bulk loaded file. The KVs in the bulk loaded file were not skipped only because hasMVCCInfo was false, so the tests were passing.
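The gating above boils down to a single predicate. This is a simplified model, not StoreFileScanner itself: `ReadPointGate.isVisible` is a hypothetical helper, and it assumes a cell is visible to a scanner when its mvcc is at most the scanner's read point, with the check skipped entirely when the file has no mvcc info. Treating the bulk loaded file's seqId (5) as its cells' mvcc and using the test's read point (4) shows why the bulk loaded cells were still returned.

```java
final class ReadPointGate {
  // Returns true if a cell with the given mvcc should be returned to a scanner
  // whose read point is readPt. Files without mvcc info are never filtered.
  static boolean isVisible(long cellMvcc, long readPt, boolean hasMVCCInfo) {
    if (!hasMVCCInfo) {
      return true;  // models skipKVsNewerThanReadpoint() never being consulted
    }
    return cellMvcc <= readPt;
  }

  public static void main(String[] args) {
    // Bulk loaded file: seqId 5 used as the cells' mvcc, read point 4, no mvcc info.
    System.out.println(isVisible(5, 4, false));  // true: the cells leak through
    // The same cell if the file did carry mvcc info.
    System.out.println(isVisible(5, 4, true));   // false: the cell is skipped
  }
}
```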
Ok, so what happens after Jeffrey's patch (the first patch, without the HRegion change) is that on seeing any bulk loaded file we just assign the file's seqId to the KV's seqId. After compaction the read point is still not moved up to the latest (i.e. 5), and hence all the KVs that were written to the compacted file from the bulk loaded files went missing.
I think the change in HRegion.java to set the write sequence number is a bug fix? I still feel the patch would cause an issue in the following scenario after the above changes:

- Assume a scan started and the read point was 20 at that time.
- A bulk load is just completing and the scanner heap gets reset. The new bulk loaded file with seqId 22 (for example) now gets added to the scanner heap. But remember that the read point is still 20.
- After this change we would just set the bulk loaded file's seqId, which is 22, on all its KVs.
- Because there is no mvcc info in this bulk loaded file, the scan would not call skipKVsNewerThanReadpoint(), and hence it would still see the KVs with seqId 22 even though the intention is to see only KVs with seqId <= 20.
I may be wrong. Am I missing something here? Or, since there is no mvcc in bulk loaded files, are we allowed to read anything in them irrespective of the read point?


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-21 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105286#comment-14105286
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


Another thing: while doing the flush as part of a bulk load, if there is nothing to be flushed, should we still update the mvcc?
{code}
if (this.memstoreSize.get() <= 0) {
  // Presume that if there are still no edits in the memstore, then there are no edits for
  // this region out in the WAL/HLog subsystem so no need to do any trickery clearing out
  // edits in the WAL system. Up the sequence number so the resulting flush id is for
  // sure just beyond the last appended region edit (useful as a marker when bulk loading,
  // etc.)
  // wal can be null replaying edits.
  return wal != null ?
      new FlushResult(FlushResult.Result.CANNOT_FLUSH_MEMSTORE_EMPTY,
          getNextSequenceId(wal), "Nothing to flush") :
      new FlushResult(FlushResult.Result.CANNOT_FLUSH_MEMSTORE_EMPTY, "Nothing to flush");
}
{code}




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105428#comment-14105428
 ] 

Ted Yu commented on HBASE-11591:


Minor:
{code}
-return !hasMVCCInfo ? true : skipKVsNewerThanReadpoint();
+if (!hasMVCCInfo && this.reader.isBulkLoaded()) {
+  return skipKVsNewerThanReadpoint();
+} else {
+  return !hasMVCCInfo ? true : skipKVsNewerThanReadpoint();
{code}
The if condition above would be more readable if written this way:
{code}
+if (hasMVCCInfo || this.reader.isBulkLoaded()) {
{code}
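The rewrite can be checked mechanically. In the sketch below, the class name and the `BooleanSupplier` standing in for `skipKVsNewerThanReadpoint()` are hypothetical; it puts the patched shape (with `&&` in the first condition) next to the suggested one and compares them over every combination of the two flags and the skip result.

```java
import java.util.function.BooleanSupplier;

final class SeekGateForms {
  // Shape of the patched code; skip stands in for skipKVsNewerThanReadpoint().
  static boolean patched(boolean hasMVCCInfo, boolean isBulkLoaded, BooleanSupplier skip) {
    if (!hasMVCCInfo && isBulkLoaded) {
      return skip.getAsBoolean();
    } else {
      return !hasMVCCInfo ? true : skip.getAsBoolean();
    }
  }

  // Suggested, flattened form.
  static boolean suggested(boolean hasMVCCInfo, boolean isBulkLoaded, BooleanSupplier skip) {
    if (hasMVCCInfo || isBulkLoaded) {
      return skip.getAsBoolean();
    }
    return true;
  }

  // Returns true if both forms agree on all 8 input combinations.
  static boolean equivalent() {
    boolean[] values = {false, true};
    for (boolean hasMVCCInfo : values) {
      for (boolean isBulkLoaded : values) {
        for (boolean skipResult : values) {
          BooleanSupplier skip = () -> skipResult;
          if (patched(hasMVCCInfo, isBulkLoaded, skip) != suggested(hasMVCCInfo, isBulkLoaded, skip)) {
            return false;
          }
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(equivalent());  // true
  }
}
```

Both forms call the skip method exactly when the file has mvcc info or is bulk loaded, and otherwise return true, so the single `||` condition also lets the nested ternary go away.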



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-21 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105659#comment-14105659
 ] 

Jeffrey Zhong commented on HBASE-11591:
---

[~ram_krish] {quote}
I think the change in HRegion.java to set the write sequence number is a bug fix? I still feel the patch would cause an issue in the following scenario after the above changes:
- Assume a scan started and the read point was 20 at that time.
- A bulk load is just completing and the scanner heap gets reset. The new bulk loaded file with seqId 22 (for example) now gets added to the scanner heap. But remember that the read point is still 20.
- After this change we would just set the bulk loaded file's seqId, which is 22, on all its KVs.
- Because there is no mvcc info in this bulk loaded file, the scan would not call skipKVsNewerThanReadpoint(), and hence it would still see the KVs with seqId 22 even though the intention is to see only KVs with seqId <= 20.
I may be wrong. Am I missing something here? Or, since there is no mvcc in bulk loaded files, are we allowed to read anything in them irrespective of the read point?
{quote}
The situation above is valid. With the existing behavior (as in 0.98), we allow a scan with a lower read point to read a bulk loaded file immediately, since an hfile is loaded atomically. I think it's fine to either keep the existing behavior or add handling for such cases. Another option for handling such a case is to set hasMVCCInfo to true for a bulk loaded file, because we will set its KVs' mvcc using the HFile seqId.
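The second option can be sketched against the scenario quoted above (read point 20, bulk loaded file seqId 22). The model is hypothetical: `BulkLoadVisibility` and its parameters are illustrative, and it assumes the scanner stamps each bulk loaded cell that has stored mvcc 0 with the file's seqId and then applies the normal `mvcc <= readPt` check once the file is treated as having mvcc info.

```java
final class BulkLoadVisibility {
  // Effective mvcc of a cell read from a store file: cells from a bulk loaded file
  // (stored mvcc 0) take the file's seqId; other cells keep their stored mvcc.
  static long effectiveMvcc(long storedMvcc, boolean bulkLoaded, long fileSeqId) {
    return (bulkLoaded && storedMvcc == 0) ? fileSeqId : storedMvcc;
  }

  // hasMVCCInfo == false reproduces the existing behaviour (no read point filtering);
  // hasMVCCInfo == true models treating the bulk loaded file as having mvcc info.
  static boolean visible(long storedMvcc, boolean bulkLoaded, long fileSeqId,
                         boolean hasMVCCInfo, long readPt) {
    if (!hasMVCCInfo) {
      return true;
    }
    return effectiveMvcc(storedMvcc, bulkLoaded, fileSeqId) <= readPt;
  }

  public static void main(String[] args) {
    long readPt = 20;     // scan opened before the bulk load finished
    long fileSeqId = 22;  // seqId assigned to the bulk loaded file
    System.out.println(visible(0, true, fileSeqId, false, readPt));  // true: existing behaviour
    System.out.println(visible(0, true, fileSeqId, true, readPt));   // false: hidden from the old scan
    System.out.println(visible(0, true, fileSeqId, true, 22));       // true once the read point reaches 22
  }
}
```

The trade-off is the one described in the comment: the existing behaviour exposes the atomically loaded file to in-flight scans, while the second option keeps it out of scans whose read point predates the load.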


[~jerryhe] {quote}It can be used to back up HBase data and restore it.{quote}
For such a case, trunk code can handle it, but you need to keep deleted cells and mvcc forever (using the config hbase.hstore.compaction.keep.mvcc.period). When you load an old hfile, its KVs will be sorted correctly based on their mvcc values (LogSeqId).
  



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-21 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106424#comment-14106424
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


[~jeffreyz]
bq. I think it's fine either keeping existing behavior or add handling for such cases
Then I think the above change is not necessary, and we should only handle the case from the initial patch, i.e. identical KVs. The later patches attached somehow try to make mvcc accept the bulk loaded files and make them visible just because we set the bulk loaded file's seq id on the KVs from that file.
I think if we have to maintain the existing behaviour, then handling only the case of identical KVs should be fine. If not, the other changes are necessary. This is my take on this; please feel free to correct me.
But thinking of use cases like the one [~jerryhe] mentioned, this change is right, and for that we need to handle all cases.
What do the others think?



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-20 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103539#comment-14103539
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


So instead of setting it in the comparator, you would set it when the KV is retrieved.
Should we really do this here:
{code}
setCurrentCell(KeyValueUtil.createLastOnRowCol(kv));
{code}
and 
{code}
setCurrentCell(KeyValueUtil.createFirstOnRowColTS(kv, maxTimestampInFile));
{code}
These are fake keys that we are creating, right?



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-20 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104134#comment-14104134
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


[~saint@gmail.com]
bq. A marker Interface that allows you to set sequence id on the hosting object seems fine. MutableCell is a little ugly since it tarnishes our nice 'Cell' notion.
Did not see this comment earlier. Please see HBASE-11777.
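The marker-interface idea, combined with setting the seqId when the KV is retrieved rather than in the comparator, might look roughly like this. All names here (`SeqIdCell`, `SequenceIdSettable`, `ModelCell`, `ReadPathStamping.stampOnRead`) are hypothetical illustrations, not the interface HBASE-11777 introduced; the sketch assumes a bulk loaded cell arrives with sequence id 0 and is given the file's seqId on the read path, while cells that already carry one are left alone.

```java
// Hypothetical read-only view of a cell's sequence id.
interface SeqIdCell {
  long getSequenceId();
}

// Hypothetical marker interface: lets the read path set a sequence id on a cell
// without making every cell mutable.
interface SequenceIdSettable {
  void setSequenceId(long seqId);
}

// Minimal cell model that opts in to the marker interface.
final class ModelCell implements SeqIdCell, SequenceIdSettable {
  final String key;
  private long sequenceId;

  ModelCell(String key, long sequenceId) {
    this.key = key;
    this.sequenceId = sequenceId;
  }

  @Override
  public long getSequenceId() {
    return sequenceId;
  }

  @Override
  public void setSequenceId(long seqId) {
    this.sequenceId = seqId;
  }
}

final class ReadPathStamping {
  // Called when a cell is retrieved from a store file: if the file is bulk loaded,
  // the cell has no sequence id yet, and the cell opts in, stamp the file's seqId.
  static void stampOnRead(SeqIdCell cell, boolean fileIsBulkLoaded, long fileSeqId) {
    if (fileIsBulkLoaded && cell.getSequenceId() == 0 && cell instanceof SequenceIdSettable) {
      ((SequenceIdSettable) cell).setSequenceId(fileSeqId);
    }
  }

  public static void main(String[] args) {
    ModelCell bulk = new ModelCell("row1/cf:cq/ts1", 0);
    stampOnRead(bulk, true, 12);
    System.out.println(bulk.getSequenceId());  // 12
  }
}
```

Cells that do not implement the marker interface pass through unchanged, which keeps the plain `Cell` notion read-only.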

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java, 
 hbase-11591-03-jeff.patch




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-20 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104311#comment-14104311
 ] 

Jeffrey Zhong commented on HBASE-11591:
---

[~jerryhe]  {quote}If it is bulkloaded file, could we just set the cells 
regardless of its old seqId in the cell?{quote}
Yes, we could. The condition is there to prevent a Cell from being reset 
repeatedly.

[~ram_krish] {quote}This is a fake key that we are creating right?{quote} 
Yes, you're right that we don't have to use setCurrentCell in these two cases. 
The patch uses setCurrentCell as the one consistent way to set the instance 
variable cur, so that the code is easy to maintain and reason about in the 
future, or in case we do more in the setCurrentCell call. I guess there is not 
much difference either way.
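A minimal sketch of the "single place that sets cur" idea, assuming a guard that stamps only cells whose own sequence id is not positive. `StampingScannerSketch`, `SeqCell`, `stampCount` and the file sequence id 20 are hypothetical illustrations, not the actual StoreFileScanner code from the patch.

```java
/**
 * Hypothetical sketch of a scanner that routes every assignment of its current
 * cell through setCurrentCell(), stamping bulk-loaded cells with the file's
 * sequence id. Not the actual StoreFileScanner code.
 */
public class StampingScannerSketch {

  /** Minimal cell whose sequence id can be overwritten. */
  static final class SeqCell {
    final String value;
    long sequenceId;

    SeqCell(String value, long sequenceId) {
      this.value = value;
      this.sequenceId = sequenceId;
    }
  }

  private final boolean bulkLoaded;
  private final long fileSequenceId;
  private SeqCell cur;
  /** Number of times a cell's sequence id has been overwritten. */
  int stampCount;

  StampingScannerSketch(boolean bulkLoaded, long fileSequenceId) {
    this.bulkLoaded = bulkLoaded;
    this.fileSequenceId = fileSequenceId;
  }

  /** The one place that assigns 'cur', so any extra logic lives here. */
  void setCurrentCell(SeqCell newVal) {
    this.cur = newVal;
    // Guard: stamp only cells without a sequence id of their own.
    // Once stamped, the id is > 0, so later calls leave the cell alone.
    if (cur != null && bulkLoaded && cur.sequenceId <= 0) {
      cur.sequenceId = fileSequenceId;
      stampCount++;
    }
  }

  SeqCell current() {
    return cur;
  }

  public static void main(String[] args) {
    StampingScannerSketch scanner = new StampingScannerSketch(true, 20);
    SeqCell cell = new SeqCell("value2", 0);
    scanner.setCurrentCell(cell);
    scanner.setCurrentCell(cell); // e.g. a reseek landing on the same cell
    // prints "20 stamped 1 time(s)"
    System.out.println(cell.sequenceId + " stamped " + scanner.stampCount + " time(s)");
  }
}
```

Keeping the assignment in one method means a later change (for example dropping the guard) only has to be made in one place.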


 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java, 
 hbase-11591-03-jeff.patch




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-20 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104542#comment-14104542
 ] 

Jerry He commented on HBASE-11591:
--

Hi, [~jeffreyz]

Regarding the cur.getSequenceId() <= 0 condition again, it is possible 
that the cells in the original bulk load hfiles have seqId > 0.
In this case, we also need to reset them to the new file-level seqId, right?
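To make the concern concrete, here is a hypothetical comparison of the two policies for a cell read from a bulk loaded file. The method names and the values (file seqId 20, a leftover cell seqId 7) are illustrative assumptions only; the point is that the guarded reset keeps a stale positive id, which could then lose to a cell with a higher mvcc in a flushed file.

```java
/**
 * Hypothetical comparison of two policies for a cell read from a bulk-loaded
 * file: reset only when the cell's seqId is <= 0, or always reset to the
 * file-level seqId.
 */
public class BulkLoadSeqIdPolicy {

  /** Stamp only cells that carry no sequence id of their own. */
  static long guardedReset(long cellSeqId, long fileSeqId) {
    return cellSeqId <= 0 ? fileSeqId : cellSeqId;
  }

  /** Always use the sequence id assigned to the bulk-loaded file (cellSeqId is ignored). */
  static long unconditionalReset(long cellSeqId, long fileSeqId) {
    return fileSeqId;
  }

  public static void main(String[] args) {
    long fileSeqId = 20; // assigned to the file at bulk load time
    long staleSeqId = 7; // e.g. left in a native HFile copied from another table
    System.out.println(guardedReset(0, fileSeqId));                // 20
    System.out.println(guardedReset(staleSeqId, fileSeqId));       // 7: keeps the stale id
    System.out.println(unconditionalReset(staleSeqId, fileSeqId)); // 20
  }
}
```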

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-20 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104559#comment-14104559
 ] 

Jeffrey Zhong commented on HBASE-11591:
---

{quote}
 it is possible that the cells in the original bulk load hfiles have seqId > 0.
{quote}
For bulk loaded files, this should not happen.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-20 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104575#comment-14104575
 ] 

Jerry He commented on HBASE-11591:
--

Hi, [~jeffreyz]

There are such use cases.  Please see HBASE-11772.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-20 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104703#comment-14104703
 ] 

Jeffrey Zhong commented on HBASE-11591:
---

I see. For the HBASE-11772 situation, it's possible. I'm not sure what the use 
scenario is for loading a native HFile directly, and more work would be needed 
to make that work anyway. We can take the condition cur.getSequenceId() <= 0 
out here, or we can take it out in the HBASE-11772 patch.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-20 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104774#comment-14104774
 ] 

Jerry He commented on HBASE-11591:
--

It can be used to back up HBase data and restore it.
Either way is fine. I will pick up the work in HBASE-11772 on top of whatever 
is done here. Thanks!

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104973#comment-14104973
 ] 

Hadoop QA commented on HBASE-11591:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12663192/hbase-11591-03-02.patch
  against trunk revision .
  ATTACHMENT ID: 12663192

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 7 
warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10513//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10513//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10513//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10513//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10513//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10513//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10513//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10513//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10513//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10513//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10513//console

This message is automatically generated.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102535#comment-14102535
 ] 

stack commented on HBASE-11591:
---

There is the SequenceNumber Interface, but that is only about getting a 
SequenceNumber.

As per you fellows, I don't think we need to add a method to Cell. There are no 
setters in Cell currently. Why start now?

A marker Interface that allows you to set the sequence id on the hosting object 
seems fine. MutableCell is a little ugly since it tarnishes our nice 'Cell' notion.

What about adding a setter on SequenceNumber? One of the implementors is HLogKey. 
It has a:

  void setLogSeqNum(final long sequence) {
    this.logSeqNum = sequence;
    this.seqNumAssignedLatch.countDown();
  }
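The marker-interface idea can be sketched as follows: Cell itself gains no setter, and only objects that opt in expose one. The interface and class names (`HasSequenceId`, `SettableSequenceId`, `LatchedKey`, `trySetSequenceId`) are hypothetical and are not presented as existing HBase types; `LatchedKey` only imitates the latch behaviour of the HLogKey method quoted above. Note that this sketch uses a separate marker interface rather than adding the setter to SequenceNumber itself.

```java
import java.util.concurrent.CountDownLatch;

/**
 * Hypothetical sketch of the marker-interface idea: Cell stays free of setters,
 * and only implementations that opt in expose setSequenceId(). The names are
 * illustrative, not existing HBase types.
 */
public class SettableSeqIdSketch {

  /** Read-only view of a sequence id. */
  interface HasSequenceId {
    long getSequenceId();
  }

  /** Marker for objects whose sequence id may be assigned after creation. */
  interface SettableSequenceId extends HasSequenceId {
    void setSequenceId(long seqId);
  }

  /** A key that, like HLogKey.setLogSeqNum(), releases waiters once its id is set. */
  static final class LatchedKey implements SettableSequenceId {
    private final CountDownLatch assigned = new CountDownLatch(1);
    private volatile long seqId = -1;

    @Override
    public long getSequenceId() {
      return seqId;
    }

    @Override
    public void setSequenceId(long seqId) {
      this.seqId = seqId;
      assigned.countDown(); // signal anyone waiting for the id to be assigned
    }

    boolean isAssigned() {
      return assigned.getCount() == 0;
    }
  }

  /** Sets the id only on objects that opt in; returns whether it was set. */
  static boolean trySetSequenceId(Object target, long seqId) {
    if (target instanceof SettableSequenceId) {
      ((SettableSequenceId) target).setSequenceId(seqId);
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    LatchedKey key = new LatchedKey();
    System.out.println(trySetSequenceId(key, 42));                    // true
    System.out.println(key.getSequenceId() + " " + key.isAssigned()); // 42 true
    System.out.println(trySetSequenceId("immutable", 1));             // false
  }
}
```

The instanceof check keeps the server-side code explicit about which objects it is allowed to mutate, while plain Cell implementations stay read-only.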



 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java


 See discussion in HBASE-11339.
 When we have a case where there are same KVs in two files one produced by 
 flush/compaction and the other thro the bulk load.
 Both the files have some same kvs which matches even in timestamp.
 Steps:
 Add some rows with a specific timestamp and flush the same.  
 Bulk load a file with the same data.. Enusre that assign seqnum property is 
 set.
 The bulk load should use HFileOutputFormat2 (or ensure that we write the 
 bulk_time_output key).
 This would ensure that the bulk loaded file has the highest seq num.
 Assume the cell in the flushed/compacted store file is 
 row1,cf,cq,ts1, value1  and the cell in the bulk loaded file is
 row1,cf,cq,ts1,value2 
 (There are no parallel scans).
 Issue a scan on the table in 0.96. The retrieved value is 
 row1,cf1,cq,ts1,value2
 But the same in 0.98 will retrieve row1,cf1,cq,ts2,value1. 
 This is a behaviour change.  This is because of this code 
 {code}
 public int compare(KeyValueScanner left, KeyValueScanner right) {
   int comparison = compare(left.peek(), right.peek());
   if (comparison != 0) {
 return comparison;
   } else {
 // Since both the keys are exactly the same, we break the tie in favor
 // of the key which came latest.
 long leftSequenceID = left.getSequenceID();
 long rightSequenceID = right.getSequenceID();
 if (leftSequenceID  rightSequenceID) {
   return -1;
 } else if (leftSequenceID  rightSequenceID) {
   return 1;
 } else {
   return 0;
 }
   }
 }
 {code}
 In the 0.96 case, the mvcc of the cell in both files is 0, so the cell
 comparison ties and falls through to the else branch. There the seq id of the
 bulk loaded file is greater, so its scanner sorts first and the scan returns
 the cell from the bulk loaded file.
 In 0.98+, because we retain the mvcc/seqid, the mvcc of the cell in the
 flushed/compacted file is not reset to 0 (it remains a positive value). Hence
 compare() sorts the cell in the flushed/compacted file first. This means that
 although we know the latest file is the bulk loaded file, we do not return its
 data.
 This seems to be a behaviour change. We will check other corner cases as well;
 we are trying to understand the behaviour of bulk load because we are
 evaluating whether it can be used for the MOB design.
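 The ordering described above can be modelled as follows. This is a simplified, self-contained Java sketch and not HBase code. `TieBreakDemo`, `SimpleCell` and `SimpleScanner` are hypothetical names. The sketch assumes that, as in the KeyValue comparator, the higher mvcc sorts first when two keys are otherwise identical. It shows why the bulk loaded value wins when both mvccs are 0, and why it loses once the flushed cell keeps a positive mvcc.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Simplified model of the scanner-heap ordering; not HBase code. */
final class TieBreakDemo {

  /** A cell reduced to its full key (row/cf:cq/ts), its mvcc and its value. */
  record SimpleCell(String key, long mvcc, String value) {}

  /** A store file scanner: the cell it currently peeks at plus the file's seqId. */
  record SimpleScanner(String name, SimpleCell peek, long fileSeqId) {}

  /** Cell order: key first; for equal keys the higher mvcc sorts first (assumed). */
  static int compareCells(SimpleCell l, SimpleCell r) {
    int c = l.key().compareTo(r.key());
    if (c != 0) {
      return c;
    }
    return Long.compare(r.mvcc(), l.mvcc());
  }

  /** Same shape as the quoted compare(): only exact ties fall back to the file seqId. */
  static final Comparator<SimpleScanner> SCANNER_ORDER = (left, right) -> {
    int comparison = compareCells(left.peek(), right.peek());
    if (comparison != 0) {
      return comparison;
    }
    // Higher file seqId (the later file) sorts first.
    return Long.compare(right.fileSeqId(), left.fileSeqId());
  };

  /** Value of the cell that the heap would return first. */
  static String firstValue(List<SimpleScanner> scanners) {
    List<SimpleScanner> sorted = new ArrayList<>(scanners);
    sorted.sort(SCANNER_ORDER);
    return sorted.get(0).peek().value();
  }

  public static void main(String[] args) {
    String key = "row1/cf:cq/ts1";
    // 0.96-like: the persisted cells in both files have mvcc 0.
    List<SimpleScanner> like096 = List.of(
        new SimpleScanner("flushed", new SimpleCell(key, 0, "value1"), 5),
        new SimpleScanner("bulkloaded", new SimpleCell(key, 0, "value2"), 10));
    // 0.98-like: the flushed cell keeps a positive mvcc, the bulk loaded one has 0.
    List<SimpleScanner> like098 = List.of(
        new SimpleScanner("flushed", new SimpleCell(key, 3, "value1"), 5),
        new SimpleScanner("bulkloaded", new SimpleCell(key, 0, "value2"), 10));
    System.out.println(firstValue(like096)); // value2
    System.out.println(firstValue(like098)); // value1
  }
}
```

 In this model, giving the bulk loaded cell its file's seqId (10 here) as its mvcc would make it sort first again. That is roughly the direction the patches discussed later on this issue take.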



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103329#comment-14103329
 ] 

Hadoop QA commented on HBASE-11591:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12662879/hbase-11591-03-jeff.patch
  against trunk revision .
  ATTACHMENT ID: 12662879

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 7 
warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.wal.TestSecureWALReplay
   org.apache.hadoop.hbase.regionserver.wal.TestWALReplay
   org.apache.hadoop.hbase.regionserver.wal.TestWALReplayCompressed

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):
   at org.apache.hadoop.hbase.mapreduce.TestRowCounter.testRowCounterHiddenColumn(TestRowCounter.java:137)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10491//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10491//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10491//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10491//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10491//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10491//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10491//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10491//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10491//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10491//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10491//console

This message is automatically generated.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java, 
 hbase-11591-03-jeff.patch



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-19 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103405#comment-14103405
 ] 

Jerry He commented on HBASE-11591:
--

Looking at this JIRA and the patches, I think it would benefit HBASE-11772 as 
well. In particular, the idea of setting the mvcc (seqId) in each cell to the seqId of 
the bulkloaded file would help.
Looks good!

[~jeffreyz]:
{code}
+  protected void setCurrentCell(Cell newVal) {
+    this.cur = newVal;
+    if (this.cur != null && this.reader.isBulkLoaded() && cur.getSequenceId() <= 0) {
+      KeyValue curKV = KeyValueUtil.ensureKeyValue(cur);
+      curKV.setSequenceId(this.reader.getSequenceID());
+      cur = curKV;
+    }
+  }
{code}
You have cur.getSequenceId() <= 0 in the if condition. If it is a bulkloaded 
file, could we just set the file's seqId on every cell, regardless of the old 
seqId in the cell?
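Here is a minimal Java model of the two options in the question above. `SeqIdStamping` and `MutableCell` are hypothetical stand-ins for the scanner and for KeyValue; they are not HBase APIs. The sketch suggests that the two variants only behave differently for a cell in a bulk loaded file that already carries a positive seqId.

```java
/** Hypothetical model of read-time seqId stamping; not HBase code. */
final class SeqIdStamping {

  /** A cell whose sequence id can be overwritten on the read path. */
  static final class MutableCell {
    final String value;
    long seqId;

    MutableCell(String value, long seqId) {
      this.value = value;
      this.seqId = seqId;
    }
  }

  /**
   * Mirrors the shape of setCurrentCell(): for a bulk loaded file, give the cell
   * the file's seqId. With onlyIfUnset == true, only cells whose seqId is <= 0 are
   * stamped (the condition in the patch). With false, every cell is stamped (the
   * alternative asked about above).
   */
  static MutableCell setCurrentCell(MutableCell cur, boolean bulkLoaded,
      long fileSeqId, boolean onlyIfUnset) {
    if (cur != null && bulkLoaded && (!onlyIfUnset || cur.seqId <= 0)) {
      cur.seqId = fileSeqId;
    }
    return cur;
  }

  public static void main(String[] args) {
    // A bulk loaded cell that already carries a positive seqId (3).
    MutableCell a = setCurrentCell(new MutableCell("v", 3), true, 10, true);
    MutableCell b = setCurrentCell(new MutableCell("v", 3), true, 10, false);
    System.out.println(a.seqId + " " + b.seqId); // 3 10
  }
}
```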


 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java, 
 hbase-11591-03-jeff.patch


 See discussion in HBASE-11339.
 When we have a case where there are same KVs in two files one produced by 
 flush/compaction and the other thro the bulk load.
 Both the files have some same kvs which matches even in timestamp.
 Steps:
 Add some rows with a specific timestamp and flush the same.  
 Bulk load a file with the same data.. Enusre that assign seqnum property is 
 set.
 The bulk load should use HFileOutputFormat2 (or ensure that we write the 
 bulk_time_output key).
 This would ensure that the bulk loaded file has the highest seq num.
 Assume the cell in the flushed/compacted store file is 
 row1,cf,cq,ts1, value1  and the cell in the bulk loaded file is
 row1,cf,cq,ts1,value2 
 (There are no parallel scans).
 Issue a scan on the table in 0.96. The retrieved value is 
 row1,cf1,cq,ts1,value2
 But the same in 0.98 will retrieve row1,cf1,cq,ts2,value1. 
 This is a behaviour change.  This is because of this code 
 {code}
 public int compare(KeyValueScanner left, KeyValueScanner right) {
   int comparison = compare(left.peek(), right.peek());
   if (comparison != 0) {
 return comparison;
   } else {
 // Since both the keys are exactly the same, we break the tie in favor
 // of the key which came latest.
 long leftSequenceID = left.getSequenceID();
 long rightSequenceID = right.getSequenceID();
 if (leftSequenceID  rightSequenceID) {
   return -1;
 } else if (leftSequenceID  rightSequenceID) {
   return 1;
 } else {
   return 0;
 }
   }
 }
 {code}
 Here  in 0.96 case the mvcc of the cell in both the files will have 0 and so 
 the comparison will happen from the else condition .  Where the seq id of the 
 bulk loaded file is greater and would sort out first ensuring that the scan 
 happens from that bulk loaded file.
 In case of 0.98+ as we are retaining the mvcc+seqid we are not making the 
 mvcc as 0 (remains a non zero positive value).  Hence the compare() sorts out 
 the cell in the flushed/compacted file.  Which means though we know the 
 lateset file is the bulk loaded file we don't scan the data.
 Seems to be a behaviour change.  Will check on other corner cases also but we 
 are trying to know the behaviour of bulk load because we are evaluating if it 
 can be used for MOB design.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100310#comment-14100310
 ] 

Anoop Sam John commented on HBASE-11591:


Sure. Some quick comments after a glance at the patch:

isBulkLoadResult -> isBulkLoaded()? For the setter also?
I see this isBulkLoadResult() at the StoreFile.java level also. It would have been 
better to get this status from StoreFile rather than from StoreFileReader.

Also, what about compacting a flush file and a bulk loaded one? Will we have 
issues then? Does this patch handle that too? Mind adding tests around that 
as well?

compareWithoutMvcc(Cell left, Cell right)
We have now deprecated the *mvcc() methods. Suggest changing the name here too.

bq.// TODO : While doing cells this is should be avoided in the read path.
IMHO we should not do this KeyValueUtil.ensureKeyValue() stuff from now on (mainly 
in the read path). In the near future we will want Cells in the read path. How can 
we solve this particular issue then? (We cannot add a setter in Cell.java, I 
believe.) Or do we need an extension interface for Cell *on the server side* that 
has the setter?

Doing a deeper look, Ram. Sorry for being late.
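One possible shape for the server-side extension interface suggested above is sketched below. It is a hypothetical Java sketch: `CellSetterSketch`, `ReadOnlyCell`, `SettableSeqIdCell` and `ServerCell` are illustrative names, not HBase types. The read-only interface keeps no setter. Only server-side implementations opt in to the extension, and the read path updates the seqId in place when the cell supports it instead of converting the cell to a concrete type first.

```java
/** Hypothetical sketch of a server-side Cell extension with a seqId setter. */
final class CellSetterSketch {

  /** Read-only view, standing in for the client-facing Cell interface. */
  interface ReadOnlyCell {
    String value();
    long getSequenceId();
  }

  /** Server-side extension that adds the setter without touching ReadOnlyCell. */
  interface SettableSeqIdCell extends ReadOnlyCell {
    void setSequenceId(long seqId);
  }

  /** A server-side cell implementation that supports the setter. */
  static final class ServerCell implements SettableSeqIdCell {
    private final String value;
    private long seqId;

    ServerCell(String value, long seqId) {
      this.value = value;
      this.seqId = seqId;
    }

    @Override public String value() { return value; }
    @Override public long getSequenceId() { return seqId; }
    @Override public void setSequenceId(long seqId) { this.seqId = seqId; }
  }

  /** Sets the seqId in place if the cell supports it; returns false otherwise. */
  static boolean trySetSequenceId(ReadOnlyCell cell, long seqId) {
    if (cell instanceof SettableSeqIdCell settable) {
      settable.setSequenceId(seqId);
      return true;
    }
    // The caller decides how to handle cells that cannot be updated in place.
    return false;
  }
}
```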

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, TestBulkload.java


 See discussion in HBASE-11339.
 When we have a case where there are same KVs in two files one produced by 
 flush/compaction and the other thro the bulk load.
 Both the files have some same kvs which matches even in timestamp.
 Steps:
 Add some rows with a specific timestamp and flush the same.  
 Bulk load a file with the same data.. Enusre that assign seqnum property is 
 set.
 The bulk load should use HFileOutputFormat2 (or ensure that we write the 
 bulk_time_output key).
 This would ensure that the bulk loaded file has the highest seq num.
 Assume the cell in the flushed/compacted store file is 
 row1,cf,cq,ts1, value1  and the cell in the bulk loaded file is
 row1,cf,cq,ts1,value2 
 (There are no parallel scans).
 Issue a scan on the table in 0.96. The retrieved value is 
 row1,cf1,cq,ts1,value2
 But the same in 0.98 will retrieve row1,cf1,cq,ts2,value1. 
 This is a behaviour change.  This is because of this code 
 {code}
 public int compare(KeyValueScanner left, KeyValueScanner right) {
   int comparison = compare(left.peek(), right.peek());
   if (comparison != 0) {
 return comparison;
   } else {
 // Since both the keys are exactly the same, we break the tie in favor
 // of the key which came latest.
 long leftSequenceID = left.getSequenceID();
 long rightSequenceID = right.getSequenceID();
 if (leftSequenceID  rightSequenceID) {
   return -1;
 } else if (leftSequenceID  rightSequenceID) {
   return 1;
 } else {
   return 0;
 }
   }
 }
 {code}
 Here  in 0.96 case the mvcc of the cell in both the files will have 0 and so 
 the comparison will happen from the else condition .  Where the seq id of the 
 bulk loaded file is greater and would sort out first ensuring that the scan 
 happens from that bulk loaded file.
 In case of 0.98+ as we are retaining the mvcc+seqid we are not making the 
 mvcc as 0 (remains a non zero positive value).  Hence the compare() sorts out 
 the cell in the flushed/compacted file.  Which means though we know the 
 lateset file is the bulk loaded file we don't scan the data.
 Seems to be a behaviour change.  Will check on other corner cases also but we 
 are trying to know the behaviour of bulk load because we are evaluating if it 
 can be used for MOB design.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100361#comment-14100361
 ] 

Anoop Sam John commented on HBASE-11591:


{code}
+  if(bulkLoad) {
+    // TODO : While doing cells this is should be avoided in the read path.
+    KeyValue leftKV = KeyValueUtil.ensureKeyValue(left.peek());
+    KeyValue rightKV = KeyValueUtil.ensureKeyValue(right.peek());
+    if(leftKV.getSequenceId() == 0) {
+      leftKV.setSequenceId(rightKV.getSequenceId());
+    } else {
+      rightKV.setSequenceId(leftKV.getSequenceId());
+    }
+  }
{code}

So what are we doing here, Ram?
I think we need to set the seqId of KVs from a bulk loaded file to the file's 
seqId (which we get from the file name). Instead of copying the seqId of one 
KV to the other (which looks hacky IMO), can we set it from the seqId of the file?
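The suggestion above can be sketched as follows. It assumes that a bulk loaded file which was assigned a sequence id carries it in its name as `<name>_SeqId_<n>_`. That is my understanding of the convention, but it should be checked against StoreFile before relying on it. `BulkLoadSeqId`, `seqIdFromFileName` and `effectiveSeqId` are hypothetical names.

```java
import java.util.OptionalLong;

/** Hypothetical helper: derive a cell's seqId from its bulk loaded file's name. */
final class BulkLoadSeqId {
  // Assumed naming convention for bulk loaded files with an assigned seqId.
  private static final String MARKER = "_SeqId_";

  /** Extracts n from a name like "abc_SeqId_42_"; empty if absent or malformed. */
  static OptionalLong seqIdFromFileName(String fileName) {
    int start = fileName.indexOf(MARKER);
    if (start < 0) {
      return OptionalLong.empty();
    }
    start += MARKER.length();
    int end = fileName.indexOf('_', start);
    if (end < 0) {
      return OptionalLong.empty();
    }
    try {
      return OptionalLong.of(Long.parseLong(fileName.substring(start, end)));
    } catch (NumberFormatException e) {
      return OptionalLong.empty();
    }
  }

  /** For bulk loaded files, use the file's seqId; otherwise keep the cell's own. */
  static long effectiveSeqId(long cellSeqId, String fileName, boolean bulkLoaded) {
    if (!bulkLoaded) {
      return cellSeqId;
    }
    return seqIdFromFileName(fileName).orElse(cellSeqId);
  }

  public static void main(String[] args) {
    System.out.println(effectiveSeqId(0, "abc_SeqId_42_", true));  // 42
    System.out.println(effectiveSeqId(5, "abc_SeqId_42_", false)); // 5
  }
}
```

Because every cell from the same file would get the same seqId, this avoids making one KV's seqId depend on whichever other KV it happens to be compared with.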

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, TestBulkload.java




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100372#comment-14100372
 ] 

Hadoop QA commented on HBASE-11591:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12662425/HBASE-11591_2.patch
  against trunk revision .
  ATTACHMENT ID: 12662425

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10473//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10473//console

This message is automatically generated.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, TestBulkload.java



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100384#comment-14100384
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


I got a clean QA run.
bq.isBulkLoadResult -> isBulkLoaded()? For setter also?
Okie. Fine with that.
bq.I see this isBulkLoadResult () in StoreFile.java level also. It would have 
been better to know this status from StoreFile rather than from StoreFileReader.
I spent some time trying that, but later decided on this approach. First, only the 
reader is passed to the StoreFileScanner, and the StoreFileScanner has only a 
reader associated with it. If we need this information from the StoreFile, I would 
need to change the constructor of StoreFileScanner or use a setter, and I thought 
that would make the patch heavier. Also, the bulk load status has to come from the 
reader anyway (because the reader reads the file info) and would then have to be 
set on the StoreFile. The reader is also currently an inner class of StoreFile. 
Considering all this, I kept the new getter/setter at the Reader level.
bq.compareWithoutMvcc
Okie.
bq.IMHO we should not do this KeyValueUtil.ensureKeyValue() stuff from now
Yes. But I think we should handle that in a separate JIRA, i.e. avoid setting the 
seqId by going through KeyValueUtil.ensureKeyValue().
bq.I think we need to set KV seqId for KVs, from bulk loaded file, to the file 
seqId
Yes. I set the other KV's sequence id because I wanted to ensure that we return 
one of the two KVs contending here, and that we return the same KV that would have 
been returned if there were no clash and the latest one came from the flushed file.
Anyway, before changing this let me check some more cases, and then I will update 
the patch accordingly. In fact, I had originally set the sequenceId of the file and 
later changed it to this approach.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, TestBulkload.java



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100392#comment-14100392
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


bq.Also, what about compacting a flush file and a bulk loaded one? Will we have 
issues then? Does this patch handle that too? Mind adding tests around that 
as well?
The current test also compacts the flushed files. Behaviour-wise, both would be 
the same in 0.99+.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, TestBulkload.java




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100539#comment-14100539
 ] 

Hadoop QA commented on HBASE-11591:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12662463/HBASE-11591_3.patch
  against trunk revision .
  ATTACHMENT ID: 12662463

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.TestRegionRebalancing

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10477//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10477//console

This message is automatically generated.


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100860#comment-14100860
 ] 

Ted Yu commented on HBASE-11591:


{code}
+ * Compares two cells without mvcc
+ *
+ * @param left
+ * @param right
+ * @return less than 0 if left is smaller, 0 if equal etc..
+ */
+public int compareWithoutSeqId(Cell left, Cell right) {
{code}
Change javadoc to match the method name.

Cell is marked @InterfaceStability.Evolving, so setSequenceId() should be added to 
the Cell interface, in another issue.
{code}
+public class TestScannerWithBulkload {
+  private final static HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();
+  private final static String tableName = "testBulkload";
{code}
Please change tableName to match test name.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java


 See discussion in HBASE-11339.
 Consider the case where the same KVs are present in two files, one produced by 
 flush/compaction and the other through bulk load.
 Both files contain some identical KVs, matching even in timestamp.
 Steps:
 Add some rows with a specific timestamp and flush them.
 Bulk load a file with the same data. Ensure that the assign seqnum property is 
 set.
 The bulk load should use HFileOutputFormat2 (or ensure that we write the 
 bulk_time_output key).
 This ensures that the bulk loaded file has the highest seq num.
 Assume the cell in the flushed/compacted store file is 
 row1,cf,cq,ts1,value1 and the cell in the bulk loaded file is
 row1,cf,cq,ts1,value2
 (There are no parallel scans.)
 Issue a scan on the table in 0.96. The retrieved value is 
 row1,cf,cq,ts1,value2
 But the same scan in 0.98 retrieves row1,cf,cq,ts1,value1.
 This is a behaviour change. It is caused by this code:
 {code}
 public int compare(KeyValueScanner left, KeyValueScanner right) {
   int comparison = compare(left.peek(), right.peek());
   if (comparison != 0) {
     return comparison;
   } else {
     // Since both the keys are exactly the same, we break the tie in favor
     // of the key which came latest.
     long leftSequenceID = left.getSequenceID();
     long rightSequenceID = right.getSequenceID();
     if (leftSequenceID > rightSequenceID) {
       return -1;
     } else if (leftSequenceID < rightSequenceID) {
       return 1;
     } else {
       return 0;
     }
   }
 }
 {code}
 Here, in the 0.96 case, the mvcc of the cell in both files is 0, so the 
 comparison falls through to the else branch, where the bulk loaded file's 
 higher seq id makes it sort first, ensuring that the scan reads from the 
 bulk loaded file.
 In 0.98+, because we retain the mvcc/seqid, the mvcc is not reset to 0 
 (it remains a non-zero positive value). Hence compare() sorts the cell in the 
 flushed/compacted file first. This means that although we know the latest 
 file is the bulk loaded file, we don't scan its data.
 This seems to be a behaviour change. Will check other corner cases as well; we 
 are trying to understand the behaviour of bulk load because we are evaluating 
 whether it can be used for the MOB design.
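The ordering problem can be reproduced outside HBase with a small model. This is a hypothetical simplification, not HBase code: `ModelCell`, `ModelScanner` and `ScanOrder` are illustrative names, and it assumes cells are ordered by key and then by mvcc (higher first), with scanners whose top cells tie falling back to the file sequence id exactly as in the `compare()` quoted above.

```java
import java.util.PriorityQueue;

/** Hypothetical, simplified cell: key fields, a value and an mvcc/sequence id. */
final class ModelCell {
  final String row, family, qualifier, value;
  final long timestamp;
  long mvcc; // mutable so a read path could stamp a sequence id onto it

  ModelCell(String row, String family, String qualifier, long timestamp, String value, long mvcc) {
    this.row = row;
    this.family = family;
    this.qualifier = qualifier;
    this.timestamp = timestamp;
    this.value = value;
    this.mvcc = mvcc;
  }
}

/** Hypothetical scanner positioned on a single cell of one store file. */
final class ModelScanner {
  final ModelCell current;
  final long fileSequenceId;

  ModelScanner(ModelCell current, long fileSequenceId) {
    this.current = current;
    this.fileSequenceId = fileSequenceId;
  }
}

final class ScanOrder {
  /** Key order (row, family, qualifier ascending; newer timestamp first), then higher mvcc first. */
  static int compareCells(ModelCell a, ModelCell b) {
    int c = a.row.compareTo(b.row);
    if (c == 0) c = a.family.compareTo(b.family);
    if (c == 0) c = a.qualifier.compareTo(b.qualifier);
    if (c == 0) c = Long.compare(b.timestamp, a.timestamp);
    if (c == 0) c = Long.compare(b.mvcc, a.mvcc);
    return c;
  }

  /** Mirrors the quoted compare(): only on a full tie does the higher file sequence id win. */
  static int compareScanners(ModelScanner left, ModelScanner right) {
    int comparison = compareCells(left.current, right.current);
    if (comparison != 0) return comparison;
    return Long.compare(right.fileSequenceId, left.fileSequenceId);
  }

  /** Returns the value that a merged scan over the given scanners would emit first. */
  static String firstValue(ModelScanner... scanners) {
    PriorityQueue<ModelScanner> heap = new PriorityQueue<>(ScanOrder::compareScanners);
    for (ModelScanner s : scanners) {
      heap.add(s);
    }
    return heap.peek().current.value;
  }
}
```

For example, with both cells at mvcc 0 and file sequence ids 5 (flushed) and 10 (bulk loaded), `firstValue` returns `value2`, the 0.96 result. Giving the flushed cell a retained mvcc of 3 makes it return `value1`, the 0.98 result. Setting the bulk loaded cell's `mvcc` to its file sequence id (10) before comparing returns `value2` again; this models the idea of setting the seqId on the read path raised in this discussion, not the committed patch.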



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100875#comment-14100875
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


bq.setSequenceId() should be added to Cell interface - in another issue.
I don't think we can add setSequenceId() to Cell. We can discuss that. Will 
update the patch.



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101080#comment-14101080
 ] 

Andrew Purtell commented on HBASE-11591:


bq. But setting the seqId on the read path would prevent us from using Cell 
based impl because Cell does not have it.

What prevents us from adding seqID accessors as an additional interface 
extending Cell in hbase-server as Anoop proposed above?



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101818#comment-14101818
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


bq.Or do we need an extension interface for Cell in server side which is having 
the setter?
I think it is better that we do only that. Is there a better way? The idea of 
Cell is to allow different impls of it as each case needs, like how in 
BufferedDataEncoders the SeekerState is a Cell now.
In fact, every place where we call setSeqId() on the server side should be 
changed. Should we do it in this JIRA or another JIRA? One thing to note: in the 
critical path we would anyway have code that creates instances of that 
new impl.
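To make the server-side extension interface idea concrete, here is a minimal sketch. `ReadOnlyCell`, `SequenceIdSettable`, `ServerCell` and `ReadPath` are hypothetical names, not HBase's `Cell` API or whatever HBASE-11777 ends up adding: the client-facing interface keeps only getters, and a server-only sub-interface adds the setter, so read-path code can stamp a sequence id only on cells that support it.

```java
/** Hypothetical client-facing view of a cell: getters only, no setter. */
interface ReadOnlyCell {
  byte[] getRow();
  long getTimestamp();
  long getSequenceId();
}

/** Hypothetical server-side extension that adds the setter without changing ReadOnlyCell. */
interface SequenceIdSettable extends ReadOnlyCell {
  void setSequenceId(long seqId);
}

/** Hypothetical server-side cell implementation. */
final class ServerCell implements SequenceIdSettable {
  private final byte[] row;
  private final long timestamp;
  private long seqId;

  ServerCell(byte[] row, long timestamp, long seqId) {
    this.row = row;
    this.timestamp = timestamp;
    this.seqId = seqId;
  }

  @Override public byte[] getRow() { return row; }
  @Override public long getTimestamp() { return timestamp; }
  @Override public long getSequenceId() { return seqId; }
  @Override public void setSequenceId(long seqId) { this.seqId = seqId; }
}

final class ReadPath {
  /**
   * Stamps a bulk loaded file's sequence id onto a cell on the read path.
   * Returns true if the cell supported the server-side setter, false if it was left unchanged.
   */
  static boolean stampSequenceId(ReadOnlyCell cell, long fileSequenceId) {
    if (cell instanceof SequenceIdSettable) {
      ((SequenceIdSettable) cell).setSequenceId(fileSequenceId);
      return true;
    }
    return false;
  }
}
```

For example, passing a `ServerCell` with sequence id 0 and a file sequence id of 42 to `stampSequenceId` returns `true` and leaves the cell reporting 42, while a getter-only cell is left untouched and the call returns `false`. The trade-off is that the read path has to work with instances of the server-side implementation.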



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101837#comment-14101837
 ] 

Anoop Sam John commented on HBASE-11591:


+1 for a new Jira.



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101843#comment-14101843
 ] 

Anoop Sam John commented on HBASE-11591:


HBASE-11772 is a related issue.



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101875#comment-14101875
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


bq.+1 for a new Jira
OK, will raise a JIRA.
bq.HBASE-11772 related issue.
It would be good to check whether this patch is also useful for HBASE-11772.



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101880#comment-14101880
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


Raised https://issues.apache.org/jira/browse/HBASE-11777.



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-17 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100274#comment-14100274
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


All the test case failures have this:
{code}
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:713)
    at org.mortbay.thread.QueuedThreadPool.newThread(QueuedThreadPool.java:462)
    at org.mortbay.thread.QueuedThreadPool.doStart(QueuedThreadPool.java:403)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.Server.doStart(Server.java:218)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.apache.hadoop.hbase.http.HttpServer.start(HttpServer.java:949)
    at org.apache.hadoop.hbase.http.InfoServer.start(InfoServer.java:78)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.putUpWebUI(HRegionServer.java:1602)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:520)
    at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.<init>(MiniHBaseCluster.java:115)
{code}

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, TestBulkload.java


 See discussion in HBASE-11339.
 Consider the case where the same KVs are present in two files, one produced by 
 flush/compaction and the other through bulk load.
 Both files contain some identical KVs, matching even in timestamp.
 Steps:
 Add some rows with a specific timestamp and flush them.
 Bulk load a file with the same data. Ensure that the assign seqnum property is 
 set.
 The bulk load should use HFileOutputFormat2 (or ensure that we write the 
 bulk_time_output key).
 This ensures that the bulk loaded file has the highest seq num.
 Assume the cell in the flushed/compacted store file is 
 row1,cf,cq,ts1,value1 and the cell in the bulk loaded file is
 row1,cf,cq,ts1,value2
 (There are no parallel scans.)
 Issue a scan on the table in 0.96. The retrieved value is 
 row1,cf,cq,ts1,value2
 But the same scan in 0.98 retrieves row1,cf,cq,ts1,value1.
 This is a behaviour change. It is caused by this code:
 {code}
 public int compare(KeyValueScanner left, KeyValueScanner right) {
   int comparison = compare(left.peek(), right.peek());
   if (comparison != 0) {
     return comparison;
   } else {
     // Since both the keys are exactly the same, we break the tie in favor
     // of the key which came latest.
     long leftSequenceID = left.getSequenceID();
     long rightSequenceID = right.getSequenceID();
     if (leftSequenceID > rightSequenceID) {
       return -1;
     } else if (leftSequenceID < rightSequenceID) {
       return 1;
     } else {
       return 0;
     }
   }
 }
 {code}
 In the 0.96 case the mvcc of the cell in both files is 0, so the cell 
 comparison returns 0 and the tie is broken in the else branch. There the seq 
 id of the bulk loaded file is greater, so it sorts first and the scan reads 
 from the bulk loaded file.
 In 0.98+, because we retain the mvcc/seqid, the mvcc is no longer reset to 0 
 (it remains a non-zero positive value), while the bulk loaded cell carries no 
 mvcc. Hence compare() sorts the cell in the flushed/compacted file first. This 
 means that even though we know the latest file is the bulk loaded file, we 
 don't read its data.
 This seems to be a behaviour change. I will check other corner cases as well, 
 but we are trying to understand the behaviour of bulk load because we are 
 evaluating whether it can be used for the MOB design.
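To make the tie-break concrete, here is a minimal, hypothetical Java model. It is not HBase code: `SimpleCell`, `SimpleScanner`, and the single-string key are illustrative stand-ins. It assumes, as the description implies, that once row/cf/cq/ts match, the cell comparison orders by mvcc with the higher value first, and that only an exact cell tie falls through to the scanners' file sequence ids. With both mvccs at 0 (the 0.96 case) the newer bulk loaded file wins. With a positive mvcc on the flushed cell (the 0.98+ case) the flushed value wins.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical, simplified model of the scanner-heap tie-break; not HBase classes.
// The whole key (row, cf, cq, ts) is collapsed into one string for brevity.
final class TieBreakSketch {

  static final class SimpleCell {
    final String key;   // stands in for row,cf,cq,ts,type
    final long mvcc;    // per-cell mvcc; 0 when not retained in the file
    final String value;

    SimpleCell(String key, long mvcc, String value) {
      this.key = key;
      this.mvcc = mvcc;
      this.value = value;
    }
  }

  static final class SimpleScanner {
    final SimpleCell top;       // what peek() would return
    final long fileSequenceId;  // what getSequenceID() would return

    SimpleScanner(SimpleCell top, long fileSequenceId) {
      this.top = top;
      this.fileSequenceId = fileSequenceId;
    }
  }

  // Assumed cell order: key ascending, then higher mvcc first.
  static int compareCells(SimpleCell a, SimpleCell b) {
    int c = a.key.compareTo(b.key);
    return c != 0 ? c : Long.compare(b.mvcc, a.mvcc);
  }

  // Mirrors the quoted compare(): cells first, file sequence id only on an exact tie.
  static final Comparator<SimpleScanner> HEAP_ORDER = (left, right) -> {
    int comparison = compareCells(left.top, right.top);
    if (comparison != 0) {
      return comparison;
    }
    // Exact tie: the scanner with the higher (newer) file sequence id sorts first.
    return Long.compare(right.fileSequenceId, left.fileSequenceId);
  };

  // Returns the value a scan would see first for the duplicated key.
  static String firstValue(SimpleScanner flushed, SimpleScanner bulkLoaded) {
    PriorityQueue<SimpleScanner> heap = new PriorityQueue<>(2, HEAP_ORDER);
    heap.add(flushed);
    heap.add(bulkLoaded);
    return heap.peek().top.value;
  }

  public static void main(String[] args) {
    String key = "row1/cf/cq/ts1";
    SimpleScanner bulk = new SimpleScanner(new SimpleCell(key, 0, "value2"), 10);
    // 0.96-like: the flushed cell's mvcc is 0, so the file sequence id decides.
    SimpleScanner flushed96 = new SimpleScanner(new SimpleCell(key, 0, "value1"), 5);
    // 0.98-like: the flushed cell keeps a positive mvcc, so compareCells() decides.
    SimpleScanner flushed98 = new SimpleScanner(new SimpleCell(key, 5, "value1"), 5);
    System.out.println("0.96-like: " + firstValue(flushed96, bulk));
    System.out.println("0.98-like: " + firstValue(flushed98, bulk));
  }
}
```

Running `main` prints `0.96-like: value2` and then `0.98-like: value1`, matching the two scan results reported above. The sequence ids 5 and 10 are arbitrary illustrative values.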





[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-17 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100286#comment-14100286
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


All the failed tests are passing.  Let me rerun once again for HadoopQA. 
[~jeffreyz]
The patch uses kv.setSequenceId() here.  Please have a look.  But setting the 
seqId on the read path would prevent us from using a Cell-based implementation, 
because Cell does not support setting it.  For now it is fine.
[~saint@gmail.com],[~apurtell],[~anoop.hbase]
Do you want to have a look at this?


 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, TestBulkload.java




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099642#comment-14099642
 ] 

Hadoop QA commented on HBASE-11591:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12662280/HBASE-11591_1.patch
  against trunk revision .
  ATTACHMENT ID: 12662280

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.coprocessor.TestBigDecimalColumnInterpreter
   org.apache.hadoop.hbase.coprocessor.TestMasterObserver
   org.apache.hadoop.hbase.mapred.TestTableSnapshotInputFormat
   org.apache.hadoop.hbase.coprocessor.TestDoubleColumnInterpreter
   org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithRemove
   org.apache.hadoop.hbase.replication.regionserver.TestReplicationSink
   org.apache.hadoop.hbase.coprocessor.TestMasterCoprocessorExceptionWithRemove
   org.apache.hadoop.hbase.io.encoding.TestLoadAndSwitchEncodeOnDisk
   org.apache.hadoop.hbase.mapred.TestTableInputFormat
   org.apache.hadoop.hbase.io.hfile.TestHFileBlock
   org.apache.hadoop.hbase.coprocessor.TestHTableWrapper
   org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint
   org.apache.hadoop.hbase.coprocessor.TestRowProcessorEndpoint
   org.apache.hadoop.hbase.coprocessor.TestRegionServerObserver
   org.apache.hadoop.hbase.coprocessor.TestClassLoading
   org.apache.hadoop.hbase.coprocessor.TestOpenTableInCoprocessor
   org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
   org.apache.hadoop.hbase.coprocessor.TestBatchCoprocessorEndpoint
   org.apache.hadoop.hbase.TestGlobalMemStoreSize
   org.apache.hadoop.hbase.TestRegionRebalancing
   org.apache.hadoop.hbase.TestIOFencing
   org.apache.hadoop.hbase.zookeeper.TestZooKeeperACL
   org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass
   org.apache.hadoop.hbase.coprocessor.TestAggregateProtocol

 {color:red}-1 core zombie tests{color}.  There are 4 zombie test(s):
   at org.apache.hadoop.hbase.io.hfile.TestScannerSelectionUsingTTL.testScannerSelection(TestScannerSelectionUsingTTL.java:128)
   at org.apache.hadoop.hbase.io.encoding.TestEncodedSeekers.testEncodedSeeker(TestEncodedSeekers.java:117)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10468//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10468//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10468//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10468//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10468//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10468//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10468//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10468//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10468//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 

[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-15 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099346#comment-14099346
 ] 

Jeffrey Zhong commented on HBASE-11591:
---

{quote}
Should we rewrite the KV before sending it to the StoreScanner layer so that 
the kv comparison works fine?
{quote}
We can set the mvcc (the function is now named setSequenceId()) when reading 
KVs from bulk loaded hfiles, using the hfile's sequence id. This approach is 
cleaner and also solves the issue that a new hfile may contain some KVs without 
an mvcc after a normal hfile is compacted with a bulk loaded hfile.
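A minimal sketch of the read-path idea above, using hypothetical stand-in types rather than HBase's KeyValue or StoreFileScanner. A cell read from a bulk loaded file is stamped with the file's sequence id; restricting the rewrite to cells whose mvcc is 0 is an assumption of this sketch, not necessarily what the committed patch does. Under the same assumed cell order (key, then higher mvcc first), the bulk loaded value then sorts ahead of the flushed one. Because the stamp is carried on the cell itself, it would also survive a compaction, which is the second problem mentioned in the comment.

```java
import java.util.Comparator;

// Hypothetical model of stamping bulk loaded cells with the file's sequence id.
// Not HBase code; the single-string key stands in for row,cf,cq,ts,type.
final class BulkLoadSeqIdSketch {

  static final class SimpleCell {
    final String key;
    final long mvcc;    // per-cell mvcc / sequence id; 0 if the file stored none
    final String value;

    SimpleCell(String key, long mvcc, String value) {
      this.key = key;
      this.mvcc = mvcc;
      this.value = value;
    }
  }

  // Assumed cell order: key ascending, then higher mvcc (newer edit) first.
  static final Comparator<SimpleCell> CELL_ORDER = (a, b) -> {
    int c = a.key.compareTo(b.key);
    return c != 0 ? c : Long.compare(b.mvcc, a.mvcc);
  };

  // Hypothetical read-path hook: cells without an mvcc get the file's sequence id.
  static SimpleCell readFromBulkLoadedFile(SimpleCell stored, long fileSequenceId) {
    return stored.mvcc == 0 ? new SimpleCell(stored.key, fileSequenceId, stored.value) : stored;
  }

  // Value of whichever cell sorts first, i.e. what a scan would return.
  static String visible(SimpleCell a, SimpleCell b) {
    return CELL_ORDER.compare(a, b) <= 0 ? a.value : b.value;
  }

  public static void main(String[] args) {
    SimpleCell flushed = new SimpleCell("row1/cf/cq/ts1", 5, "value1");
    SimpleCell bulkOnDisk = new SimpleCell("row1/cf/cq/ts1", 0, "value2");
    // Without the rewrite the flushed cell wins (the reported problem).
    System.out.println(visible(flushed, bulkOnDisk));
    // With the rewrite (file sequence id 10) the bulk loaded cell wins.
    System.out.println(visible(flushed, readFromBulkLoadedFile(bulkOnDisk, 10)));
  }
}
```

Running `main` prints `value1` and then `value2`. The sequence ids 5 and 10 are illustrative.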

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, TestBulkload.java




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-14 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098186#comment-14098186
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


Yes, logically right.  But the problem in this case is that it retrieves a KV 
which is smaller than the previous KV, just because of the mvcc of the bulk 
loaded file.  Should we rewrite the KV before sending it to the StoreScanner layer so 
that the kv comparison works fine? 

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, TestBulkload.java




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-13 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095928#comment-14095928
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


Any thoughts here?

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, TestBulkload.java




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-13 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096324#comment-14096324
 ] 

Jeffrey Zhong commented on HBASE-11591:
---

The patch looks good to me (+1)! Basically it restores the old behavior for bulk 
loaded files by treating all the KVs involved as having mvcc==0.  Thanks.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, TestBulkload.java




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-07-31 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080548#comment-14080548
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


The other test cases seem to fail because of environment issues.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, TestBulkload.java




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-07-31 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080547#comment-14080547
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


Not sure about the other test case failures, but the newly added test case 
TestScannerWithBulkLoad fails here:
{code}
  protected void checkScanOrder(Cell prevKV, Cell kv,
      KeyValue.KVComparator comparator) throws IOException {
    // Check that the heap gives us KVs in an increasing order.
    assert prevKV == null || comparator == null
        || comparator.compare(prevKV, kv) <= 0 : "Key " + prevKV
        + " followed by a " + "smaller key " + kv + " in cf " + store;
  }
{code}
So can we remove that assertion?  This change is becoming trickier.
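The assertion failure can be reproduced with a small, hypothetical model (again not HBase code). Suppose the heap hands out the bulk loaded cell (mvcc 0) before the flushed cell (mvcc 5) for the same key. A comparator that orders equal keys by descending mvcc then sees the second cell as smaller, so `compare(prevKV, kv) <= 0` is false. The mvcc values and the comparator order are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the checkScanOrder() condition; not HBase classes.
final class ScanOrderCheckSketch {

  static final class SimpleCell {
    final String key;  // stands in for row,cf,cq,ts,type
    final long mvcc;

    SimpleCell(String key, long mvcc) {
      this.key = key;
      this.mvcc = mvcc;
    }
  }

  // Assumed cell order: key ascending, then higher mvcc first.
  static int compare(SimpleCell a, SimpleCell b) {
    int c = a.key.compareTo(b.key);
    return c != 0 ? c : Long.compare(b.mvcc, a.mvcc);
  }

  // Same condition the assertion checks: prevKV == null || compare(prevKV, kv) <= 0.
  static boolean inOrder(SimpleCell prevKV, SimpleCell kv) {
    return prevKV == null || compare(prevKV, kv) <= 0;
  }

  // Index of the first cell that is smaller than its predecessor, or -1 if none.
  static int firstViolation(List<SimpleCell> emitted) {
    SimpleCell prev = null;
    for (int i = 0; i < emitted.size(); i++) {
      if (!inOrder(prev, emitted.get(i))) {
        return i;
      }
      prev = emitted.get(i);
    }
    return -1;
  }

  public static void main(String[] args) {
    SimpleCell bulk = new SimpleCell("row1/cf/cq/ts1", 0);
    SimpleCell flushed = new SimpleCell("row1/cf/cq/ts1", 5);
    // Heap that prefers the newer (bulk loaded) file: the check fails at index 1.
    System.out.println(firstViolation(Arrays.asList(bulk, flushed)));
    // Order the cell comparator itself expects: no violation (-1).
    System.out.println(firstViolation(Arrays.asList(flushed, bulk)));
  }
}
```

This is why preferring the bulk loaded file in the heap alone conflicts with the assertion. Giving the bulk loaded cell a real mvcc, as suggested later in the thread, keeps the heap order and the cell order consistent.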

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, TestBulkload.java




[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-07-31 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080549#comment-14080549
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


{code}
Error: java.lang.NullPointerException
    at org.apache.hadoop.hbase.mapreduce.LabelExpander.getLabelOrdinals(LabelExpander.java:129)
    at org.apache.hadoop.hbase.mapreduce.LabelExpander.getLabelOrdinals(LabelExpander.java:145)
    at org.apache.hadoop.hbase.mapreduce.LabelExpander.createVisibilityTags(LabelExpander.java:105)
    at org.apache.hadoop.hbase.mapreduce.LabelExpander.createKVFromCellVisibilityExpr(LabelExpander.java:217)
    at org.apache.hadoop.hbase.mapreduce.TsvImporterMapper.createPuts(TsvImporterMapper.java:195)
    at org.apache.hadoop.hbase.mapreduce.TsvImporterMapper.map(TsvImporterMapper.java:153)
    at org.apache.hadoop.hbase.mapreduce.TsvImporterMapper.map(TsvImporterMapper.java:39)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
{code}

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, TestBulkload.java


 See discussion in HBASE-11339.
 Consider the case where the same KVs exist in two files, one produced by a
 flush/compaction and the other by a bulk load. The KVs in both files match,
 even in timestamp.
 Steps:
 Add some rows with a specific timestamp and flush them.
 Bulk load a file with the same data. Ensure that the assign seqnum property is
 set.
 The bulk load should use HFileOutputFormat2 (or ensure that we write the
 bulk_time_output key).
 This ensures that the bulk loaded file has the highest seq num.
 Assume the cell in the flushed/compacted store file is
 row1,cf,cq,ts1,value1 and the cell in the bulk loaded file is
 row1,cf,cq,ts1,value2
 (there are no parallel scans).
 Issue a scan on the table in 0.96. The retrieved value is
 row1,cf,cq,ts1,value2.
 The same scan in 0.98 retrieves row1,cf,cq,ts1,value1.
 This is a behaviour change, caused by this code:
 {code}
 // KVScannerComparator in KeyValueHeap: decides which scanner is read next.
 public int compare(KeyValueScanner left, KeyValueScanner right) {
   // Compare the cells the two scanners are positioned on; in 0.98+ a
   // non-zero mvcc can decide this comparison.
   int comparison = compare(left.peek(), right.peek());
   if (comparison != 0) {
     return comparison;
   } else {
     // Since both the keys are exactly the same, we break the tie in favor
     // of the key which came latest.
     long leftSequenceID = left.getSequenceID();
     long rightSequenceID = right.getSequenceID();
     if (leftSequenceID > rightSequenceID) {
       return -1;
     } else if (leftSequenceID < rightSequenceID) {
       return 1;
     } else {
       return 0;
     }
   }
 }
 {code}
 In the 0.96 case the mvcc of the cell in both files is 0, so the cells compare
 as equal and the comparison falls through to the else branch. There the seq id
 of the bulk loaded file is greater, so that scanner sorts first and the scan
 returns the cell from the bulk loaded file.
 In 0.98+, because we retain the mvcc+seqid, the mvcc of the flushed cell is no
 longer reset to 0 and remains a positive value, while the bulk loaded cell
 still has an mvcc of 0. Hence compare(left.peek(), right.peek()) already
 returns a non-zero result that sorts the cell from the flushed/compacted file
 first, and the sequence id tie-break is never reached. So even though we know
 the latest file is the bulk loaded file, we don't return its data.
 This seems to be a behaviour change. We will check other corner cases as
 well; we are trying to understand the behaviour of bulk load because we are
 evaluating whether it can be used for the MOB design.
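The ordering described above can be reproduced with a small, self-contained simulation. This is a sketch, not HBase code: SimpleCell, SimpleScanner and TieBreakDemo are hypothetical stand-ins, the cell comparison is assumed to order by key and then by descending mvcc (higher mvcc first), and the sequence ids (5 for the flushed file, 10 for the bulk loaded file) and the retained mvcc of 7 are made-up values chosen so that the bulk loaded file is the newest.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical simplified cell: only the fields that matter for this tie-break. */
final class SimpleCell {
    final String key;   // stands in for row/family/qualifier/timestamp/type
    final long mvcc;    // per-cell mvcc; 0 when it was not retained
    final String value;

    SimpleCell(String key, long mvcc, String value) {
        this.key = key;
        this.mvcc = mvcc;
        this.value = value;
    }
}

/** Hypothetical simplified scanner: its current cell plus its file's sequence id. */
final class SimpleScanner {
    final SimpleCell current;
    final long sequenceId;

    SimpleScanner(SimpleCell current, long sequenceId) {
        this.current = current;
        this.sequenceId = sequenceId;
    }
}

final class TieBreakDemo {
    /** Assumed cell order: key first, then the higher mvcc sorts first. */
    static int compareCells(SimpleCell a, SimpleCell b) {
        int c = a.key.compareTo(b.key);
        if (c != 0) {
            return c;
        }
        return Long.compare(b.mvcc, a.mvcc);
    }

    /** Same shape as the quoted compare(): cells first, then file sequence id. */
    static final Comparator<SimpleScanner> SCANNER_ORDER = (left, right) -> {
        int comparison = compareCells(left.current, right.current);
        if (comparison != 0) {
            return comparison;
        }
        // Identical cells: the scanner whose file has the higher sequence id wins.
        return Long.compare(right.sequenceId, left.sequenceId);
    };

    /** Returns the value a scan would see first for row1,cf,cq,ts1. */
    static String firstValue(long flushedCellMvcc) {
        String key = "row1/cf:cq/ts1";
        SimpleScanner flushed =
            new SimpleScanner(new SimpleCell(key, flushedCellMvcc, "value1"), 5L);
        // The bulk loaded cell carries mvcc 0 but its file has the highest sequence id.
        SimpleScanner bulkLoaded =
            new SimpleScanner(new SimpleCell(key, 0L, "value2"), 10L);
        PriorityQueue<SimpleScanner> heap = new PriorityQueue<>(SCANNER_ORDER);
        heap.add(flushed);
        heap.add(bulkLoaded);
        return heap.peek().current.value;
    }

    public static void main(String[] args) {
        System.out.println("0.96-style (mvcc reset to 0): " + firstValue(0L)); // value2
        System.out.println("0.98-style (mvcc retained):   " + firstValue(7L)); // value1
    }
}
```

With the flushed cell's mvcc reset to 0 the heap falls through to the sequence id tie-break and returns value2; once that mvcc is retained, the cell comparison alone decides and value1 is returned.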



[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-07-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080367#comment-14080367
 ] 

Hadoop QA commented on HBASE-11591:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658788/HBASE-11591.patch
  against trunk revision .
  ATTACHMENT ID: 12658788

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}.  The patch failed these unit tests:
  org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction
  org.apache.hadoop.hbase.migration.TestNamespaceUpgrade
  org.apache.hadoop.hbase.regionserver.TestScannerWithBulkload
  org.apache.hadoop.hbase.master.TestMasterOperationsForRegionReplicas
  org.apache.hadoop.hbase.regionserver.TestRegionReplicas
  org.apache.hadoop.hbase.mapreduce.TestImportTSVWithVisibilityLabels
  org.apache.hadoop.hbase.client.TestReplicasClient
  org.apache.hadoop.hbase.master.TestRestartCluster
  org.apache.hadoop.hbase.TestIOFencing

{color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//console

This message is automatically generated.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, TestBulkload.java


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-07-28 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076936#comment-14076936
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


[~saint@gmail.com],[~stack],[~jeffreyz]
Want to take a look at this? Before comparing the cells, should the
KVScannerComparator take into consideration whether a cell comes from a bulk
loaded file?
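A minimal sketch of what taking the bulk loaded origin into account could look like, written outside HBase: when either scanner comes from a bulk loaded file, the comparator skips the per-cell mvcc and falls back to the file sequence id. FileScanner, BulkAwareComparator and the bulkLoaded flag are hypothetical; the quoted compare() has no such flag, so a real change would need some way for the scanner to report it.

```java
import java.util.Comparator;

/** Hypothetical scanner view: the current cell plus metadata about its file. */
final class FileScanner {
    final String key;         // stands in for row/family/qualifier/timestamp/type
    final long mvcc;          // per-cell mvcc (0 for bulk loaded cells)
    final long sequenceId;    // sequence id of the file backing this scanner
    final boolean bulkLoaded; // hypothetical flag: was the backing file bulk loaded?
    final String value;

    FileScanner(String key, long mvcc, long sequenceId, boolean bulkLoaded, String value) {
        this.key = key;
        this.mvcc = mvcc;
        this.sequenceId = sequenceId;
        this.bulkLoaded = bulkLoaded;
        this.value = value;
    }
}

/** Sorts the scanner that should be read first to the front. */
final class BulkAwareComparator implements Comparator<FileScanner> {
    @Override
    public int compare(FileScanner left, FileScanner right) {
        int comparison = left.key.compareTo(right.key);
        if (comparison != 0) {
            return comparison;
        }
        // Use the per-cell mvcc only when neither cell comes from a bulk
        // loaded file: a bulk loaded cell's mvcc of 0 says nothing about age.
        if (!left.bulkLoaded && !right.bulkLoaded) {
            comparison = Long.compare(right.mvcc, left.mvcc);
            if (comparison != 0) {
                return comparison;
            }
        }
        // Otherwise break the tie on the file sequence id: newer file first.
        return Long.compare(right.sequenceId, left.sequenceId);
    }

    /** Returns the value of whichever scanner sorts first. */
    static String winner(FileScanner a, FileScanner b) {
        return new BulkAwareComparator().compare(a, b) <= 0 ? a.value : b.value;
    }
}
```

In the scenario from the description the bulk loaded cell now wins, and a flushed file with a higher sequence id than the bulk loaded file still wins, because the decision falls to the file sequence id whenever a bulk loaded file is involved.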

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: TestBulkload.java


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-07-28 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14077290#comment-14077290
 ] 

Jeffrey Zhong commented on HBASE-11591:
---

I think this can be solved when reading bulk loaded hfiles: we can use the
current hfile's sequenceId as the mvcc value of its KVs. The other option is to
add a new metadata entry to the hfile holding a default mvcc value, for
situations like bulk loaded hfiles. Thanks.
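Both options can be sketched as helpers that compute the mvcc a cell is compared with. This is an illustration under assumptions, not the committed fix: FileMeta and EffectiveMvcc are hypothetical names, and defaultMvcc stands in for the new hfile metadata entry of the second option, whose name and encoding are not specified here.

```java
import java.util.OptionalLong;

/** Hypothetical reader-side view of one store file. */
final class FileMeta {
    final long sequenceId;          // file-level sequence id
    final boolean bulkLoaded;       // true if the file came from a bulk load
    final OptionalLong defaultMvcc; // option 2: hypothetical per-file metadata entry

    FileMeta(long sequenceId, boolean bulkLoaded, OptionalLong defaultMvcc) {
        this.sequenceId = sequenceId;
        this.bulkLoaded = bulkLoaded;
        this.defaultMvcc = defaultMvcc;
    }

    static FileMeta forFlush(long sequenceId) {
        return new FileMeta(sequenceId, false, OptionalLong.empty());
    }

    static FileMeta forBulkLoad(long sequenceId, long defaultMvcc) {
        return new FileMeta(sequenceId, true, OptionalLong.of(defaultMvcc));
    }
}

final class EffectiveMvcc {
    /** Option 1: every cell read from a bulk loaded file uses the file's sequence id. */
    static long fromSequenceId(long cellMvcc, FileMeta file) {
        return file.bulkLoaded ? file.sequenceId : cellMvcc;
    }

    /** Option 2: a cell without an mvcc (0) uses the file's default mvcc, if recorded. */
    static long fromMetadata(long cellMvcc, FileMeta file) {
        if (cellMvcc != 0L) {
            return cellMvcc;
        }
        return file.defaultMvcc.orElse(0L);
    }
}
```

With either helper, a bulk loaded cell from a file with sequence id 10 compares as mvcc 10, which sorts ahead of a flushed cell with mvcc 7 when the higher mvcc sorts first.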

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: TestBulkload.java


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-07-28 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14077389#comment-14077389
 ] 

ramkrishna.s.vasudevan commented on HBASE-11591:


bq. we can use current hfile sequenceId as its KVs' mvcc values.
Yes, this is what I felt too. But to do that we may need to add some methods
and check that the scanner in the KeyValueHeap is a StoreFileScanner and that
its reader is for a bulk loaded file.
Will post a patch soon.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: TestBulkload.java


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-07-26 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075423#comment-14075423
 ] 

Andrew Purtell commented on HBASE-11591:


Thanks for the clarification!

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: TestBulkload.java


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-07-25 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074880#comment-14074880
 ] 

Andrew Purtell commented on HBASE-11591:


Making this critical for 0.98.5. It seems to me we should be respecting the
file-level sequence id in 0.98 as we did in 0.96, and not doing so is a bulk
loading bug. Feel free to adjust the priority downward if you disagree.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0, 0.98.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0, 0.98.5

 Attachments: TestBulkload.java

