[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-09-01 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---
Affects Version/s: 2.0.0
Fix Version/s: 2.0.0

Committed to branch-1 also.  Thanks for the reminder [~anoop.hbase]

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0, 2.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0, 2.0.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 HBASE-11591_6.patch, HBASE-11591_6.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch


 See discussion in HBASE-11339.
 This covers the case where the same KVs are present in two files, one produced 
 by flush/compaction and the other through bulk load.
 Some KVs in the two files are identical, even in timestamp.
 Steps:
 Add some rows with a specific timestamp and flush them.
 Bulk load a file with the same data. Ensure that the assign seqnum property is 
 set.
 The bulk load should use HFileOutputFormat2 (or ensure that we write the 
 bulk_time_output key).
 This ensures that the bulk loaded file has the highest seq num.
 Assume the cell in the flushed/compacted store file is 
 row1,cf,cq,ts1,value1 and the cell in the bulk loaded file is 
 row1,cf,cq,ts1,value2 
 (there are no parallel scans).
 Issue a scan on the table in 0.96. The retrieved value is 
 row1,cf,cq,ts1,value2
 But the same scan in 0.98 will retrieve row1,cf,cq,ts1,value1.
 This is a behaviour change, caused by this code:
 {code}
 public int compare(KeyValueScanner left, KeyValueScanner right) {
   int comparison = compare(left.peek(), right.peek());
   if (comparison != 0) {
     return comparison;
   } else {
     // Since both the keys are exactly the same, we break the tie in favor
     // of the key which came latest.
     long leftSequenceID = left.getSequenceID();
     long rightSequenceID = right.getSequenceID();
     if (leftSequenceID > rightSequenceID) {
       return -1;
     } else if (leftSequenceID < rightSequenceID) {
       return 1;
     } else {
       return 0;
     }
   }
 }
 {code}
 In 0.96 the mvcc of the cell is 0 in both files, so the cell comparison 
 returns 0 and the result is decided in the else branch. There the seq id of 
 the bulk loaded file is greater, so it sorts first, ensuring that the scan 
 reads from the bulk loaded file.
 In 0.98+, since we retain the mvcc+seqid, the mvcc is not reset to 0 (it 
 remains a non-zero positive value). Hence compare() sorts the cell in the 
 flushed/compacted file first. This means that although we know the latest 
 file is the bulk loaded file, we don't scan its data.
 This seems to be a behaviour change. Will check other corner cases also, but 
 we are trying to understand the behaviour of bulk load because we are 
 evaluating whether it can be used for the MOB design.
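
The ordering described above can be reproduced with a small standalone sketch. It is not HBase code: PeekedCell, compareCells, SCANNER_ORDER and firstValue are hypothetical names, and the rule that a higher mvcc sorts first in the cell comparison is an assumption taken from the description. The sketch only models why a retained non-zero mvcc prevents the sequence id tie-break from being reached.

```java
import java.util.Comparator;

public class ScannerTieBreakSketch {

    /** Hypothetical stand-in for the cell a store file scanner currently peeks at. */
    static final class PeekedCell {
        final String key;       // row, family, qualifier and timestamp flattened into one string
        final long mvcc;        // mvcc of the cell (0 when it has been reset)
        final long sequenceId;  // sequence id of the store file that holds the cell
        final String value;

        PeekedCell(String key, long mvcc, long sequenceId, String value) {
            this.key = key;
            this.mvcc = mvcc;
            this.sequenceId = sequenceId;
            this.value = value;
        }
    }

    /** Cell comparison: by key, then (assumption) the higher mvcc sorts first. */
    static int compareCells(PeekedCell left, PeekedCell right) {
        int byKey = left.key.compareTo(right.key);
        if (byKey != 0) {
            return byKey;
        }
        return Long.compare(right.mvcc, left.mvcc);
    }

    /** Scanner comparison with the same shape as the compare() in the description. */
    static final Comparator<PeekedCell> SCANNER_ORDER = (left, right) -> {
        int comparison = compareCells(left, right);
        if (comparison != 0) {
            return comparison;
        }
        // Identical cells: the file with the higher sequence id sorts first.
        return Long.compare(right.sequenceId, left.sequenceId);
    };

    /** Returns the value of whichever of the two cells sorts first. */
    static String firstValue(PeekedCell a, PeekedCell b) {
        return SCANNER_ORDER.compare(a, b) <= 0 ? a.value : b.value;
    }

    public static void main(String[] args) {
        PeekedCell bulkLoaded = new PeekedCell("row1/cf/cq/ts1", 0L, 20L, "value2");
        PeekedCell flushedMvccReset = new PeekedCell("row1/cf/cq/ts1", 0L, 10L, "value1");
        PeekedCell flushedMvccKept = new PeekedCell("row1/cf/cq/ts1", 7L, 10L, "value1");

        // mvcc is 0 on both sides, so the sequence id decides.
        System.out.println("mvcc reset to 0: " + firstValue(flushedMvccReset, bulkLoaded));
        // A non-zero mvcc decides the cell comparison before the sequence id is consulted.
        System.out.println("mvcc retained:   " + firstValue(flushedMvccKept, bulkLoaded));
    }
}
```

With the mvcc reset to 0 on both cells the bulk loaded value2 sorts first. With the mvcc retained on the flushed cell, value1 sorts first even though the bulk loaded file carries the higher sequence id.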



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-09-01 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---
Attachment: HBASE-11591_branch-1-addendum.patch

Addendum patch that solves HBASE-11834

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0, 2.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0, 2.0.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 HBASE-11591_6.patch, HBASE-11591_6.patch, 
 HBASE-11591_branch-1-addendum.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch




[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-26 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Attachment: HBASE-11591_6.patch

Retry QA

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 HBASE-11591_6.patch, HBASE-11591_6.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch




[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-26 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Status: Open  (was: Patch Available)

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 HBASE-11591_6.patch, HBASE-11591_6.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch




[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-26 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Status: Patch Available  (was: Open)

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 HBASE-11591_6.patch, HBASE-11591_6.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch




[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-26 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to master.
Thanks for the reviews, Jeffrey, Ted, Jerry and Anoop. 
The failed QA test and the javadoc warnings seem unrelated.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 HBASE-11591_6.patch, HBASE-11591_6.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch




[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-25 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Status: Open  (was: Patch Available)

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 TestBulkload.java, hbase-11591-03-02.patch, hbase-11591-03-jeff.patch




[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-25 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Status: Patch Available  (was: Open)

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 HBASE-11591_6.patch, TestBulkload.java, hbase-11591-03-02.patch, 
 hbase-11591-03-jeff.patch




[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-25 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Attachment: HBASE-11591_6.patch

Retry QA.  Updated patch.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 HBASE-11591_6.patch, TestBulkload.java, hbase-11591-03-02.patch, 
 hbase-11591-03-jeff.patch




[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-21 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Status: Open  (was: Patch Available)

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch


 See discussion in HBASE-11339.
 Consider the case where the same KVs exist in two files, one produced by 
 flush/compaction and the other through bulk load.
 Both files contain some identical KVs that match even in timestamp.
 Steps:
 Add some rows with a specific timestamp and flush them.  
 Bulk load a file with the same data. Ensure that the assign seqnum property 
 is set.
 The bulk load should use HFileOutputFormat2 (or ensure that we write the 
 bulk_time_output key).
 This ensures that the bulk loaded file has the highest seq num.
 Assume the cell in the flushed/compacted store file is 
 row1,cf,cq,ts1,value1 and the cell in the bulk loaded file is
 row1,cf,cq,ts1,value2 
 (There are no parallel scans).
 Issue a scan on the table in 0.96. The retrieved value is 
 row1,cf,cq,ts1,value2
 But the same scan in 0.98 retrieves row1,cf,cq,ts1,value1. 
 This is a behaviour change.  It is caused by this code 
 {code}
 public int compare(KeyValueScanner left, KeyValueScanner right) {
   int comparison = compare(left.peek(), right.peek());
   if (comparison != 0) {
     return comparison;
   } else {
     // Since both the keys are exactly the same, we break the tie in favor
     // of the key which came latest.
     long leftSequenceID = left.getSequenceID();
     long rightSequenceID = right.getSequenceID();
     if (leftSequenceID > rightSequenceID) {
       return -1;
     } else if (leftSequenceID < rightSequenceID) {
       return 1;
     } else {
       return 0;
     }
   }
 }
 {code}
 In the 0.96 case the mvcc of the cell in both files is 0, so the key 
 comparison returns 0 and the tie is broken in the else branch.  There the 
 seq id of the bulk loaded file is greater, so it sorts first, ensuring that 
 the scan reads from the bulk loaded file.
 In 0.98+ we retain the mvcc+seqid, so the mvcc is not reset to 0 (it remains 
 a non-zero positive value).  Hence compare() sorts the cell in the 
 flushed/compacted file first.  This means that although we know the latest 
 file is the bulk loaded file, we don't return its data.
 This seems to be a behaviour change.  Will check other corner cases as well, 
 but we are trying to understand the behaviour of bulk load because we are 
 evaluating whether it can be used for the MOB design.
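
The ordering described above can be reproduced with a small stand-alone sketch. It is not HBase code: the class and field names are hypothetical, the key is collapsed into a single string, and it assumes that the cell comparison orders equal keys by higher mvcc first, which is what the description implies for 0.98+. The sequence ids 10 and 20 and the mvcc value 7 are made up for illustration.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

final class TieBreakSketch {
    /** Hypothetical stand-in for a store file scanner positioned on one cell. */
    static final class ScannerSketch {
        final String key;      // row,cf,cq,ts collapsed into one string
        final long cellMvcc;   // mvcc carried by the cell (0 when it was reset)
        final long fileSeqId;  // sequence id of the store file behind the scanner
        final String value;

        ScannerSketch(String key, long cellMvcc, long fileSeqId, String value) {
            this.key = key;
            this.cellMvcc = cellMvcc;
            this.fileSeqId = fileSeqId;
            this.value = value;
        }
    }

    /** Assumed cell ordering: key ascending, then higher mvcc first. */
    static int compareCells(ScannerSketch left, ScannerSketch right) {
        int byKey = left.key.compareTo(right.key);
        return byKey != 0 ? byKey : Long.compare(right.cellMvcc, left.cellMvcc);
    }

    /** Scanner ordering: cells first; only on a tie does the newer file win. */
    static final Comparator<ScannerSketch> SCANNER_ORDER = (left, right) -> {
        int byCell = compareCells(left, right);
        return byCell != 0 ? byCell : Long.compare(right.fileSeqId, left.fileSeqId);
    };

    /** Returns the value the heap would hand to the scan first. */
    static String firstValue(ScannerSketch a, ScannerSketch b) {
        PriorityQueue<ScannerSketch> heap = new PriorityQueue<>(SCANNER_ORDER);
        heap.add(a);
        heap.add(b);
        return heap.peek().value;
    }

    public static void main(String[] args) {
        String key = "row1,cf,cq,ts1";
        // 0.96-like: both cells carry mvcc 0, so the file sequence id decides.
        System.out.println(firstValue(
            new ScannerSketch(key, 0, 10, "value1"),
            new ScannerSketch(key, 0, 20, "value2")));
        // 0.98-like: the flushed cell keeps mvcc 7, so the cell comparison decides.
        System.out.println(firstValue(
            new ScannerSketch(key, 7, 10, "value1"),
            new ScannerSketch(key, 0, 20, "value2")));
    }
}
```

Under these assumptions the first call yields the bulk loaded value and the second yields the flushed value, which is the behaviour change reported here.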





[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-21 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Attachment: HBASE-11591_5.patch

The patch tries to do skipKVsGreaterthanReadPt() even for bulk loaded KVs. 
In the attached patch, if we skip the mvcc handling for the no-flush result, 
testBulkLoad fails.  That is because, although the scanner is created after 
the bulk load, the read point is still lower than the assigned seqId, since 
the seqId is not added to the mvcc write entry.  
In the next case, testBulkLoadWithParallelScan(), the scanner is created 
before the bulk load.  So the expected KV is the one that is not from the bulk 
loaded file, even though the scanner heap is reset after the bulk load.
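
The two test cases above come down to a read-point check. The following is a minimal sketch of that idea, not the patch itself: `ReadPointSketch` and all of its methods are hypothetical names, and the only rule it models is that a scanner sees a cell when the cell's seqId is not greater than the read point the scanner captured when it was opened.

```java
final class ReadPointSketch {
    private long readPoint;   // highest seqId visible to newly opened scanners
    private long lastSeqId;   // last seqId handed out

    ReadPointSketch(long start) {
        this.readPoint = start;
        this.lastSeqId = start;
    }

    /** Assigns a seqId without advancing the read point (the failing case). */
    long assignSeqIdOnly() {
        return ++lastSeqId;
    }

    /** Assigns a seqId and advances the read point to it. */
    long assignSeqIdAndAdvance() {
        readPoint = ++lastSeqId;
        return readPoint;
    }

    /** A scanner captures the read point at the moment it is opened. */
    long openScanner() {
        return readPoint;
    }

    /** A cell is visible when its seqId does not exceed the scanner's read point. */
    static boolean visible(long cellSeqId, long scannerReadPoint) {
        return cellSeqId <= scannerReadPoint;
    }

    public static void main(String[] args) {
        // Scanner opened after the bulk load, but the read point was not advanced.
        ReadPointSketch a = new ReadPointSketch(10);
        long bulkA = a.assignSeqIdOnly();
        System.out.println(visible(bulkA, a.openScanner()));

        // Scanner opened after the bulk load, read point advanced.
        ReadPointSketch b = new ReadPointSketch(10);
        long bulkB = b.assignSeqIdAndAdvance();
        System.out.println(visible(bulkB, b.openScanner()));

        // Parallel scan: scanner opened before the bulk load.
        ReadPointSketch c = new ReadPointSketch(10);
        long earlyScanner = c.openScanner();
        long bulkC = c.assignSeqIdAndAdvance();
        System.out.println(visible(bulkC, earlyScanner));
    }
}
```

In this model the bulk loaded cell is hidden in the first and third cases and visible in the second, matching the expectations stated for testBulkLoad and testBulkLoadWithParallelScan().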

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 TestBulkload.java, hbase-11591-03-02.patch, hbase-11591-03-jeff.patch







[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-21 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Status: Patch Available  (was: Open)

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, HBASE-11591_5.patch, 
 TestBulkload.java, hbase-11591-03-02.patch, hbase-11591-03-jeff.patch







[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-20 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-11591:
--

Attachment: hbase-11591-03-02.patch

The v2 patch addresses the unit test failures. [~ram_krish] If you don't mind, 
please do the final push on this JIRA since you have spent some time on this 
issue. Thanks.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java, 
 hbase-11591-03-02.patch, hbase-11591-03-jeff.patch







[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-19 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-11591:
--

Attachment: hbase-11591-03-jeff.patch

[~ram_krish] I've created a patch based on your v3 for your consideration. The 
patch sets the SeqId (mvcc) value at read time for bulk loaded files; the 
changes are smaller, have less impact, and pass the unit test you created.

Another option is to add the logic to the HFileReader code, but that would 
involve some code cleanup, so we could do it that way later. Thanks.
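
As a rough illustration of setting the SeqId at read time, the sketch below stamps the file-level sequence id onto each cell read from a bulk loaded file. It is not taken from the attached patch: `ReadTimeSeqIdSketch`, `Cell`, `readFrom` and `newest` are hypothetical names, and the numbers are invented.

```java
final class ReadTimeSeqIdSketch {
    /** Hypothetical, simplified cell: key, value and a sequence id. */
    static final class Cell {
        final String key;
        final String value;
        final long seqId;

        Cell(String key, String value, long seqId) {
            this.key = key;
            this.value = value;
            this.seqId = seqId;
        }
    }

    /** Returns the cell as a reader would see it; bulk loaded cells take the file's seqId. */
    static Cell readFrom(Cell stored, boolean bulkLoaded, long fileSeqId) {
        long seqId = bulkLoaded ? fileSeqId : stored.seqId;
        return new Cell(stored.key, stored.value, seqId);
    }

    /** For two cells with identical keys, picks the one with the higher seqId. */
    static Cell newest(Cell a, Cell b) {
        return a.seqId >= b.seqId ? a : b;
    }

    public static void main(String[] args) {
        String key = "row1,cf,cq,ts1";
        Cell flushed = new Cell(key, "value1", 7);     // keeps its mvcc/seqId
        Cell bulkStored = new Cell(key, "value2", 0);  // written without a seqId

        // Without stamping, the flushed cell looks newer.
        System.out.println(newest(flushed, bulkStored).value);

        // With the file seqId (20 here) applied at read time, the bulk cell wins.
        Cell bulkRead = readFrom(bulkStored, true, 20);
        System.out.println(newest(flushed, bulkRead).value);
    }
}
```

The stored file is left untouched in this sketch; only the in-memory view of the cell changes, which is one way to read "during read time".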

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java, 
 hbase-11591-03-jeff.patch







[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Status: Open  (was: Patch Available)

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, TestBulkload.java







[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Status: Patch Available  (was: Open)

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java







[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-18 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Attachment: HBASE-11591_3.patch

Updated patch.  It tries to set the sequenceId of the bulk loaded file on the 
KV that is retrieved from the bulk loaded file.
Another thing to note: in KVScannerComparator.compare(), I think this branch 
is never reached
{code}
else if (leftSequenceID < rightSequenceID) {
{code}
because the list of store files is always sorted by seqId.  So if the store 
files have seqIds 15, 19 and 21, then while creating the KVHeap
{code}
for (KeyValueScanner scanner : scanners) {
  if (scanner.peek() != null) {
    this.heap.add(scanner);
  } else {
    scanner.close();
  }
}
{code}
it will add 15, 19 and then 21.  The compare() in KVScannerComparator is 
called from PriorityQueue
{code}
private void siftUpUsingComparator(int k, E x) {
    while (k > 0) {
        int parent = (k - 1) >>> 1;
        Object e = queue[parent];
        if (comparator.compare(x, (E) e) >= 0)
            break;
        queue[k] = e;
        k = parent;
    }
    queue[k] = x;
}
{code}
Here we can see that the left hand side is always the element being added and 
the right hand side is one already in the heap.  Since the list is always 
sorted (15, 19 and 21), compare() sees LHS=19 and RHS=15, then LHS=21 and 
RHS=19.  So I think leftSequenceID will always be bigger.  Anyway, I added the 
condition of setting the sequenceId on the right KV as well.
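
The argument order during insertion can be checked with a small stand-alone experiment. This sketch uses plain `Long` values in place of scanners and a recording comparator; `HeapInsertOrderSketch` and `recordComparisons` are hypothetical names, and the comparator simply puts the higher sequence id first, as the tie-break does when keys are equal.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class HeapInsertOrderSketch {
    /** Adds the ids in the given order and records every (left, right) pair compared. */
    static List<String> recordComparisons(long... seqIds) {
        List<String> calls = new ArrayList<>();
        Comparator<Long> newestFirst = (left, right) -> {
            calls.add("LHS=" + left + " RHS=" + right);
            return Long.compare(right, left);  // higher sequence id sorts first
        };
        PriorityQueue<Long> heap = new PriorityQueue<>(newestFirst);
        for (long id : seqIds) {
            heap.add(id);
        }
        return calls;
    }

    public static void main(String[] args) {
        // Store files sorted by seqId, added in that order.
        for (String call : recordComparisons(15, 19, 21)) {
            System.out.println(call);
        }
    }
}
```

For the input 15, 19, 21 the recorded pairs are LHS=19 RHS=15 followed by LHS=21 RHS=19. Note that this only exercises insertion; removing elements from a PriorityQueue goes through a separate sift-down path that the sketch does not cover, so it says nothing about the argument order there.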

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java







[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-17 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Status: Open  (was: Patch Available)

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, TestBulkload.java







[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-17 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Status: Patch Available  (was: Open)

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, TestBulkload.java


 See discussion in HBASE-11339.
 When we have a case where there are same KVs in two files one produced by 
 flush/compaction and the other thro the bulk load.
 Both the files have some same kvs which matches even in timestamp.
 Steps:
 Add some rows with a specific timestamp and flush the same.  
 Bulk load a file with the same data.. Enusre that assign seqnum property is 
 set.
 The bulk load should use HFileOutputFormat2 (or ensure that we write the 
 bulk_time_output key).
 This would ensure that the bulk loaded file has the highest seq num.
 Assume the cell in the flushed/compacted store file is 
 row1,cf,cq,ts1, value1  and the cell in the bulk loaded file is
 row1,cf,cq,ts1,value2 
 (There are no parallel scans).
 Issue a scan on the table in 0.96. The retrieved value is 
 row1,cf1,cq,ts1,value2
 But the same in 0.98 will retrieve row1,cf1,cq,ts2,value1. 
 This is a behaviour change.  This is because of this code 
 {code}
 public int compare(KeyValueScanner left, KeyValueScanner right) {
   int comparison = compare(left.peek(), right.peek());
   if (comparison != 0) {
 return comparison;
   } else {
 // Since both the keys are exactly the same, we break the tie in favor
 // of the key which came latest.
 long leftSequenceID = left.getSequenceID();
 long rightSequenceID = right.getSequenceID();
 if (leftSequenceID  rightSequenceID) {
   return -1;
 } else if (leftSequenceID  rightSequenceID) {
   return 1;
 } else {
   return 0;
 }
   }
 }
 {code}
 In the 0.96 case, the mvcc of the cell is 0 in both files, so the comparison 
 falls through to the else branch. There the seq id of the bulk loaded file is 
 greater, so its scanner sorts first, ensuring that the scan reads from the 
 bulk loaded file.
 In 0.98+, because we retain the mvcc+seqid, the mvcc is not reset to 0 (it 
 remains a nonzero positive value). Hence compare() sorts the cell in the 
 flushed/compacted file first. This means that although we know the latest 
 file is the bulk loaded file, we do not return its data.
 This seems to be a behaviour change. I will check other corner cases as well; 
 we want to understand the behaviour of bulk load because we are evaluating 
 whether it can be used for the MOB design.
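
 The tie-break described above can be illustrated with a small, self-contained 
 toy model. This is only a sketch and not HBase code: the class TieBreakSketch, 
 its Cell and FileScanner types, the numeric timestamp, mvcc and sequence id 
 values, and the assumption that the key comparison orders equal keys by mvcc 
 (highest first) are all hypothetical and chosen for illustration.

```java
/** Toy model of the scanner tie-break described above; this is not HBase code. */
final class TieBreakSketch {

    /** Simplified cell holding only the fields that matter here (hypothetical type). */
    static final class Cell {
        final String row, family, qualifier, value;
        final long timestamp, mvcc;

        Cell(String row, String family, String qualifier,
             long timestamp, long mvcc, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.mvcc = mvcc;
            this.value = value;
        }
    }

    /** Simplified scanner: the cell at its head plus the sequence id of its file. */
    static final class FileScanner {
        final Cell head;
        final long fileSequenceId;

        FileScanner(Cell head, long fileSequenceId) {
            this.head = head;
            this.fileSequenceId = fileSequenceId;
        }
    }

    /** Orders by row, family, qualifier, newest timestamp, then highest mvcc (assumed). */
    static int compareKeys(Cell a, Cell b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        c = Long.compare(b.timestamp, a.timestamp); // newer timestamp sorts first
        if (c != 0) return c;
        return Long.compare(b.mvcc, a.mvcc);        // higher mvcc sorts first
    }

    /** Mirrors the quoted compare(): on a key tie, the higher file sequence id sorts first. */
    static int compareScanners(FileScanner left, FileScanner right) {
        int comparison = compareKeys(left.head, right.head);
        if (comparison != 0) {
            return comparison;
        }
        return Long.compare(right.fileSequenceId, left.fileSequenceId);
    }

    /** Returns the value of whichever scanner's head cell sorts first. */
    static String firstValue(FileScanner a, FileScanner b) {
        return compareScanners(a, b) <= 0 ? a.head.value : b.head.value;
    }

    public static void main(String[] args) {
        // Bulk loaded file: same key, mvcc 0, but the highest file sequence id.
        FileScanner bulk =
            new FileScanner(new Cell("row1", "cf", "cq", 100L, 0L, "value2"), 6L);
        // Flushed file with mvcc reset to 0: keys tie, so the sequence id decides.
        FileScanner flushedMvccReset =
            new FileScanner(new Cell("row1", "cf", "cq", 100L, 0L, "value1"), 5L);
        // Flushed file with mvcc retained: the key comparison decides before the tie-break.
        FileScanner flushedMvccKept =
            new FileScanner(new Cell("row1", "cf", "cq", 100L, 5L, "value1"), 5L);

        System.out.println(firstValue(flushedMvccReset, bulk)); // prints "value2"
        System.out.println(firstValue(flushedMvccKept, bulk));  // prints "value1"
    }
}
```

 In this model the sequence id tie-break is reached only when the key 
 comparison, including mvcc, returns 0. A retained nonzero mvcc in the flushed 
 file therefore wins before the file sequence id is ever consulted.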



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-17 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Attachment: HBASE-11591_2.patch

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, TestBulkload.java




[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-16 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Attachment: HBASE-11591_1.patch

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, TestBulkload.java




[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-16 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Status: Open  (was: Patch Available)

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, TestBulkload.java




[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-16 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Status: Patch Available  (was: Open)

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, TestBulkload.java




[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-07-30 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Status: Patch Available  (was: Open)

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, TestBulkload.java




[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-07-30 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Attachment: HBASE-11591.patch

Attaching a patch to get feedback.  Checking on some more corner cases.  

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, TestBulkload.java




[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-07-26 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Affects Version/s: (was: 0.98.4)
Fix Version/s: (was: 0.98.5)

Removing 0.98 from the affected and fix versions, as this problem does not 
exist there. Sorry for specifying the wrong version and for the false alarm 
on 0.98.
Thanks to Ted for the heads-up on this.

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: TestBulkload.java




[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-07-25 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-11591:
---

Attachment: TestBulkload.java

Use this test case in 0.98/trunk and 0.96. To run it in 0.96, please comment 
out the line
{code}
HFileContext context = new HFileContext();
{code}
and change 
{code}
HFile.Writer writer = wf.withPath(fs, path).withFileContext(context).create();
{code}
to 
{code}
HFile.Writer writer = wf.withPath(fs, path).create();
{code}

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0, 0.98.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.99.0, 0.98.5

 Attachments: TestBulkload.java




[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-07-25 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-11591:
---

Priority: Critical  (was: Major)

 Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0, 0.98.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0, 0.98.5

 Attachments: TestBulkload.java

