[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file
[ https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100310#comment-14100310 ] Anoop Sam John commented on HBASE-11591:

Sure. Some quick comments after a glance at the patch:

isBulkLoadResult() -> isBulkLoaded()? For the setter also?

I see this isBulkLoadResult() at the StoreFile.java level also. It would have been better to know this status from StoreFile rather than from StoreFileReader. Also, what about compacting a flushed file and a bulk loaded one? Will we have issues then? Will this patch handle that also? Mind adding tests around that too.

compareWithoutMvcc(Cell left, Cell right): now that we have deprecated the *mvcc() methods, I suggest a change in name here also.

bq. // TODO : While doing cells this should be avoided in the read path.

IMHO we should not do this KeyValueUtil.ensureKeyValue() stuff from now on (mainly in the read path). In the near future we will want Cells in the read path. How can we solve this particular issue then? (We can not add a setter in Cell.java, I believe.) Or do we need an extension interface for Cell *on the server side* which has the setter?

Doing a deeper look, Ram. Sorry for being late.

Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file --- Key: HBASE-11591 URL: https://issues.apache.org/jira/browse/HBASE-11591 Project: HBase Issue Type: Bug Affects Versions: 0.99.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.99.0 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, HBASE-11591_2.patch, TestBulkload.java

See discussion in HBASE-11339. This is the case where the same KVs are present in two files, one produced by flush/compaction and the other through bulk load; both files have some identical kvs that match even in timestamp. Steps: Add some rows with a specific timestamp and flush the same. Bulk load a file with the same data.
Ensure that the assign seqnum property is set. The bulk load should use HFileOutputFormat2 (or ensure that we write the bulk_time_output key). This ensures that the bulk loaded file has the highest seq num.

Assume the cell in the flushed/compacted store file is row1,cf,cq,ts1,value1 and the cell in the bulk loaded file is row1,cf,cq,ts1,value2 (there are no parallel scans). Issue a scan on the table in 0.96: the retrieved value is row1,cf,cq,ts1,value2. But the same scan in 0.98 will retrieve row1,cf,cq,ts1,value1. This is a behaviour change. It is because of this code:

{code}
public int compare(KeyValueScanner left, KeyValueScanner right) {
  int comparison = compare(left.peek(), right.peek());
  if (comparison != 0) {
    return comparison;
  } else {
    // Since both the keys are exactly the same, we break the tie in favor
    // of the key which came latest.
    long leftSequenceID = left.getSequenceID();
    long rightSequenceID = right.getSequenceID();
    if (leftSequenceID > rightSequenceID) {
      return -1;
    } else if (leftSequenceID < rightSequenceID) {
      return 1;
    } else {
      return 0;
    }
  }
}
{code}

In the 0.96 case the mvcc of the cell in both files is 0, so the comparison falls through to the else branch, where the seq id of the bulk loaded file is greater; it therefore sorts first, ensuring that the scan reads from the bulk loaded file. In 0.98+, since we retain mvcc+seqid, the mvcc is not reset to 0 (it remains a non-zero positive value). Hence compare() sorts the cell in the flushed/compacted file first, which means that even though we know the latest file is the bulk loaded one, we don't scan its data. Seems to be a behaviour change. Will check other corner cases too, but we are trying to pin down the behaviour of bulk load because we are evaluating whether it can be used for the MOB design. -- This message was sent by Atlassian JIRA (v6.2#6252)
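As a minimal, self-contained sketch of the tie-break that the quoted compare() performs (plain Java stand-ins; the FileScanner class below is hypothetical, not HBase's KeyValueScanner): equal keys fall through to the sequence-id comparison, and the scanner with the higher id, here the bulk-loaded file, sorts first.

```java
import java.util.Comparator;

public class SeqIdTieBreak {
    // Hypothetical stand-in for a store file scanner: the key it currently
    // peeks at, plus the sequence id of the file it reads from.
    static final class FileScanner {
        final String peekKey;
        final long sequenceId;
        FileScanner(String peekKey, long sequenceId) {
            this.peekKey = peekKey;
            this.sequenceId = sequenceId;
        }
    }

    // Same shape as the compare() above: key comparison first; on a tie the
    // higher sequence id sorts first (returns a negative value).
    static final Comparator<FileScanner> HEAP_ORDER = (left, right) -> {
        int comparison = left.peekKey.compareTo(right.peekKey);
        if (comparison != 0) {
            return comparison;
        }
        return Long.compare(right.sequenceId, left.sequenceId);
    };

    public static void main(String[] args) {
        FileScanner flushed = new FileScanner("row1/cf/cq/ts1", 5L);
        FileScanner bulkLoaded = new FileScanner("row1/cf/cq/ts1", 9L);
        // The bulk loaded file carries the higher seq id, so it sorts first.
        System.out.println(HEAP_ORDER.compare(bulkLoaded, flushed) < 0); // prints "true"
    }
}
```

The 0.98 regression described above arises one step earlier: when cell mvccs differ and are non-zero, the key comparison itself no longer ties, so this seq-id branch is never reached.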
[jira] [Updated] (HBASE-11728) Some data miss when scan using PREFIX_TREE DATA-BLOCK-ENCODING
[ https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-11728: --- Status: Open (was: Patch Available)

Some data miss when scan using PREFIX_TREE DATA-BLOCK-ENCODING -- Key: HBASE-11728 URL: https://issues.apache.org/jira/browse/HBASE-11728 Project: HBase Issue Type: Bug Components: Scanners Affects Versions: 0.98.4, 0.96.1.1 Environment: ubuntu12 hadoop-2.2.0 HBase-0.96.1.1 SUN-JDK(1.7.0_06-b24) Reporter: wuchengzhi Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: 29cb562fad564b468ea9d61a2d60e8b0, HBASE-11728.patch, HBASE-11728_1.patch, HBASE-11728_2.patch, HFileAnalys.java, TestPrefixTree.java Original Estimate: 72h Remaining Estimate: 72h

For the scan case, I prepare some data as below. Table desc (using the prefix-tree encoding): 'prefix_tree_test', {NAME => 'cf_1', DATA_BLOCK_ENCODING => 'PREFIX_TREE', TTL => '15552000'}, and I put 5 rows as (RowKey, Qualifier, Value):

'a-b-0-0', 'qf_1', 'c1-value'
'a-b-A-1', 'qf_1', 'c1-value'
'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
'a-b-B-2-1402397300-1402416535', 'qf_2', 'c2-value-3'

So I try to scan the row keys between 'a-b-A-1' and 'a-b-A-1:' and get the correct result.

Test 1:
Scan scan = new Scan();
scan.setStartRow("a-b-A-1".getBytes());
scan.setStopRow("a-b-A-1:".getBytes());
--
'a-b-A-1', 'qf_1', 'c1-value'
'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'

Then I try the next scan, adding a column.

Test 2:
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("cf_1"), Bytes.toBytes("qf_2"));
scan.setStartRow("a-b-A-1".getBytes());
scan.setStopRow("a-b-A-1:".getBytes());
--
expect:
'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
but actually I get nothing.

Then I change the addColumn to scan.addColumn(Bytes.toBytes("cf_1"), Bytes.toBytes("qf_1")); and I get the expected result 'a-b-A-1', 'qf_1', 'c1-value' as well. Then I do more testing: I change the case to make the startRow greater than 'a-b-A-1'.

Test 3:
Scan scan = new Scan();
scan.setStartRow("a-b-A-1-".getBytes());
scan.setStopRow("a-b-A-1:".getBytes());
--
expect:
'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
but actually I get nothing again.

I make the start row greater than 'a-b-A-1-1402329600-1402396277':
Scan scan = new Scan();
scan.setStartRow("a-b-A-1-140239".getBytes());
scan.setStopRow("a-b-A-1:".getBytes());
and I get the expected row as well: 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'.

So I think it may be a bug in the prefix-tree encoding. It happens after the data is flushed to the storefile, and it's OK while the data is in the mem-store.
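For reference, the expected results of the scans above can be checked against a plain in-memory model. This is only a sketch with a hypothetical scan() helper over a sorted TreeMap, not HBase code; it shows what a correct scanner should return for each [startRow, stopRow) range, which is exactly where the prefix-tree-encoded store file returned nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ScanExpectation {
    // row -> (qualifier -> value), kept sorted by row key like an HFile.
    static final NavigableMap<String, Map<String, String>> ROWS = new TreeMap<>();
    static {
        ROWS.put("a-b-0-0", Map.of("qf_1", "c1-value"));
        ROWS.put("a-b-A-1", Map.of("qf_1", "c1-value"));
        ROWS.put("a-b-A-1-1402329600-1402396277", Map.of("qf_2", "c2-value"));
        ROWS.put("a-b-A-1-1402397227-1402415999", Map.of("qf_2", "c2-value-2"));
        ROWS.put("a-b-B-2-1402397300-1402416535", Map.of("qf_2", "c2-value-3"));
    }

    // Rows in [start, stop) carrying the given qualifier (null = any).
    static List<String> scan(String start, String stop, String qualifier) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e
                : ROWS.subMap(start, true, stop, false).entrySet()) {
            if (qualifier == null || e.getValue().containsKey(qualifier)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Test 2 from the report: qf_2 between 'a-b-A-1' and 'a-b-A-1:'
        // should hit the two 'a-b-A-1-14...' rows, not come back empty.
        System.out.println(scan("a-b-A-1", "a-b-A-1:", "qf_2"));
    }
}
```

Note the byte ordering that makes 'a-b-A-1:' a valid stop row: '-' (0x2D) sorts before ':' (0x3A), so all 'a-b-A-1-...' rows fall inside the range while 'a-b-B-2-...' falls outside it.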
[jira] [Updated] (HBASE-11728) Some data miss when scan using PREFIX_TREE DATA-BLOCK-ENCODING
[ https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-11728: --- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-11728) Some data miss when scan using PREFIX_TREE DATA-BLOCK-ENCODING
[ https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-11728: --- Attachment: HBASE-11728_3.patch

Updated the category on the test patch. Also removed the syso in the test case and converted them to assertEquals.
[jira] [Created] (HBASE-11768) Register region server in zookeeper by ip address
Cheney Sun created HBASE-11768: -- Summary: Register region server in zookeeper by ip address Key: HBASE-11768 URL: https://issues.apache.org/jira/browse/HBASE-11768 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 2.0.0 Reporter: Cheney Sun

An HBase cluster isn't always set up along with a DNS server, but regionservers currently register their hostnames in zookeeper, which brings some inconvenience when a regionserver isn't covered by a DNS server. In that situation, clients have to maintain the ip/hostname mapping in their /etc/hosts files in order to resolve the hostname returned from zookeeper to the right address. This causes a lot of pain for clients, especially when new machines are added to the cluster or some machines' addresses change for some reason: all clients need to update their host mapping files. This issue is to address the problem above by adding an option to let each regionserver register itself by ip address instead of hostname only.
[jira] [Updated] (HBASE-11768) Register region server in zookeeper by ip address
[ https://issues.apache.org/jira/browse/HBASE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheney Sun updated HBASE-11768: --- Description: An HBase cluster isn't always set up along with a DNS server, but regionservers currently register their hostnames in zookeeper, which brings some inconvenience when a regionserver isn't covered by a DNS server. In that situation, clients have to maintain the ip/hostname mapping in their /etc/hosts files in order to resolve the hostname returned from zookeeper to the right address. However, this causes a lot of pain for clients, especially when new machines are added to the cluster or some machines' addresses change for some reason: all clients need to update their host mapping files. This issue is to address the problem above by adding an option to let each regionserver register itself by ip address instead of hostname only.
[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file
[ https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100361#comment-14100361 ] Anoop Sam John commented on HBASE-11591:

{code}
+ if (bulkLoad) {
+   // TODO : While doing cells this should be avoided in the read path.
+   KeyValue leftKV = KeyValueUtil.ensureKeyValue(left.peek());
+   KeyValue rightKV = KeyValueUtil.ensureKeyValue(right.peek());
+   if (leftKV.getSequenceId() == 0) {
+     leftKV.setSequenceId(rightKV.getSequenceId());
+   } else {
+     rightKV.setSequenceId(leftKV.getSequenceId());
+   }
+ }
{code}

So what do we do here Ram? I think we need to set the KV seqId, for KVs from a bulk loaded file, to the file's seqId (which we get from that file name). So instead of setting one KV's seqId to the other's (which looks hacky IMO), can we do the set using the seqId of the file?
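The reviewer's suggestion above, stamping cells from a bulk-loaded file with that file's own sequence id instead of copying from the neighboring KV, could look roughly like the sketch below. FakeKeyValue and stampIfBulkLoaded() are simplified hypothetical stand-ins, not HBase classes.

```java
public class BulkLoadSeqId {
    // Minimal stand-in for a KeyValue: only the mutable sequence id matters here.
    static final class FakeKeyValue {
        private long sequenceId;
        long getSequenceId() { return sequenceId; }
        void setSequenceId(long id) { this.sequenceId = id; }
    }

    // A cell read from a bulk-loaded file carries mvcc/seq id 0; give it the
    // file's own seq id (parsed from the file name at load time) so the
    // scanner-heap tie break reflects the file's true recency.
    static void stampIfBulkLoaded(FakeKeyValue kv, boolean fromBulkLoad, long fileSeqId) {
        if (fromBulkLoad && kv.getSequenceId() == 0) {
            kv.setSequenceId(fileSeqId);
        }
    }

    public static void main(String[] args) {
        FakeKeyValue kv = new FakeKeyValue();
        stampIfBulkLoaded(kv, true, 42L);
        System.out.println(kv.getSequenceId()); // prints "42"
    }
}
```

This keeps the two scanners independent: neither KV's id depends on whatever happens to be peeked on the other side of the comparison.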
[jira] [Updated] (HBASE-11768) Register region server in zookeeper by ip address
[ https://issues.apache.org/jira/browse/HBASE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheney Sun updated HBASE-11768: --- Attachment: HBASE_11768.patch

I would like to provide a patch for review. The patch is rather straightforward: it adds one option, hbase.regionserver.use.ip, to control whether to use the ip or the hostname in zookeeper. By default the value is false, leaving the current behavior unchanged. If the value is set to true, the regionserver's ip instead of its hostname is registered under HBASE_ROOT/rs/ip.xx.xxx.
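A hedged sketch of the proposed behavior, assuming a flag like the hbase.regionserver.use.ip option described in the patch comment above; the RegistrationName class and advertisedName() helper are illustrative, not the patch's actual code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class RegistrationName {
    // The string a server would publish in zookeeper: the raw IP when the
    // use-ip flag is set, the reverse-resolved hostname otherwise.
    static String advertisedName(InetAddress addr, boolean useIp) {
        return useIp ? addr.getHostAddress() : addr.getCanonicalHostName();
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getByName("127.0.0.1");
        System.out.println(advertisedName(local, true));  // prints "127.0.0.1"
        System.out.println(advertisedName(local, false)); // e.g. localhost
    }
}
```

With useIp=true the znode name no longer depends on DNS or /etc/hosts at all, which is the whole point of the proposal: clients resolve nothing.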
[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file
[ https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100372#comment-14100372 ] Hadoop QA commented on HBASE-11591: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662425/HBASE-11591_2.patch against trunk revision . ATTACHMENT ID: 12662425 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 characters. {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10473//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10473//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10473//console This message is automatically generated.
[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file
[ https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100384#comment-14100384 ] ramkrishna.s.vasudevan commented on HBASE-11591:

I got a clean QA run.

bq. isBulkLoadResult() -> isBulkLoaded()? For the setter also?
Okie, fine with that.

bq. I see this isBulkLoadResult() in StoreFile.java level also. It would have been better to know this status from StoreFile rather than from StoreFileReader.
I spent some time trying that and later decided on this way. First, only the reader is passed to the StoreFileScanner, and the StoreFileScanner only has a reader associated with it. So if we need this information from StoreFile then I would have to change the constructor of StoreFileScanner or use a setter; I thought that was making the patch heavier. Also, in this case the bulk-load-or-not information has to be passed from the reader (because the reader reads the file info) and then set on the StoreFile. Currently the Reader is also an inner class of StoreFile. Considering all this I just kept the new getter/setter at the Reader level.

bq. compareWithoutMvcc
Okie.

bq. IMHO we should not do this KeyValueUtil.ensureKeyValue() stuff from now on.
Yes, but I think we should do that in a separate JIRA, in fact, to avoid this setSeqId while doing KeyValueUtil.ensureKeyValue().

bq. I think we need to set the KV seqId, for KVs from a bulk loaded file, to the file seqId.
Yes. I set the other KV's sequence id because I wanted to ensure that we return one of the two KVs contesting here, and that we return the KV that would have been returned if there had been no clash and the latest one were from the flushed file. Anyway, before changing this let me check some more cases; then I will update the patch accordingly. In fact I had set the sequenceId of the file originally and later changed it to this way.
[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file
[ https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100392#comment-14100392 ] ramkrishna.s.vasudevan commented on HBASE-11591: bq.Also what abt compacting a flush file and a bulk loaded one? Will we have issues then? This patch will handle that also? Mind adding tests around that also. The current test is also compacting the flushed files. Behaviour wise both would be same in 0.99+. Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file --- Key: HBASE-11591 URL: https://issues.apache.org/jira/browse/HBASE-11591 Project: HBase Issue Type: Bug Affects Versions: 0.99.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.99.0 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, HBASE-11591_2.patch, TestBulkload.java See discussion in HBASE-11339. When we have a case where there are same KVs in two files one produced by flush/compaction and the other thro the bulk load. Both the files have some same kvs which matches even in timestamp. Steps: Add some rows with a specific timestamp and flush the same. Bulk load a file with the same data.. Enusre that assign seqnum property is set. The bulk load should use HFileOutputFormat2 (or ensure that we write the bulk_time_output key). This would ensure that the bulk loaded file has the highest seq num. Assume the cell in the flushed/compacted store file is row1,cf,cq,ts1, value1 and the cell in the bulk loaded file is row1,cf,cq,ts1,value2 (There are no parallel scans). Issue a scan on the table in 0.96. The retrieved value is row1,cf1,cq,ts1,value2 But the same in 0.98 will retrieve row1,cf1,cq,ts2,value1. This is a behaviour change. 
This is because of this code
{code}
public int compare(KeyValueScanner left, KeyValueScanner right) {
  int comparison = compare(left.peek(), right.peek());
  if (comparison != 0) {
    return comparison;
  } else {
    // Since both the keys are exactly the same, we break the tie in favor
    // of the key which came latest.
    long leftSequenceID = left.getSequenceID();
    long rightSequenceID = right.getSequenceID();
    if (leftSequenceID > rightSequenceID) {
      return -1;
    } else if (leftSequenceID < rightSequenceID) {
      return 1;
    } else {
      return 0;
    }
  }
}
{code}
Here in the 0.96 case the mvcc of the cell in both files is 0, so the comparison falls through to the else branch, where the seq id of the bulk loaded file is greater and it sorts first, ensuring that the scan happens from that bulk loaded file. In 0.98+, since we retain the mvcc+seqid, the mvcc is not reset to 0 (it remains a non-zero positive value). Hence compare() sorts the cell in the flushed/compacted file first. This means that even though we know the latest file is the bulk loaded one, we don't scan its data. Seems to be a behaviour change. Will check on other corner cases also, but we are trying to understand the behaviour of bulk load because we are evaluating if it can be used for the MOB design. -- This message was sent by Atlassian JIRA (v6.2#6252)
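To make the two cases concrete, here is a minimal, self-contained Java sketch (hypothetical names and values, not the real HBase `KeyValueScanner`/`KVScannerComparator` types) of how the tie-break plays out: with both mvccs reset to 0 (0.96 behaviour) the scanner sequence id decides; with a retained non-zero mvcc (0.98+) the cell comparison itself decides and the sequence-id tie-break is never reached.

```java
// Simplified, illustrative model of the comparator above.
public class SeqIdTieBreak {
    // Models compare(left.peek(), right.peek()) for keys that are otherwise
    // identical: the cell with the higher mvcc sorts first.
    static int compareCells(long leftMvcc, long rightMvcc) {
        return Long.compare(rightMvcc, leftMvcc);
    }

    // Models KVScannerComparator.compare(): fall back to the scanner's
    // sequence id (higher first) only when the cell comparison ties.
    static int compareScanners(long leftMvcc, long leftSeqId,
                               long rightMvcc, long rightSeqId) {
        int comparison = compareCells(leftMvcc, rightMvcc);
        if (comparison != 0) {
            return comparison;
        }
        // break the tie in favor of the scanner that came latest
        return Long.compare(rightSeqId, leftSeqId);
    }

    public static void main(String[] args) {
        // 0.96: both mvccs are 0, so the bulk loaded file's higher seq id
        // (20 vs 10) wins and it sorts first.
        System.out.println(compareScanners(0, 20, 0, 10)); // -1
        // 0.98+: the flushed file's cell kept a non-zero mvcc (say 5), so
        // the cell comparison decides and the seq ids are never consulted.
        System.out.println(compareScanners(0, 20, 5, 10)); // 1
    }
}
```

With the hypothetical numbers above, the bulk loaded scanner (mvcc 0, seqId 20) loses to the flushed scanner (mvcc 5, seqId 10) despite its higher sequence id, which is exactly the reported behaviour change.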
[jira] [Updated] (HBASE-11553) Abstract visibility label related services into an interface
[ https://issues.apache.org/jira/browse/HBASE-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-11553: --- Attachment: (was: HBASE-11553_V5.patch) Abstract visibility label related services into an interface Key: HBASE-11553 URL: https://issues.apache.org/jira/browse/HBASE-11553 Project: HBase Issue Type: Improvement Components: security Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: HBASE-11553.patch, HBASE-11553.patch, HBASE-11553_V2.patch, HBASE-11553_V3.patch, HBASE-11553_V4.patch, HBASE-11553_V5.patch - storage and retrieval of label dictionary and authentication sets - marshalling and unmarshalling of visibility expression representations in operation attributes and cell tags - management of assignment of authorizations to principals This will allow us to introduce additional serde implementations for visibility expressions, for example storing as strings in some places and compressed/tokenized representation in others in order to support additional use cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11553) Abstract visibility label related services into an interface
[ https://issues.apache.org/jira/browse/HBASE-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-11553: --- Status: Open (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11553) Abstract visibility label related services into an interface
[ https://issues.apache.org/jira/browse/HBASE-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-11553: --- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11553) Abstract visibility label related services into an interface
[ https://issues.apache.org/jira/browse/HBASE-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-11553: --- Attachment: HBASE-11553_V5.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11553) Abstract visibility label related services into an interface
[ https://issues.apache.org/jira/browse/HBASE-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100457#comment-14100457 ] Hadoop QA commented on HBASE-11553: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662447/HBASE-11553_V5.patch against trunk revision. ATTACHMENT ID: 12662447
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 15 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}.
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10475//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10475//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10475//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10475//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10475//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10475//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10475//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10475//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10475//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10475//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10475//console
This message is automatically generated.
[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file
[ https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-11591: --- Status: Open (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11757) Provide a common base abstract class for both RegionObserver and MasterObserver
[ https://issues.apache.org/jira/browse/HBASE-11757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-11757: Attachment: HBASE-11757-v0.patch HBASE-11757-0.98-v0.patch Provide a common base abstract class for both RegionObserver and MasterObserver --- Key: HBASE-11757 URL: https://issues.apache.org/jira/browse/HBASE-11757 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Matteo Bertozzi Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: HBASE-11757-0.98-v0.patch, HBASE-11757-v0.patch Some security coprocessors extend both RegionObserver and MasterObserver, unfortunately only one of the two can use the available base abstract class implementations. Provide a common base abstract class for both the RegionObserver and MasterObserver interfaces. Update current coprocessors that extend both interfaces to use the new common base abstract class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11757) Provide a common base abstract class for both RegionObserver and MasterObserver
[ https://issues.apache.org/jira/browse/HBASE-11757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-11757: Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file
[ https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-11591: --- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file
[ https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-11591: --- Attachment: HBASE-11591_3.patch Updated patch. It tries to set the sequenceId of the bulk loaded file on the KV that is retrieved from the bulk loaded file. Another thing to note: in KVScannerComparator.compare(), I think the code would never reach
{code}
} else if (leftSequenceID < rightSequenceID) {
{code}
because the list of store files is always sorted by seqId. So if the store files have seqIds 15, 19, 21, then while creating the KVHeap
{code}
for (KeyValueScanner scanner : scanners) {
  if (scanner.peek() != null) {
    this.heap.add(scanner);
  } else {
    scanner.close();
  }
}
{code}
it will add 15, 19 and then 21. The compare() in KVScannerComparator will be called from PriorityQueue
{code}
private void siftUpUsingComparator(int k, E x) {
  while (k > 0) {
    int parent = (k - 1) >>> 1;
    Object e = queue[parent];
    if (comparator.compare(x, (E) e) >= 0)
      break;
    queue[k] = e;
    k = parent;
  }
  queue[k] = x;
}
{code}
Here we can see that the left-hand side is always the element we are trying to add and the right-hand side is an existing one in the heap. Since the list is always sorted (15, 19 and 21), the compare will see LHS=19 vs RHS=15 and then LHS=21 vs RHS=19. So I think leftSequenceID will always be bigger. Anyway, added the condition of setting the sequenceId on the right KV also.
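The claim above can be checked against java.util.PriorityQueue itself with a small, self-contained sketch (hypothetical seq-id values; the real code adds KeyValueScanner objects, here we add bare longs): when elements are offered in ascending order, siftUpUsingComparator always passes the newly offered element as the left argument, so the comparator never observes left < right.

```java
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class HeapInsertOrder {
    // Offers the ids in the given (ascending) order and reports whether
    // any comparator invocation ever saw left < right.
    static boolean leftAlwaysGreater(long[] seqIdsInAscendingOrder) {
        AtomicBoolean sawSmallerLeft = new AtomicBoolean(false);
        PriorityQueue<Long> heap = new PriorityQueue<>((left, right) -> {
            if (left < right) {
                sawSmallerLeft.set(true);
            }
            return Long.compare(right, left); // higher seq id sorts first
        });
        for (long seqId : seqIdsInAscendingOrder) {
            heap.add(seqId); // like this.heap.add(scanner) in KeyValueHeap
        }
        return !sawSmallerLeft.get();
    }

    public static void main(String[] args) {
        // Store files sorted by seq id, as in the comment above.
        System.out.println(leftAlwaysGreater(new long[] { 15, 19, 21 })); // true
    }
}
```

With distinct ascending inputs the new element is always the largest seen so far, so every siftUp comparison has the larger id on the left, matching the observation that the `leftSequenceID < rightSequenceID` branch is effectively dead in this scenario.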
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11553) Abstract visibility label related services into an interface
[ https://issues.apache.org/jira/browse/HBASE-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100501#comment-14100501 ] ramkrishna.s.vasudevan commented on HBASE-11553: Just 2 minor nits in RB. Rest looks great. +1 from me. Abstract visibility label related services into an interface Key: HBASE-11553 URL: https://issues.apache.org/jira/browse/HBASE-11553 Project: HBase Issue Type: Improvement Components: security Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: HBASE-11553.patch, HBASE-11553.patch, HBASE-11553_V2.patch, HBASE-11553_V3.patch, HBASE-11553_V4.patch, HBASE-11553_V5.patch - storage and retrieval of label dictionary and authentication sets - marshalling and unmarshalling of visibility expression representations in operation attributes and cell tags - management of assignment of authorizations to principals This will allow us to introduce additional serde implementations for visibility expressions, for example storing as strings in some places and compressed/tokenized representation in others in order to support additional use cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11769) Truncate table shouldn't revoke user privileges
hongyu bi created HBASE-11769: - Summary: Truncate table shouldn't revoke user privileges Key: HBASE-11769 URL: https://issues.apache.org/jira/browse/HBASE-11769 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.94.15 Reporter: hongyu bi
{code}
hbase(main):002:0> create 'a','cf'
0 row(s) in 0.2500 seconds
=> Hbase::Table - a
hbase(main):003:0> grant 'usera','R','a'
0 row(s) in 0.2080 seconds
hbase(main):007:0> user_permission 'a'
User   Table,Family,Qualifier:Permission
usera  a,,: [Permission: actions=READ]
hbase(main):004:0> truncate 'a'
Truncating 'a' table (it may take a while):
 - Disabling table...
 - Dropping table...
 - Creating table...
0 row(s) in 1.5320 seconds
hbase(main):005:0> user_permission 'a'
User   Table,Family,Qualifier:Permission
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11757) Provide a common base abstract class for both RegionObserver and MasterObserver
[ https://issues.apache.org/jira/browse/HBASE-11757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100519#comment-14100519 ] Anoop Sam John commented on HBASE-11757: +1 Provide a common base abstract class for both RegionObserver and MasterObserver --- Key: HBASE-11757 URL: https://issues.apache.org/jira/browse/HBASE-11757 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Matteo Bertozzi Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: HBASE-11757-0.98-v0.patch, HBASE-11757-v0.patch Some security coprocessors extend both RegionObserver and MasterObserver, unfortunately only one of the two can use the available base abstract class implementations. Provide a common base abstract class for both the RegionObserver and MasterObserver interfaces. Update current coprocessors that extend both interfaces to use the new common base abstract class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file
[ https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100539#comment-14100539 ] Hadoop QA commented on HBASE-11591: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662463/HBASE-11591_3.patch against trunk revision. ATTACHMENT ID: 12662463
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 7 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}.
The patch failed these unit tests: org.apache.hadoop.hbase.TestRegionRebalancing
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10477//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10477//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10477//console
This message is automatically generated.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11728) Some data miss when scan using PREFIX_TREE DATA-BLOCK-ENCODING
[ https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-11728: --- Attachment: HBASE-11728_4.patch Retry QA. Some data miss when scan using PREFIX_TREE DATA-BLOCK-ENCODING -- Key: HBASE-11728 URL: https://issues.apache.org/jira/browse/HBASE-11728 Project: HBase Issue Type: Bug Components: Scanners Affects Versions: 0.96.1.1, 0.98.4 Environment: ubuntu12 hadoop-2.2.0 Hbase-0.96.1.1 SUN-JDK(1.7.0_06-b24) Reporter: wuchengzhi Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: 29cb562fad564b468ea9d61a2d60e8b0, HBASE-11728.patch, HBASE-11728_1.patch, HBASE-11728_2.patch, HBASE-11728_3.patch, HBASE-11728_4.patch, HFileAnalys.java, TestPrefixTree.java Original Estimate: 72h Remaining Estimate: 72h For the scan case I prepared some data as below. Table desc (using the prefix-tree encoding): 'prefix_tree_test', {NAME => 'cf_1', DATA_BLOCK_ENCODING => 'PREFIX_TREE', TTL => '15552000'} and I put 5 rows as (RowKey, Qualifier, Value): 'a-b-0-0', 'qf_1', 'c1-value' 'a-b-A-1', 'qf_1', 'c1-value' 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value' 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2' 'a-b-B-2-1402397300-1402416535', 'qf_2', 'c2-value-3' When I scan the row keys between 'a-b-A-1' and 'a-b-A-1:' I get the correct result. Test 1: Scan scan = new Scan(); scan.setStartRow("a-b-A-1".getBytes()); scan.setStopRow("a-b-A-1:".getBytes()); -- 'a-b-A-1', 'qf_1', 'c1-value' 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value' 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2' Next I tried the same scan with addColumn. Test 2: Scan scan = new Scan(); scan.addColumn(Bytes.toBytes("cf_1"), Bytes.toBytes("qf_2")); scan.setStartRow("a-b-A-1".getBytes()); scan.setStopRow("a-b-A-1:".getBytes()); -- expected: 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value' 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2' but I actually got nothing.
Then I changed the addColumn to scan.addColumn(Bytes.toBytes("cf_1"), Bytes.toBytes("qf_1")) and got the expected result 'a-b-A-1', 'qf_1', 'c1-value'. Testing further, I moved the start row past 'a-b-A-1'. Test 3: Scan scan = new Scan(); scan.setStartRow("a-b-A-1-".getBytes()); scan.setStopRow("a-b-A-1:".getBytes()); -- expected: 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value' 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2' but again I got nothing. When I moved the start row past 'a-b-A-1-1402329600-1402396277': Scan scan = new Scan(); scan.setStartRow("a-b-A-1-140239".getBytes()); scan.setStopRow("a-b-A-1:".getBytes()); I got the expected row: 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2' So I think it may be a bug in the prefix-tree encoding. It happens after the data is flushed to the store file; everything is fine while the data is in the memstore. -- This message was sent by Atlassian JIRA (v6.2#6252)
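The start/stop row pattern in the report relies on unsigned lexicographic byte ordering: ':' (0x3A) sorts after '-' (0x2D) and every other character used in these keys, so 'a-b-A-1:' acts as an exclusive upper bound covering all rows prefixed with 'a-b-A-1'. A minimal self-contained sketch of why each test's row set is what it is (plain Java, no HBase dependency; the comparator mirrors the semantics of HBase's Bytes.compareTo):

```java
import java.nio.charset.StandardCharsets;

public class RowRange {
    // Lexicographic compare of unsigned bytes, like HBase's Bytes.compareTo.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // True if row falls in [start, stop) -- the half-open range a Scan covers.
    static boolean inRange(String row, String start, String stop) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        return compare(r, start.getBytes(StandardCharsets.UTF_8)) >= 0
            && compare(r, stop.getBytes(StandardCharsets.UTF_8)) < 0;
    }

    public static void main(String[] args) {
        // ':' (0x3A) sorts after '-' (0x2D), so "a-b-A-1:" bounds every
        // row that starts with the prefix "a-b-A-1".
        System.out.println(inRange("a-b-A-1", "a-b-A-1", "a-b-A-1:"));                       // true
        System.out.println(inRange("a-b-A-1-1402329600-1402396277", "a-b-A-1", "a-b-A-1:")); // true
        System.out.println(inRange("a-b-B-2-1402397300-1402416535", "a-b-A-1", "a-b-A-1:")); // false
    }
}
```

All three rows the reporter expects do fall inside the range, which is why getting zero results back points at the encoder's seeking logic rather than at the scan bounds.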
[jira] [Updated] (HBASE-11728) Some data miss when scan using PREFIX_TREE DATA-BLOCK-ENCODING
[ https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-11728: --- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-11728) Some data miss when scan using PREFIX_TREE DATA-BLOCK-ENCODING
[ https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-11728: --- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-11728) Some data miss when scan using PREFIX_TREE DATA-BLOCK-ENCODING
[ https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-11728: --- Fix Version/s: (was: 0.94.23)
[jira] [Updated] (HBASE-11728) Some data miss when scan using PREFIX_TREE DATA-BLOCK-ENCODING
[ https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-11728: --- Fix Version/s: 0.94.23
[jira] [Commented] (HBASE-11769) Truncate table shouldn't revoke user privileges
[ https://issues.apache.org/jira/browse/HBASE-11769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100546#comment-14100546 ] Jean-Marc Spaggiari commented on HBASE-11769: - Makes sense to me. There is also a truncate which preserves the splits (truncate_preserve); you might want to modify that one too. Truncate table shouldn't revoke user privileges --- Key: HBASE-11769 URL: https://issues.apache.org/jira/browse/HBASE-11769 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.94.15 Reporter: hongyu bi hbase(main):002:0> create 'a','cf' 0 row(s) in 0.2500 seconds => Hbase::Table - a hbase(main):003:0> grant 'usera','R','a' 0 row(s) in 0.2080 seconds hbase(main):007:0> user_permission 'a' User Table,Family,Qualifier:Permission usera a,,: [Permission: actions=READ] hbase(main):004:0> truncate 'a' Truncating 'a' table (it may take a while): - Disabling table... - Dropping table... - Creating table... 0 row(s) in 1.5320 seconds hbase(main):005:0> user_permission 'a' User Table,Family,Qualifier:Permission -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11768) Register region server in zookeeper by ip address
[ https://issues.apache.org/jira/browse/HBASE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100548#comment-14100548 ] Jean-Marc Spaggiari commented on HBASE-11768: - Indeed, pretty simple patch. Have you tested it in a real cluster? Register region server in zookeeper by ip address - Key: HBASE-11768 URL: https://issues.apache.org/jira/browse/HBASE-11768 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 2.0.0 Reporter: Cheney Sun Attachments: HBASE_11768.patch An HBase cluster isn't always set up along with a DNS server, but regionservers currently register their hostnames in ZooKeeper, which is inconvenient when the regionservers aren't covered by a DNS server. In that situation, clients have to maintain the IP/hostname mapping in their /etc/hosts files in order to resolve the hostname returned from ZooKeeper to the right address. This causes a lot of pain for clients, especially when new machines are added to the cluster or a machine's address changes for some reason: all clients need to update their host mapping files. This issue is to address that problem by adding an option to let each regionserver register itself by IP address instead of hostname only. -- This message was sent by Atlassian JIRA (v6.2#6252)
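The proposal boils down to publishing the raw address (InetAddress.getHostAddress()) instead of the reverse-resolved hostname. A hedged sketch of the idea in plain Java; the serverIdentifier helper and the boolean flag are hypothetical illustrations, not the patch's actual code:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ServerNameDemo {
    // Hypothetical helper: pick the identifier a server would publish in
    // ZooKeeper -- either its DNS name or its raw IP address.
    static String serverIdentifier(InetAddress addr, boolean useIp) {
        return useIp ? addr.getHostAddress()        // e.g. "10.0.0.5", needs no DNS
                     : addr.getCanonicalHostName(); // reverse lookup, needs DNS/hosts
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        System.out.println(serverIdentifier(loopback, true)); // 127.0.0.1
    }
}
```

With `useIp` true, clients no longer need /etc/hosts entries to reach the address ZooKeeper hands back; the trade-off is that identifiers in ZooKeeper change whenever the machine's IP does.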
[jira] [Commented] (HBASE-11757) Provide a common base abstract class for both RegionObserver and MasterObserver
[ https://issues.apache.org/jira/browse/HBASE-11757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100553#comment-14100553 ] Hadoop QA commented on HBASE-11757: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662462/HBASE-11757-v0.patch against trunk revision . ATTACHMENT ID: 12662462 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction org.apache.hadoop.hbase.client.TestMultiParallel org.apache.hadoop.hbase.TestRegionRebalancing org.apache.hadoop.hbase.regionserver.TestHRegionBusyWait {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10476//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10476//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10476//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10476//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10476//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10476//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10476//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10476//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10476//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10476//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10476//console This message is automatically generated. 
Provide a common base abstract class for both RegionObserver and MasterObserver --- Key: HBASE-11757 URL: https://issues.apache.org/jira/browse/HBASE-11757 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Matteo Bertozzi Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: HBASE-11757-0.98-v0.patch, HBASE-11757-v0.patch Some security coprocessors extend both RegionObserver and MasterObserver, unfortunately only one of the two can use the available base abstract class implementations. Provide a common base abstract class for both the RegionObserver and MasterObserver interfaces. Update current coprocessors that extend both interfaces to use the new common base abstract class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Work started] (HBASE-11743) Add unit test for the fix that sorts custom value of BUCKET_CACHE_BUCKETS_KEY
[ https://issues.apache.org/jira/browse/HBASE-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-11743 started by Gustavo Anatoly. Add unit test for the fix that sorts custom value of BUCKET_CACHE_BUCKETS_KEY - Key: HBASE-11743 URL: https://issues.apache.org/jira/browse/HBASE-11743 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Gustavo Anatoly Priority: Minor HBASE-11550 sorts the custom value of BUCKET_CACHE_BUCKETS_KEY such that there is no wastage in bucket allocation. This JIRA is to add unit test for the fix so that there is no regression in the future. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Work stopped] (HBASE-11743) Add unit test for the fix that sorts custom value of BUCKET_CACHE_BUCKETS_KEY
[ https://issues.apache.org/jira/browse/HBASE-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-11743 stopped by Gustavo Anatoly. Add unit test for the fix that sorts custom value of BUCKET_CACHE_BUCKETS_KEY - Key: HBASE-11743 URL: https://issues.apache.org/jira/browse/HBASE-11743 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Gustavo Anatoly Priority: Minor HBASE-11550 sorts the custom value of BUCKET_CACHE_BUCKETS_KEY such that there is no wastage in bucket allocation. This JIRA is to add unit test for the fix so that there is no regression in the future. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11769) Truncate table shouldn't revoke user privileges
[ https://issues.apache.org/jira/browse/HBASE-11769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100559#comment-14100559 ] chendihao commented on HBASE-11769: --- Agree with [~jmspaggi]. truncate_preserve works well without removing the privileges. Won't fix, right?
[jira] [Commented] (HBASE-11769) Truncate table shouldn't revoke user privileges
[ https://issues.apache.org/jira/browse/HBASE-11769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100564#comment-14100564 ] Matteo Bertozzi commented on HBASE-11769: - truncate_preserve only preserves the set of region splits. Since the shell does a delete table + create table, it will always remove the ACLs. HBASE-8332 fixed the problem by adding a truncate API which bypasses the table/ACL deletion.
[jira] [Commented] (HBASE-11769) Truncate table shouldn't revoke user privileges
[ https://issues.apache.org/jira/browse/HBASE-11769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100575#comment-14100575 ] Jean-Marc Spaggiari commented on HBASE-11769: - Just to be clear, I was not saying that truncate_preserve did or did not preserve privileges, just that we might want to look at it too. So for the purposes of this patch, should hongyu simply update the ruby scripts to call the new API provided by HBASE-8332? That might be cleaner than having two implementations (one in Ruby, one in Java) of the same feature.
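The ACL-loss mechanics Matteo describes can be modeled with a toy table catalog. This is purely illustrative plain Java, not HBase code (real HBase stores grants in the acl table, and HBASE-8332's server-side truncate API is what actually preserves them): a drop-then-recreate truncate discards per-table metadata including grants, while an in-place truncate clears only the data.

```java
import java.util.*;

public class TruncateDemo {
    final Map<String, Set<String>> acls = new HashMap<>();   // table -> grants
    final Map<String, List<String>> data = new HashMap<>();  // table -> rows

    void create(String t) { data.put(t, new ArrayList<>()); }
    void delete(String t) { data.remove(t); acls.remove(t); } // drop removes ACLs too

    // Shell-style truncate before HBASE-8332: disable, drop, recreate.
    void truncateViaDropCreate(String t) { delete(t); create(t); }

    // Truncate API from HBASE-8332: keep the table (and its ACLs), clear rows only.
    void truncateInPlace(String t) { data.get(t).clear(); }

    public static void main(String[] args) {
        TruncateDemo hbase = new TruncateDemo();
        hbase.create("a");
        hbase.acls.computeIfAbsent("a", k -> new HashSet<>()).add("usera:READ");
        hbase.data.get("a").add("row1");

        hbase.truncateViaDropCreate("a");
        System.out.println(hbase.acls.containsKey("a")); // false: the grant is gone

        hbase.acls.computeIfAbsent("a", k -> new HashSet<>()).add("usera:READ");
        hbase.truncateInPlace("a");
        System.out.println(hbase.acls.containsKey("a")); // true: the grant survives
    }
}
```

This mirrors exactly what the shell transcript in the issue shows: after `truncate 'a'`, `user_permission 'a'` comes back empty because the drop step took the grants with it.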
[jira] [Commented] (HBASE-11761) Add a FAQ item for updating a maven-managed application from 0.94 - 0.96+
[ https://issues.apache.org/jira/browse/HBASE-11761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100648#comment-14100648 ] Sean Busbey commented on HBASE-11761: - I was thinking the FAQ for a couple of reasons: 1) It also applies to 0.94 - 0.96 upgrades 2) I suspect developers of downstream clients are more likely to notice a FAQ item geared towards them than an addition to the upgrade docs. Maybe a FAQ item with a pointer within both the 0.94 - 0.96 and 0.94 - 0.98 upgrade sections? Add a FAQ item for updating a maven-managed application from 0.94 - 0.96+ -- Key: HBASE-11761 URL: https://issues.apache.org/jira/browse/HBASE-11761 Project: HBase Issue Type: Task Components: documentation Reporter: Sean Busbey Labels: beginner In 0.96 we changed artifact structure, so that clients need to rely on an artifact specific to some module (hopefully hbase-client) instead of a single fat jar. We should add a FAQ item that points people towards hbase-client, to ease those updating downstream applications from 0.94 to 0.98+. Showing an example pom entry for e.g. org.apache.hbase:hbase:0.94.22 and one for e.g. org.apache.hbase:hbase-client:0.98.5 should be sufficient. -- This message was sent by Atlassian JIRA (v6.2#6252)
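The FAQ item the issue asks for would roughly show the before/after dependency declaration. A sketch, with the versions taken from the issue's own examples (not the final doc text):

```xml
<!-- 0.94 and earlier: one monolithic artifact -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>0.94.22</version>
</dependency>

<!-- 0.96 and later: depend on the client module instead -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>0.98.5</version>
</dependency>
```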
[jira] [Created] (HBASE-11770) TestBlockCacheReporting.testBucketCache is not stable
Sergey Soldatov created HBASE-11770: --- Summary: TestBlockCacheReporting.testBucketCache is not stable Key: HBASE-11770 URL: https://issues.apache.org/jira/browse/HBASE-11770 Project: HBase Issue Type: Bug Components: test Environment: kvm box with Ubuntu 12.04 Desktop 64bit. java version 1.7.0_65 Java(TM) SE Runtime Environment (build 1.7.0_65-b17) Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode) Reporter: Sergey Soldatov Assignee: Sergey Soldatov Depending on the machine and OS TestBlockCacheReporting.testBucketCache may fail with NPE: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:417) at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:80) at org.apache.hadoop.hbase.io.hfile.TestBlockCacheReporting.addDataAndHits(TestBlockCacheReporting.java:67) at org.apache.hadoop.hbase.io.hfile.TestBlockCacheReporting.testBucketCache(TestBlockCacheReporting.java:86) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11165) Scaling so cluster can host 1M regions and beyond (50M regions?)
[ https://issues.apache.org/jira/browse/HBASE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100778#comment-14100778 ] Jimmy Xiang commented on HBASE-11165: - I agree with Matteo on this. One more benefit of having meta and master together is that meta/master recovery will be much simpler (there won't be scenarios where the master is recovering while the meta regionserver may be down). Scaling so cluster can host 1M regions and beyond (50M regions?) Key: HBASE-11165 URL: https://issues.apache.org/jira/browse/HBASE-11165 Project: HBase Issue Type: Brainstorming Reporter: stack Attachments: HBASE-11165.zip, Region Scalability test.pdf, zk_less_assignment_comparison_2.pdf This discussion issue comes out of Co-locate Meta And Master HBASE-10569 and comments on the doc posted there. A user -- our Francis Liu -- needs to be able to scale a cluster to 1M regions, maybe even 50M later. This issue is about discussing how we will do that (or, if not 50M on a cluster, how otherwise we can attain the same end). More detail to follow. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11610) Enhance remote meta updates
[ https://issues.apache.org/jira/browse/HBASE-11610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100826#comment-14100826 ] Virag Kothari commented on HBASE-11610: --- [~jxiang] Any other comments before we can get this in? Enhance remote meta updates --- Key: HBASE-11610 URL: https://issues.apache.org/jira/browse/HBASE-11610 Project: HBase Issue Type: Sub-task Reporter: Jimmy Xiang Assignee: Virag Kothari Attachments: HBASE-11610.patch Currently, if the meta region is on a regionserver instead of the master, meta update is synchronized on one HTable instance. We should be able to do better. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11610) Enhance remote meta updates
[ https://issues.apache.org/jira/browse/HBASE-11610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100841#comment-14100841 ] Jimmy Xiang commented on HBASE-11610: - I have no more comment. I am ok with a patch that shares just one HConnection and one execution pool, closes the meta htable instance after each use. I am not sure about the current patch. Perhaps [~larsh]/[~nkeywal] can take a look? Enhance remote meta updates --- Key: HBASE-11610 URL: https://issues.apache.org/jira/browse/HBASE-11610 Project: HBase Issue Type: Sub-task Reporter: Jimmy Xiang Assignee: Virag Kothari Attachments: HBASE-11610.patch Currently, if the meta region is on a regionserver instead of the master, meta update is synchronized on one HTable instance. We should be able to do better. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11728) Data loss while scanning using PREFIX_TREE DATA-BLOCK-ENCODING
[ https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-11728: --- Summary: Data loss while scanning using PREFIX_TREE DATA-BLOCK-ENCODING (was: Some data miss when scan using PREFIX_TREE DATA-BLOCK-ENCODING)

Data loss while scanning using PREFIX_TREE DATA-BLOCK-ENCODING -- Key: HBASE-11728 URL: https://issues.apache.org/jira/browse/HBASE-11728 Project: HBase Issue Type: Bug Components: Scanners Affects Versions: 0.96.1.1, 0.98.4 Environment: ubuntu12 hadoop-2.2.0 HBase-0.96.1.1 SUN-JDK(1.7.0_06-b24) Reporter: wuchengzhi Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: 29cb562fad564b468ea9d61a2d60e8b0, HBASE-11728.patch, HBASE-11728_1.patch, HBASE-11728_2.patch, HBASE-11728_3.patch, HBASE-11728_4.patch, HFileAnalys.java, TestPrefixTree.java Original Estimate: 72h Remaining Estimate: 72h

In the scan case I prepared some data as below. Table desc (using the prefix-tree encoding): 'prefix_tree_test', {NAME => 'cf_1', DATA_BLOCK_ENCODING => 'PREFIX_TREE', TTL => '15552000'}, and I put 5 rows as (RowKey, Qualifier, Value):
'a-b-0-0', 'qf_1', 'c1-value'
'a-b-A-1', 'qf_1', 'c1-value'
'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
'a-b-B-2-1402397300-1402416535', 'qf_2', 'c2-value-3'

Test 1: I scan the row keys between 'a-b-A-1' and 'a-b-A-1:' and get the correct result:
{code}
Scan scan = new Scan();
scan.setStartRow("a-b-A-1".getBytes());
scan.setStopRow("a-b-A-1:".getBytes());
{code}
--
'a-b-A-1', 'qf_1', 'c1-value'
'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'

Test 2: the same scan with addColumn:
{code}
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("cf_1"), Bytes.toBytes("qf_2"));
scan.setStartRow("a-b-A-1".getBytes());
scan.setStopRow("a-b-A-1:".getBytes());
{code}
--
expected:
'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
but actually I got nothing. If I change the addColumn to scan.addColumn(Bytes.toBytes("cf_1"), Bytes.toBytes("qf_1")), I get the expected result 'a-b-A-1', 'qf_1', 'c1-value' as well.

Test 3: I move the start row past 'a-b-A-1':
{code}
Scan scan = new Scan();
scan.setStartRow("a-b-A-1-".getBytes());
scan.setStopRow("a-b-A-1:".getBytes());
{code}
--
expected:
'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
but again I got nothing. If I set the start row past 'a-b-A-1-1402329600-1402396277':
{code}
Scan scan = new Scan();
scan.setStartRow("a-b-A-1-140239".getBytes());
scan.setStopRow("a-b-A-1:".getBytes());
{code}
I get the expected row 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'.

So I think it may be a bug in the prefix-tree encoding. It happens after the data is flushed to the storefile; scanning is fine while the data is still in the memstore. -- This message was sent by Atlassian JIRA (v6.2#6252)
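As a sanity check on the scan range in the report (a self-contained sketch, not HBase code; `compare` mimics the unsigned lexicographic order of `Bytes.compareTo`): since ':' (0x3A) sorts after '-' (0x2D), the stop row 'a-b-A-1:' really does cover every 'a-b-A-1-...' key, so Tests 2 and 3 should have returned rows.

```java
import java.util.Arrays;
import java.util.List;

public class RowKeyOrder {
    // Compare two byte[] the way HBase orders row keys: unsigned bytes, lexicographic.
    static int compare(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        // A prefix sorts before any longer key that starts with it.
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] start = "a-b-A-1".getBytes();
        byte[] stop = "a-b-A-1:".getBytes();
        List<String> rows = Arrays.asList(
            "a-b-0-0", "a-b-A-1",
            "a-b-A-1-1402329600-1402396277",
            "a-b-A-1-1402397227-1402415999",
            "a-b-B-2-1402397300-1402416535");
        for (String r : rows) {
            byte[] k = r.getBytes();
            boolean inRange = compare(k, start) >= 0 && compare(k, stop) < 0;
            System.out.println(r + " in range: " + inRange);
        }
    }
}
```

Running this marks 'a-b-A-1' and both 'a-b-A-1-...' rows as in range, confirming the missing results are data loss inside the encoder rather than a mis-specified scan range.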
[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file
[ https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100860#comment-14100860 ] Ted Yu commented on HBASE-11591:
{code}
+ * Compares two cells without mvcc
+ *
+ * @param left
+ * @param right
+ * @return less than 0 if left is smaller, 0 if equal etc..
+ */
+public int compareWithoutSeqId(Cell left, Cell right) {
{code}
Change the javadoc to match the method name. Cell is marked @InterfaceStability.Evolving; setSequenceId() should be added to the Cell interface - in another issue.
{code}
+public class TestScannerWithBulkload {
+  private final static HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();
+  private final static String tableName = "testBulkload";
{code}
Please change tableName to match the test name.

Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file --- Key: HBASE-11591 URL: https://issues.apache.org/jira/browse/HBASE-11591 Project: HBase Issue Type: Bug Affects Versions: 0.99.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.99.0 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java

See discussion in HBASE-11339. This is the case where the same KVs are present in two files, one produced by flush/compaction and the other through bulk load; both files have some identical KVs that match even in timestamp. Steps: Add some rows with a specific timestamp and flush the same. Bulk load a file with the same data. Ensure that the assign-seqnum property is set. The bulk load should use HFileOutputFormat2 (or ensure that we write the bulk_time_output key). This ensures that the bulk loaded file has the highest seq num. Assume the cell in the flushed/compacted store file is row1,cf,cq,ts1,value1 and the cell in the bulk loaded file is row1,cf,cq,ts1,value2 (there are no parallel scans).
Issue a scan on the table in 0.96: the retrieved value is row1,cf,cq,ts1,value2. But the same scan in 0.98 will retrieve row1,cf,cq,ts1,value1. This is a behaviour change, caused by this code:
{code}
public int compare(KeyValueScanner left, KeyValueScanner right) {
  int comparison = compare(left.peek(), right.peek());
  if (comparison != 0) {
    return comparison;
  } else {
    // Since both the keys are exactly the same, we break the tie in favor
    // of the key which came latest.
    long leftSequenceID = left.getSequenceID();
    long rightSequenceID = right.getSequenceID();
    if (leftSequenceID > rightSequenceID) {
      return -1;
    } else if (leftSequenceID < rightSequenceID) {
      return 1;
    } else {
      return 0;
    }
  }
}
{code}
In the 0.96 case the mvcc of the cell in both files is 0, so the comparison falls through to the else branch, where the seq id of the bulk loaded file is greater and sorts first, ensuring that the scan reads from the bulk loaded file. In 0.98+, since we retain mvcc+seqid, the mvcc is not reset to 0 (it remains a non-zero positive value), so compare() sorts the cell in the flushed/compacted file first. That means even though we know the latest file is the bulk loaded one, we don't scan its data. Seems to be a behaviour change. Will check other corner cases too; we are looking at the behaviour of bulk load because we are evaluating whether it can be used for the MOB design. -- This message was sent by Atlassian JIRA (v6.2#6252)
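The behaviour difference can be modelled with a self-contained toy (plain Java, not the actual KeyValueHeap or KeyValue classes; all names here are illustrative): the cell comparison sorts a higher mvcc first, and only a full tie falls through to the file sequence id.

```java
import java.util.Comparator;

public class ScannerTieBreak {
    // Toy stand-in for a KeyValueScanner: the key of its current cell, that
    // cell's mvcc, and the sequence id of the file backing the scanner.
    static class Scanner {
        final String key; final long cellMvcc; final long fileSeqId;
        Scanner(String key, long cellMvcc, long fileSeqId) {
            this.key = key; this.cellMvcc = cellMvcc; this.fileSeqId = fileSeqId;
        }
    }

    // Mirrors the comparator quoted in the issue: compare the cells first
    // (key, then higher mvcc wins), and break a full tie in favor of the
    // higher file sequence id.
    static final Comparator<Scanner> HEAP_ORDER = (left, right) -> {
        int c = left.key.compareTo(right.key);
        if (c != 0) return c;
        c = Long.compare(right.cellMvcc, left.cellMvcc); // higher mvcc sorts first
        if (c != 0) return c;
        return Long.compare(right.fileSeqId, left.fileSeqId); // higher seq id wins a tie
    };

    public static void main(String[] args) {
        // 0.96: both mvccs were rewritten to 0, so the bulk loaded file's
        // higher seq id decides, and the bulk loaded cell is read.
        Scanner flushed096 = new Scanner("row1/cf/cq/ts1", 0, 10);
        Scanner bulk096 = new Scanner("row1/cf/cq/ts1", 0, 20);
        System.out.println(HEAP_ORDER.compare(bulk096, flushed096) < 0); // true

        // 0.98+: the flushed cell keeps a non-zero mvcc, so the cell
        // comparison decides before the seq id is consulted: flushed wins.
        Scanner flushed098 = new Scanner("row1/cf/cq/ts1", 5, 10);
        Scanner bulk098 = new Scanner("row1/cf/cq/ts1", 0, 20);
        System.out.println(HEAP_ORDER.compare(flushed098, bulk098) < 0); // true
    }
}
```

The second case is exactly the reported regression: the seq-id tie-break never fires once the retained mvcc makes the cells compare unequal.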
[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file
[ https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100875#comment-14100875 ] ramkrishna.s.vasudevan commented on HBASE-11591: bq. setSequenceId() should be added to Cell interface - in another issue. I don't think we can add setSequenceId() in Cell. We can discuss that. Will update the patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11657) Put HTable region methods in an interface
[ https://issues.apache.org/jira/browse/HBASE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100876#comment-14100876 ] Carter commented on HBASE-11657: That would certainly solve the problem with {{TableInputFormatBase}}. We should also probably add a _getTableName_ method to {{RegionLocator}}, regardless. Then passing the RL interface instead of a raw HTable object would provide everything it needs for sharding the MR job. A more philosophical question: why is {{HRegionLocation}} InterfaceAudience.Private to begin with? It is a POJO that wraps {{HRegionInfo}} (InterfaceAudience.Public), {{ServerName}} (InterfaceAudience.Public), and _seqNum_ (an immutable long). It seems to me that either the internal fields should be private too, or HRegionLocation should be public, unless there is some correlation of that information that shouldn't be exposed. Thoughts, [~stack], [~ndimiduk], [~enis]?

Put HTable region methods in an interface - Key: HBASE-11657 URL: https://issues.apache.org/jira/browse/HBASE-11657 Project: HBase Issue Type: Improvement Affects Versions: 0.99.0 Reporter: Carter Assignee: Carter Fix For: 0.99.0 Attachments: HBASE_11657.patch, HBASE_11657_v2.patch, HBASE_11657_v3.patch, HBASE_11657_v3.patch, HBASE_11657_v4.patch

Most of the HTable methods are now abstracted by HTableInterface, with the notable exception of the following methods that pertain to region metadata:
{code}
HRegionLocation getRegionLocation(final String row)
HRegionLocation getRegionLocation(final byte [] row)
HRegionLocation getRegionLocation(final byte [] row, boolean reload)
byte [][] getStartKeys()
byte [][] getEndKeys()
Pair<byte[][], byte[][]> getStartEndKeys()
void clearRegionCache()
{code}
and a default-scope method which maybe should be bundled with the others:
{code}
List<RegionLocations> listRegionLocations()
{code}
Since the consensus seems to be that these would muddy HTableInterface with non-core functionality, where should it go? MapReduce looks up the region boundaries, so it needs to be exposed somewhere. Let me throw out a straw man to start the conversation. I propose:
{code}
org.apache.hadoop.hbase.client.HRegionInterface
{code}
Have HTable implement this interface. Also add these methods to HConnection:
{code}
HRegionInterface getTableRegion(TableName tableName)
HRegionInterface getTableRegion(TableName tableName, ExecutorService pool)
{code}
[~stack], [~ndimiduk], [~enis], thoughts? -- This message was sent by Atlassian JIRA (v6.2#6252)
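A rough sketch of the kind of extraction being proposed (the interface and class names below are illustrative only, not the API that was committed): pull just the boundary-lookup methods behind an interface so MapReduce job setup can accept the interface instead of a concrete HTable.

```java
public class RegionLocatorSketch {
    // Hypothetical extraction of HTable's region-boundary methods.
    interface RegionKeyLocator {
        byte[][] getStartKeys();
        byte[][] getEndKeys();
    }

    // Toy implementation backed by fixed split points, the way MapReduce job
    // setup only needs the boundaries, not a live table connection.
    static class FixedRegionLocator implements RegionKeyLocator {
        private final byte[][] starts; // sorted region start keys; first is empty

        FixedRegionLocator(byte[][] starts) { this.starts = starts; }

        @Override public byte[][] getStartKeys() { return starts; }

        @Override public byte[][] getEndKeys() {
            // Each region ends where the next begins; the last end key is empty.
            byte[][] ends = new byte[starts.length][];
            for (int i = 0; i < starts.length - 1; i++) ends[i] = starts[i + 1];
            ends[starts.length - 1] = new byte[0];
            return ends;
        }
    }

    public static void main(String[] args) {
        RegionKeyLocator loc = new FixedRegionLocator(
            new byte[][] { new byte[0], "g".getBytes(), "t".getBytes() });
        System.out.println("regions: " + loc.getStartKeys().length);
        System.out.println("end of first region: " + new String(loc.getEndKeys()[0]));
    }
}
```

The design point under discussion is only where such an interface should live, not whether the boundary lookup itself is needed.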
[jira] [Comment Edited] (HBASE-11728) Data loss while scanning using PREFIX_TREE DATA-BLOCK-ENCODING
[ https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100881#comment-14100881 ] ramkrishna.s.vasudevan edited comment on HBASE-11728 at 8/18/14 5:21 PM: - Committed to master, branch-1 and 0.98. Thanks for the review [~mcorgan] was (Author: ram_krish): Committed to master, branch-1 and 0.98. Thanks for the review @mcorgan. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11728) Data loss while scanning using PREFIX_TREE DATA-BLOCK-ENCODING
[ https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100881#comment-14100881 ] ramkrishna.s.vasudevan commented on HBASE-11728: Committed to master, branch-1 and 0.98. Thanks for the review @mcorgan. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11728) Data loss while scanning using PREFIX_TREE DATA-BLOCK-ENCODING
[ https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-11728: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11766) Backdoor CoprocessorHConnection is no longer being used for local writes
[ https://issues.apache.org/jira/browse/HBASE-11766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11766: --- Fix Version/s: 0.98.6 2.0.0 0.99.0 Assignee: Andrew Purtell Backdoor CoprocessorHConnection is no longer being used for local writes Key: HBASE-11766 URL: https://issues.apache.org/jira/browse/HBASE-11766 Project: HBase Issue Type: Bug Affects Versions: 0.98.4 Reporter: James Taylor Assignee: Andrew Purtell Labels: Phoenix Fix For: 0.99.0, 2.0.0, 0.98.6 There's a backdoor CoprocessorHConnection used to ensure that a batched mutation does not go over the wire and back, but executes immediately locally. This is leveraged by Phoenix during secondary index maintenance (for an ~20% perf improvement). It looks to me like it's no longer used, as the following function is never invoked: public org.apache.hadoop.hbase.protobuf.generated.ClientProtos.ClientService.BlockingInterface getClient(ServerName serverName) throws IOException { It'd be good if feasible to add an HBase unit test to prevent further regressions. For more info, see PHOENIX-1166. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11232) Region fail to release the updatelock for illegal CF in multi row mutations
[ https://issues.apache.org/jira/browse/HBASE-11232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100891#comment-14100891 ] Andrew Purtell commented on HBASE-11232: Ping [~lhofhansl]. Shall we get this in for the next 0.94 release?

Region fail to release the updatelock for illegal CF in multi row mutations --- Key: HBASE-11232 URL: https://issues.apache.org/jira/browse/HBASE-11232 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.19 Reporter: Liu Shaohui Assignee: Liu Shaohui Attachments: HBASE-11232-0.94.diff

The failback code in processRowsWithLocks did not check the column family. If there is an illegal CF in the mutation, it will throw a NullPointerException and the update lock will not be released, so the region can not be flushed or compacted. HRegion #4946:
{code}
if (!mutations.isEmpty() && !walSyncSuccessful) {
  LOG.warn("Wal sync failed. Roll back " + mutations.size()
      + " memstore keyvalues for row(s):"
      + processor.getRowsToLock().iterator().next() + "...");
  for (KeyValue kv : mutations) {
    stores.get(kv.getFamily()).rollback(kv);
  }
}
// 11. Roll mvcc forward
if (writeEntry != null) {
  mvcc.completeMemstoreInsert(writeEntry);
  writeEntry = null;
}
if (locked) {
  this.updatesLock.readLock().unlock();
  locked = false;
}
{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
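The fix implied by the report is a null check on the store before rolling back. A minimal self-contained sketch with toy types (not the actual HRegion code; `Store` and the string-typed mutations are stand-ins):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RollbackSketch {
    // Toy stand-in for a Store: just enough state to show the guard.
    static class Store {
        final List<String> memstore = new ArrayList<>();
        void rollback(String kv) { memstore.remove(kv); }
    }

    // Roll back mutations, skipping (and reporting) families with no store
    // instead of throwing NullPointerException while the update lock is held.
    static List<String> rollback(Map<String, Store> stores, Map<String, String> mutations) {
        List<String> skipped = new ArrayList<>();
        for (Map.Entry<String, String> m : mutations.entrySet()) {
            Store store = stores.get(m.getKey()); // key = column family
            if (store == null) {
                skipped.add(m.getKey()); // illegal CF: note it and move on
                continue;
            }
            store.rollback(m.getValue());
        }
        return skipped;
    }

    public static void main(String[] args) {
        Map<String, Store> stores = new LinkedHashMap<>();
        stores.put("cf", new Store());
        Map<String, String> mutations = new LinkedHashMap<>();
        mutations.put("cf", "kv1");
        mutations.put("bogus", "kv2"); // illegal CF: no store for it
        System.out.println("skipped families: " + rollback(stores, mutations));
    }
}
```

The essential point is that the rollback loop must never throw while the update lock is held; any error path has to fall through to the unlock.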
[jira] [Resolved] (HBASE-11596) [Shell] Recreate table grants after truncate
[ https://issues.apache.org/jira/browse/HBASE-11596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-11596. Resolution: Duplicate I'm going to resolve this as a dup of HBASE-8332 and HBASE-11769 [Shell] Recreate table grants after truncate Key: HBASE-11596 URL: https://issues.apache.org/jira/browse/HBASE-11596 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Priority: Minor Labels: beginner The shell's truncate command disables, drops, and creates a replacement table. When the AccessController is active it observes the drop and cleans up any grants made on the table. The shell does not take any action to preserve the grants but could. Would make a nice improvement. If security is active and running with administrative privilege, the shell could retrieve the table- and CF-level grants before dropping the table and replay them on the new table after creating it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11769) Truncate table shouldn't revoke user privileges
[ https://issues.apache.org/jira/browse/HBASE-11769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100964#comment-14100964 ] Andrew Purtell commented on HBASE-11769: Dup of HBASE-11596 (which itself is at least partially a dup of HBASE-8332, but would need a backport to 0.94)

Truncate table shouldn't revoke user privileges --- Key: HBASE-11769 URL: https://issues.apache.org/jira/browse/HBASE-11769 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.94.15 Reporter: hongyu bi

hbase(main):002:0> create 'a','cf'
0 row(s) in 0.2500 seconds
=> Hbase::Table - a
hbase(main):003:0> grant 'usera','R','a'
0 row(s) in 0.2080 seconds
hbase(main):007:0> user_permission 'a'
User  Table,Family,Qualifier:Permission
usera a,,: [Permission: actions=READ]
hbase(main):004:0> truncate 'a'
Truncating 'a' table (it may take a while):
- Disabling table...
- Dropping table...
- Creating table...
0 row(s) in 1.5320 seconds
hbase(main):005:0> user_permission 'a'
User  Table,Family,Qualifier:Permission
-- This message was sent by Atlassian JIRA (v6.2#6252)
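The preserve-and-replay flow suggested in HBASE-11596 (snapshot grants, truncate, replay) can be modelled with toy types. This is a sketch of the flow only, not the shell's Ruby or the real AccessController API; `GRANTS` is a stand-in for the ACL table.

```java
import java.util.HashMap;
import java.util.Map;

public class TruncatePreservingGrants {
    // Toy cluster state: table name -> (user -> permission string).
    static final Map<String, Map<String, String>> GRANTS = new HashMap<>();

    static void truncate(String table) {
        // 1. Snapshot the grants before the disable/drop wipes them.
        Map<String, String> saved =
            new HashMap<>(GRANTS.getOrDefault(table, Map.of()));
        // 2. Disable + drop: the AccessController cleans up the table's grants.
        GRANTS.remove(table);
        // 3. Recreate the table, then replay the saved grants onto it.
        GRANTS.put(table, new HashMap<>(saved));
    }

    public static void main(String[] args) {
        GRANTS.put("a", new HashMap<>(Map.of("usera", "READ")));
        truncate("a");
        System.out.println("grants after truncate: " + GRANTS.get("a"));
    }
}
```

The subtlety in the real shell is step 1: it must run with administrative privilege, since reading another table's grants is itself an ACL-protected operation.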
[jira] [Created] (HBASE-11771) Move to log4j 2
Alex Newman created HBASE-11771: --- Summary: Move to log4j 2 Key: HBASE-11771 URL: https://issues.apache.org/jira/browse/HBASE-11771 Project: HBase Issue Type: Improvement Reporter: Alex Newman Assignee: Alex Newman Priority: Minor It seems much faster. Any objections? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HBASE-11771) Move to log4j 2
[ https://issues.apache.org/jira/browse/HBASE-11771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-11771. Resolution: Duplicate Assignee: (was: Alex Newman) Dup of HBASE-10092. Want to take that one over [~posix4e]? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11762) Record the class name of Codec in WAL header
[ https://issues.apache.org/jira/browse/HBASE-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100976#comment-14100976 ] Ted Yu commented on HBASE-11762: [~apurtell]: What do you think of patch v4 ? Record the class name of Codec in WAL header Key: HBASE-11762 URL: https://issues.apache.org/jira/browse/HBASE-11762 Project: HBase Issue Type: Task Components: wal Reporter: Ted Yu Assignee: Ted Yu Fix For: 1.0.0, 2.0.0, 0.98.6 Attachments: 11762-v1.txt, 11762-v2.txt, 11762-v4.txt In follow-up discussion to HBASE-11620, Enis brought up this point: Related to this, should not we also write the CellCodec that we use in the WAL header. Right now, the codec comes from the configuration which means that you cannot read back the WAL files if you change the codec. This JIRA is to implement the above suggestion. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11165) Scaling so cluster can host 1M regions and beyond (50M regions?)
[ https://issues.apache.org/jira/browse/HBASE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100953#comment-14100953 ] Andrew Purtell commented on HBASE-11165: bq. I agree with Matteo on this. One more benefit to have meta and master together is the meta/master recovery will be much simpler Do we need to split this conversation into what to do on master and what to do with 0.98? We could for example file two separate subtasks that approach the meta scaling problem in different ways for the respective branches. They are divergent enough so that would be a good idea IMHO Scaling so cluster can host 1M regions and beyond (50M regions?) Key: HBASE-11165 URL: https://issues.apache.org/jira/browse/HBASE-11165 Project: HBase Issue Type: Brainstorming Reporter: stack Attachments: HBASE-11165.zip, Region Scalability test.pdf, zk_less_assignment_comparison_2.pdf This discussion issue comes out of Co-locate Meta And Master HBASE-10569 and comments on the doc posted there. A user -- our Francis Liu -- needs to be able to scale a cluster to do 1M regions maybe even 50M later. This issue is about discussing how we will do that (or if not 50M on a cluster, how otherwise we can attain same end). More detail to follow. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11771) Move to log4j 2
[ https://issues.apache.org/jira/browse/HBASE-11771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100981#comment-14100981 ] Alex Newman commented on HBASE-11771: Love to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10092) Move up on to log4j2
[ https://issues.apache.org/jira/browse/HBASE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100982#comment-14100982 ] Alex Newman commented on HBASE-10092: Mind if I hop on this patch?

Move up on to log4j2 --- Key: HBASE-10092 URL: https://issues.apache.org/jira/browse/HBASE-10092 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Attachments: 10092.txt, 10092v2.txt, HBASE-10092.patch

Allows logging with less friction. See http://logging.apache.org/log4j/2.x/ This rather radical transition can be done with minor change given they have an adapter for apache's logging, the one we use. They also have an adapter for slf4j, so we can likely remove at least some of the 4 versions of this module our dependencies make use of. I made a start in the attached patch but am currently stuck in maven dependency-resolution hell courtesy of our slf4j. Fixing will take some concentration and a good net connection, an item I currently lack. Other TODOs: we will need to fix our little log-level-setting jsp page (we will likely have to undo our use of hadoop's tool here), and the config system changes a little. I will return to this project soon. Will bring numbers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10092) Move up on to log4j2
[ https://issues.apache.org/jira/browse/HBASE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10092: --- Assignee: Alex Newman (was: stack) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10092) Move up on to log4j2
[ https://issues.apache.org/jira/browse/HBASE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100994#comment-14100994 ] Andrew Purtell commented on HBASE-10092: Reassigned to [~posix4e]. Good luck! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11610) Enhance remote meta updates
[ https://issues.apache.org/jira/browse/HBASE-11610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101043#comment-14101043 ] stack commented on HBASE-11610: --- This patch is a bit of a hack. We are doing a one-off inside RegionStateStore to put up multiple HConnection instances (for sure we are creating many distinct instances?). I'd doubt anyone but you fellas will know of its existence (Needs a release note on the new config, hbase.statestore.meta.connection, and the new config should probably be called hbase.regionstatestore.meta.connection). Would be nice if this connection setup was off in a separate class so should anyone else want to do this trick, they'll not duplicate your effort. This is just a nit though. I'm also fine with adding in stuff that is custom for you fellas (custom for now) just as long as it is well doc'd. When would this code trigger? {code} + if (hConnectionPool == null) { + hConnectionPool = new HConnection[] { HConnectionManager.createConnection(server.getConfiguration()) }; + } {code} i.e. when would hConnectionPool be null? Should this be private? {code} + private ThreadLocal<HTableInterface> threadLocalHTable = {code} It should have a comment on when this thread local gets instantiated -- what the current thread is at the time. Enhance remote meta updates --- Key: HBASE-11610 URL: https://issues.apache.org/jira/browse/HBASE-11610 Project: HBase Issue Type: Sub-task Reporter: Jimmy Xiang Assignee: Virag Kothari Attachments: HBASE-11610.patch Currently, if the meta region is on a regionserver instead of the master, meta update is synchronized on one HTable instance. We should be able to do better. -- This message was sent by Atlassian JIRA (v6.2#6252)
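The thread-local-table-over-shared-connections pattern under review above can be sketched with plain JDK types. Everything below (Connection, TableHandle, the pool size) is an illustrative stand-in, not the actual HBase API; the sketch only shows when the thread local gets instantiated and that each thread ends up with its own handle:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalHandleDemo {
    // Toy stand-ins for HConnection and HTableInterface.
    static class Connection { final int id; Connection(int id) { this.id = id; } }
    static class TableHandle { final Connection conn; TableHandle(Connection c) { conn = c; } }

    // Small fixed pool of shared connections, round-robined across threads.
    static final Connection[] pool = { new Connection(0), new Connection(1) };
    static final AtomicInteger next = new AtomicInteger();

    // Instantiated lazily, at the first get() on each thread: the handle is
    // created at that moment and bound to the calling thread.
    static final ThreadLocal<TableHandle> table = ThreadLocal.withInitial(
        () -> new TableHandle(pool[next.getAndIncrement() % pool.length]));

    public static void main(String[] args) throws InterruptedException {
        TableHandle mine = table.get();
        // The same thread always sees the same handle.
        if (table.get() != mine) throw new AssertionError("handle not stable per thread");

        // Another thread gets a different handle (possibly sharing a connection).
        final TableHandle[] fromOther = new TableHandle[1];
        Thread other = new Thread(() -> fromOther[0] = table.get());
        other.start();
        other.join();
        if (fromOther[0] == mine) throw new AssertionError("handle shared across threads");
        System.out.println("per-thread handles OK");
    }
}
```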
[jira] [Commented] (HBASE-10092) Move up on to log4j2
[ https://issues.apache.org/jira/browse/HBASE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101046#comment-14101046 ] stack commented on HBASE-10092: --- I think this is a hbase 2.0 issue now. The log config format changes, which will be too much to take on in a 1.0 hbase (IMO). Move up on to log4j2 Key: HBASE-10092 URL: https://issues.apache.org/jira/browse/HBASE-10092 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: Alex Newman Attachments: 10092.txt, 10092v2.txt, HBASE-10092.patch Allows logging with less friction. See http://logging.apache.org/log4j/2.x/ This rather radical transition can be done w/ minor change given they have an adapter for apache's logging, the one we use. They also have an adapter for slf4j so we likely can remove at least some of the 4 versions of this module our dependencies make use of. I made a start in attached patch but am currently stuck in maven dependency resolve hell courtesy of our slf4j. Fixing will take some concentration and a good net connection, an item I currently lack. Other TODOs are that we will need to fix our little log level setting jsp page -- will likely have to undo our use of hadoop's tool here -- and the config system changes a little. I will return to this project soon. Will bring numbers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11761) Add a FAQ item for updating a maven-managed application from 0.94 - 0.96+
[ https://issues.apache.org/jira/browse/HBASE-11761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101061#comment-14101061 ] stack commented on HBASE-11761: --- bq. Maybe a FAQ item with a pointer within both the 0.94 - 0.96 and 0.94 - 0.98 upgrade sections? Sounds great. Add a FAQ item for updating a maven-managed application from 0.94 - 0.96+ -- Key: HBASE-11761 URL: https://issues.apache.org/jira/browse/HBASE-11761 Project: HBase Issue Type: Task Components: documentation Reporter: Sean Busbey Labels: beginner In 0.96 we changed artifact structure, so that clients need to rely on an artifact specific to some module (hopefully hbase-client) instead of a single fat jar. We should add a FAQ item that points people towards hbase-client, to ease those updating downstream applications from 0.94 to 0.98+. Showing an example pom entry for e.g. org.apache.hbase:hbase:0.94.22 and one for e.g. org.apache.hbase:hbase-client:0.98.5 should be sufficient. -- This message was sent by Atlassian JIRA (v6.2#6252)
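The FAQ entry the description asks for could show the two pom entries side by side. The coordinates below are the examples given in the description itself; a real application would substitute its own version numbers:

```xml
<!-- 0.94 and earlier: one monolithic artifact -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>0.94.22</version>
</dependency>

<!-- 0.96+: depend on the client module instead -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>0.98.5</version>
</dependency>
```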
[jira] [Commented] (HBASE-11770) TestBlockCacheReporting.testBucketCache is not stable
[ https://issues.apache.org/jira/browse/HBASE-11770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101057#comment-14101057 ] stack commented on HBASE-11770: --- Want to assign it to me, [~sergey.soldatov]? It's a test of my writing. TestBlockCacheReporting.testBucketCache is not stable -- Key: HBASE-11770 URL: https://issues.apache.org/jira/browse/HBASE-11770 Project: HBase Issue Type: Bug Components: test Environment: kvm box with Ubuntu 12.04 Desktop 64bit. java version "1.7.0_65" Java(TM) SE Runtime Environment (build 1.7.0_65-b17) Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode) Reporter: Sergey Soldatov Assignee: Sergey Soldatov Depending on the machine and OS, TestBlockCacheReporting.testBucketCache may fail with an NPE: java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:417) at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:80) at org.apache.hadoop.hbase.io.hfile.TestBlockCacheReporting.addDataAndHits(TestBlockCacheReporting.java:67) at org.apache.hadoop.hbase.io.hfile.TestBlockCacheReporting.testBucketCache(TestBlockCacheReporting.java:86) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10092) Move up on to log4j2
[ https://issues.apache.org/jira/browse/HBASE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101073#comment-14101073 ] Andrew Purtell commented on HBASE-10092: bq. Log config format changes which will be too much to take on in a 1.0 hbase (IMO). Was thinking about this also. So let me just propose it then.. What about putting in a log configuration file adapter so we don't have to change our log4j properties files until later? This would be needed if we ever wanted to backport async logging improvements to something like 0.98. Move up on to log4j2 Key: HBASE-10092 URL: https://issues.apache.org/jira/browse/HBASE-10092 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: Alex Newman Attachments: 10092.txt, 10092v2.txt, HBASE-10092.patch Allows logging with less friction. See http://logging.apache.org/log4j/2.x/ This rather radical transition can be done w/ minor change given they have an adapter for apache's logging, the one we use. They also have an adapter for slf4j so we likely can remove at least some of the 4 versions of this module our dependencies make use of. I made a start in attached patch but am currently stuck in maven dependency resolve hell courtesy of our slf4j. Fixing will take some concentration and a good net connection, an item I currently lack. Other TODOs are that we will need to fix our little log level setting jsp page -- will likely have to undo our use of hadoop's tool here -- and the config system changes a little. I will return to this project soon. Will bring numbers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file
[ https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101080#comment-14101080 ] Andrew Purtell commented on HBASE-11591: bq. But setting the seqId on the read path would prevent us from using Cell based impl because Cell does not have it. What prevents us from adding seqID accessors as an additional interface extending Cell in hbase-server as Anoop proposed above? Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file --- Key: HBASE-11591 URL: https://issues.apache.org/jira/browse/HBASE-11591 Project: HBase Issue Type: Bug Affects Versions: 0.99.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.99.0 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java See discussion in HBASE-11339. We have a case where the same KVs are in two files: one produced by flush/compaction and the other through bulk load. Both files have some of the same KVs, matching even in timestamp. Steps: Add some rows with a specific timestamp and flush the same. Bulk load a file with the same data. Ensure that the assign seqnum property is set. The bulk load should use HFileOutputFormat2 (or ensure that we write the bulk_time_output key). This ensures that the bulk loaded file has the highest seq num. Assume the cell in the flushed/compacted store file is row1,cf,cq,ts1,value1 and the cell in the bulk loaded file is row1,cf,cq,ts1,value2 (there are no parallel scans). Issue a scan on the table in 0.96. The retrieved value is row1,cf,cq,ts1,value2. But the same scan in 0.98 will retrieve row1,cf,cq,ts1,value1. This is a behaviour change. 
This is because of this code:
{code}
public int compare(KeyValueScanner left, KeyValueScanner right) {
  int comparison = compare(left.peek(), right.peek());
  if (comparison != 0) {
    return comparison;
  } else {
    // Since both the keys are exactly the same, we break the tie in favor
    // of the key which came latest.
    long leftSequenceID = left.getSequenceID();
    long rightSequenceID = right.getSequenceID();
    if (leftSequenceID > rightSequenceID) {
      return -1;
    } else if (leftSequenceID < rightSequenceID) {
      return 1;
    } else {
      return 0;
    }
  }
}
{code}
In the 0.96 case the mvcc of the cell in both files is 0, so the comparison falls through to the else branch, where the seq id of the bulk loaded file is greater and sorts first, ensuring that the scan reads from the bulk loaded file. In 0.98+, since we are retaining the mvcc+seqid, the mvcc is not reset to 0 (it remains a non-zero positive value). Hence compare() sorts the cell in the flushed/compacted file first. Which means that although we know the latest file is the bulk loaded one, we don't scan its data. Seems to be a behaviour change. Will check other corner cases also, but we are trying to understand the behaviour of bulk load because we are evaluating whether it can be used for the MOB design. -- This message was sent by Atlassian JIRA (v6.2#6252)
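A toy model of the tie-break described above (FakeScanner and CMP are invented for illustration; these are not the real KeyValueScanner classes) shows how zeroed mvcc values push the decision down to the file sequence id, while a retained non-zero mvcc decides the order before the seqId is ever consulted:

```java
import java.util.Comparator;

public class TieBreakDemo {
    // Stand-in for a KeyValueScanner: just the peeked cell's mvcc and the
    // backing file's sequence id.
    static class FakeScanner {
        final long mvcc, seqId;
        FakeScanner(long mvcc, long seqId) { this.mvcc = mvcc; this.seqId = seqId; }
    }

    // Mirrors the shape of the comparator above: equal cells fall through to
    // the sequence-id tie-break, where the newer (higher seqId) file sorts first.
    static final Comparator<FakeScanner> CMP = (left, right) -> {
        int comparison = Long.compare(right.mvcc, left.mvcc); // newer mvcc first
        if (comparison != 0) return comparison;
        return Long.compare(right.seqId, left.seqId);         // newer file first
    };

    public static void main(String[] args) {
        // 0.96-style: mvcc zeroed in both files, so the bulk-loaded file's
        // higher seqId wins the tie-break and it is scanned first.
        FakeScanner flushed = new FakeScanner(0, 5);
        FakeScanner bulk = new FakeScanner(0, 100);
        if (CMP.compare(bulk, flushed) >= 0) throw new AssertionError();

        // 0.98-style: the flushed file's retained non-zero mvcc decides the
        // order first, so the bulk-loaded file never wins despite seqId 100.
        FakeScanner flushed98 = new FakeScanner(7, 5);
        FakeScanner bulk98 = new FakeScanner(0, 100);
        if (CMP.compare(flushed98, bulk98) >= 0) throw new AssertionError();
        System.out.println("tie-break behavior reproduced");
    }
}
```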
[jira] [Commented] (HBASE-11657) Put HTable region methods in an interface
[ https://issues.apache.org/jira/browse/HBASE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101086#comment-14101086 ] Enis Soztutar commented on HBASE-11657: --- I've made HRL private in an earlier patch that introduced RegionLocations class. The idea was to make regions transparent to users because they can change, not specific to HRL per se. With the introduction of RegionLocations and HBASE-10070 work, there can be more than one location for a region together with different replica_ids associated with regions. I did not want to expose those as the public API, but we can revisit that decision if we want. Put HTable region methods in an interface - Key: HBASE-11657 URL: https://issues.apache.org/jira/browse/HBASE-11657 Project: HBase Issue Type: Improvement Affects Versions: 0.99.0 Reporter: Carter Assignee: Carter Fix For: 0.99.0 Attachments: HBASE_11657.patch, HBASE_11657_v2.patch, HBASE_11657_v3.patch, HBASE_11657_v3.patch, HBASE_11657_v4.patch Most of the HTable methods are now abstracted by HTableInterface, with the notable exception of the following methods that pertain to region metadata:
{code}
HRegionLocation getRegionLocation(final String row)
HRegionLocation getRegionLocation(final byte [] row)
HRegionLocation getRegionLocation(final byte [] row, boolean reload)
byte [][] getStartKeys()
byte [][] getEndKeys()
Pair<byte[][], byte[][]> getStartEndKeys()
void clearRegionCache()
{code}
and a default scope method which maybe should be bundled with the others: {code} List<RegionLocations> listRegionLocations() {code} Since the consensus seems to be that these would muddy HTableInterface with non-core functionality, where should it go? MapReduce looks up the region boundaries, so it needs to be exposed somewhere. Let me throw out a straw man to start the conversation. I propose: {code} org.apache.hadoop.hbase.client.HRegionInterface {code} Have HTable implement this interface. 
Also add these methods to HConnection:
{code}
HRegionInterface getTableRegion(TableName tableName)
HRegionInterface getTableRegion(TableName tableName, ExecutorService pool)
{code}
[~stack], [~ndimiduk], [~enis], thoughts? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11753) Document HBASE_SHELL_OPTS environment variable
[ https://issues.apache.org/jira/browse/HBASE-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101093#comment-14101093 ] Jonathan Hsieh commented on HBASE-11753: lgtm. +1 Document HBASE_SHELL_OPTS environment variable -- Key: HBASE-11753 URL: https://issues.apache.org/jira/browse/HBASE-11753 Project: HBase Issue Type: Sub-task Components: documentation Reporter: Misty Stanley-Jones Assignee: Misty Stanley-Jones Fix For: 0.99.0, 0.96.3, 0.98.5, 0.94.22, 2.0.0 Attachments: HBASE-11753.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11753) Document HBASE_SHELL_OPTS environment variable
[ https://issues.apache.org/jira/browse/HBASE-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-11753: --- Resolution: Fixed Fix Version/s: (was: 0.94.22) (was: 0.98.5) (was: 0.96.3) Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the docs updates to branch-1 and trunk since those will be the long lived docs branches. Document HBASE_SHELL_OPTS environment variable -- Key: HBASE-11753 URL: https://issues.apache.org/jira/browse/HBASE-11753 Project: HBase Issue Type: Sub-task Components: documentation Reporter: Misty Stanley-Jones Assignee: Misty Stanley-Jones Fix For: 0.99.0, 2.0.0 Attachments: HBASE-11753.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11757) Provide a common base abstract class for both RegionObserver and MasterObserver
[ https://issues.apache.org/jira/browse/HBASE-11757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-11757: Resolution: Fixed Fix Version/s: (was: 0.99.0) 1.0.0 Status: Resolved (was: Patch Available) Provide a common base abstract class for both RegionObserver and MasterObserver --- Key: HBASE-11757 URL: https://issues.apache.org/jira/browse/HBASE-11757 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Matteo Bertozzi Fix For: 1.0.0, 2.0.0, 0.98.6 Attachments: HBASE-11757-0.98-v0.patch, HBASE-11757-v0.patch Some security coprocessors extend both RegionObserver and MasterObserver, unfortunately only one of the two can use the available base abstract class implementations. Provide a common base abstract class for both the RegionObserver and MasterObserver interfaces. Update current coprocessors that extend both interfaces to use the new common base abstract class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11550) Custom value for BUCKET_CACHE_BUCKETS_KEY should be sorted
[ https://issues.apache.org/jira/browse/HBASE-11550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101124#comment-14101124 ] stack commented on HBASE-11550: --- bq. I don't think it fair to the new contributor to have their contributions get caught up in the project politicking. [~gustavoanatoly] To be clear, ill-defined JIRAs consume the attention of those who are trying to follow along. Lack of clarity in the definition means we need to keep an active eye out. When the issue is trivial, this is particularly irksome. This issue is a good example. It starts out without provenance -- does the issue come from code-reading? testing? a user-reported issue? an attempt at setting bucket sizes in configs? -- and it has 'shoulds' and 'supposed to' in subject and original description but there is no justification as to why. Nick, a third party altogether, has to do detective work to establish there is an actual problem here. For another example, see the follow-on, filed again by Ted and assigned to you, HBASE-11743. Look at it. It says this issue, HBASE-11550, makes it ...such that there is no wastage in bucket allocation. But Nick resolves this issue with the comment that your patch ensures default and user-supplied config align, punting the wastage question to, he 'guesses', HBASE-11743. There is a lack of alignment here. The mess that is this issue looks set to repeat over in HBASE-11743. To avoid any crossfire in the future, I'd suggest filing your own issues, especially if you are trying to build yourself a bit of a track record. Also work on non-trivial issues, as said before. You will find it easier getting reviewers if the issue is non-trivial. 
Custom value for BUCKET_CACHE_BUCKETS_KEY should be sorted -- Key: HBASE-11550 URL: https://issues.apache.org/jira/browse/HBASE-11550 Project: HBase Issue Type: Bug Affects Versions: 0.99.0, 0.98.4, 0.98.5 Reporter: Ted Yu Assignee: Gustavo Anatoly Priority: Trivial Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: HBASE-11550-v1.patch, HBASE-11550-v2.patch, HBASE-11550-v3.patch, HBASE-11550-v4-0.98.patch, HBASE-11550-v4.patch, HBASE-11550.patch Users can pass bucket sizes through the hbase.bucketcache.bucket.sizes config entry. The sizes are supposed to be in increasing order. Validation should be added in CacheConfig#getL2(). -- This message was sent by Atlassian JIRA (v6.2#6252)
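The requested validation could look something like the sketch below; parseBucketSizes is a hypothetical helper for illustration, not the actual CacheConfig#getL2() code:

```java
import java.util.Arrays;

public class BucketSizeValidation {
    // Parses a comma-separated hbase.bucketcache.bucket.sizes-style value and
    // rejects it unless the sizes are in strictly increasing order.
    public static int[] parseBucketSizes(String configValue) {
        String[] parts = configValue.split(",");
        int[] sizes = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            sizes[i] = Integer.parseInt(parts[i].trim());
            if (i > 0 && sizes[i] <= sizes[i - 1]) {
                throw new IllegalArgumentException(
                    "bucket sizes must be in increasing order: " + configValue);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Sorted input parses cleanly.
        System.out.println(Arrays.toString(parseBucketSizes("4096, 8192, 16384")));
        // Out-of-order input is rejected instead of silently misconfiguring the cache.
        try {
            parseBucketSizes("8192, 4096");
            throw new AssertionError("out-of-order sizes were accepted");
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```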
[jira] [Commented] (HBASE-10092) Move up on to log4j2
[ https://issues.apache.org/jira/browse/HBASE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101127#comment-14101127 ] stack commented on HBASE-10092: --- bq. What about putting in a log configuration file adapter so we don't have to change our log4j properties files until later? That'd make it palatable. Move up on to log4j2 Key: HBASE-10092 URL: https://issues.apache.org/jira/browse/HBASE-10092 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: Alex Newman Attachments: 10092.txt, 10092v2.txt, HBASE-10092.patch Allows logging with less friction. See http://logging.apache.org/log4j/2.x/ This rather radical transition can be done w/ minor change given they have an adapter for apache's logging, the one we use. They also have an adapter for slf4j so we likely can remove at least some of the 4 versions of this module our dependencies make use of. I made a start in attached patch but am currently stuck in maven dependency resolve hell courtesy of our slf4j. Fixing will take some concentration and a good net connection, an item I currently lack. Other TODOs are that we will need to fix our little log level setting jsp page -- will likely have to undo our use of hadoop's tool here -- and the config system changes a little. I will return to this project soon. Will bring numbers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11728) Data loss while scanning using PREFIX_TREE DATA-BLOCK-ENCODING
[ https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101138#comment-14101138 ] Hudson commented on HBASE-11728: SUCCESS: Integrated in HBase-0.98 #455 (See [https://builds.apache.org/job/HBase-0.98/455/]) HBASE-11728 - Data loss while scanning using PREFIX_TREE (ramkrishna: rev e07cf3554d628bb061aa51b9b83fd81783463e1d) * hbase-prefix-tree/src/main/java/org/apache/hadoop/hbase/codec/prefixtree/decode/PrefixTreeArrayScanner.java * hbase-server/src/test/java/org/apache/hadoop/hbase/io/encoding/TestPrefixTree.java * hbase-prefix-tree/src/main/java/org/apache/hadoop/hbase/codec/prefixtree/PrefixTreeSeeker.java Data loss while scanning using PREFIX_TREE DATA-BLOCK-ENCODING -- Key: HBASE-11728 URL: https://issues.apache.org/jira/browse/HBASE-11728 Project: HBase Issue Type: Bug Components: Scanners Affects Versions: 0.96.1.1, 0.98.4 Environment: kvm box with Ubuntu 12.04 Desktop 64bit. java version "1.7.0_06-b24" (SUN JDK) Reporter: wuchengzhi Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: 29cb562fad564b468ea9d61a2d60e8b0, HBASE-11728.patch, HBASE-11728_1.patch, HBASE-11728_2.patch, HBASE-11728_3.patch, HBASE-11728_4.patch, HFileAnalys.java, TestPrefixTree.java Original Estimate: 72h Remaining Estimate: 72h In the scan case, I prepare some data as below: Table Desc (using the prefix-tree encoding): 'prefix_tree_test', {NAME => 'cf_1', DATA_BLOCK_ENCODING => 'PREFIX_TREE', TTL => '15552000'} and I put 5 rows as: (RowKey, Qualifier, Value) 'a-b-0-0', 'qf_1', 'c1-value' 'a-b-A-1', 'qf_1', 'c1-value' 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value' 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2' 'a-b-B-2-1402397300-1402416535', 'qf_2', 'c2-value-3' so I try to scan the rowKey between 'a-b-A-1' and 'a-b-A-1:', and I got the correct result: Test 1: Scan scan = new Scan(); scan.setStartRow("a-b-A-1".getBytes()); scan.setStopRow("a-b-A-1:".getBytes()); -- 
'a-b-A-1', 'qf_1', 'c1-value' 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value' 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2' and then I try next, a scan with addColumn. Test 2: Scan scan = new Scan(); scan.addColumn(Bytes.toBytes("cf_1"), Bytes.toBytes("qf_2")); scan.setStartRow("a-b-A-1".getBytes()); scan.setStopRow("a-b-A-1:".getBytes()); -- expect: 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value' 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2' but actually I got nothing. Then I update the addColumn to scan.addColumn(Bytes.toBytes("cf_1"), Bytes.toBytes("qf_1")); and I got the expected result 'a-b-A-1', 'qf_1', 'c1-value' as well. Then I do more testing... I update the case to make the startRow greater than 'a-b-A-1'. Test 3: Scan scan = new Scan(); scan.setStartRow("a-b-A-1-".getBytes()); scan.setStopRow("a-b-A-1:".getBytes()); -- expect: 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value' 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2' but actually I got nothing again. I modify the start row to be greater than 'a-b-A-1-1402329600-1402396277': Scan scan = new Scan(); scan.setStartRow("a-b-A-1-140239".getBytes()); scan.setStopRow("a-b-A-1:".getBytes()); and I got the expected row as well: 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2' So, I think it may be a bug in the prefix-tree encoding. It happens after the data is flushed to the storefile, and it's ok while the data is in the mem-store. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11512) Write region open/close events to WAL
[ https://issues.apache.org/jira/browse/HBASE-11512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-11512: -- Attachment: hbase-11512_v3.patch v3 patch from RB. Write region open/close events to WAL - Key: HBASE-11512 URL: https://issues.apache.org/jira/browse/HBASE-11512 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: hbase-11512_v1.patch, hbase-11512_v2.patch, hbase-11512_v3.patch Similar to writing flush events to WAL (HBASE-11511) and compaction events to WAL (HBASE-2231), we should write region open and close events to WAL. This is especially important for secondary region replicas, since we can use this information to pick up primary regions' files from secondary replicas. However, we may need this for regular inter cluster replication as well, see issues HBASE-10343 and HBASE-9465. A design doc for secondary replica replication can be found at HBASE-11183. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-4920) We need a mascot, a totem
[ https://issues.apache.org/jira/browse/HBASE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101149#comment-14101149 ] stack commented on HBASE-4920: -- I like #2 and #3. I think they are fine as they are as suggestions. We can tidy up later. bq. We're just down to deciding on an embodiment for the logo. I'd like to suggest voting on the representation of the orca only, not an 'apache hbase' + orca combination. Trying to vote on the latter will have us in the weeds: No, it should be on the left!, No, on top! If we can decide on the orca representation, this will move us another step on. It would allow us to deploy the representation now. Work on how the two are combined can come later. It will also vary with context (orca above, orca to the side, orca big or orca small). We need a mascot, a totem - Key: HBASE-4920 URL: https://issues.apache.org/jira/browse/HBASE-4920 Project: HBase Issue Type: Task Reporter: stack Attachments: Apache_HBase_Orca_Logo_1.jpg, Apache_HBase_Orca_Logo_Mean_version-3.pdf, Apache_HBase_Orca_Logo_Mean_version-4.pdf, Apache_HBase_Orca_Logo_round5.pdf, HBase Orca Logo.jpg, Orca_479990801.jpg, Screen shot 2011-11-30 at 4.06.17 PM.png, apache hbase orca logo_Proof 3.pdf, apache logo_Proof 8.pdf, jumping-orca_rotated.xcf, jumping-orca_rotated_right.png, krake.zip, more_orcas.png, more_orcas2.png, orca_clipart_freevector_lhs.jpeg, orca_free_vector_on_top_66percent_levelled.png, orca_free_vector_sheared_rotated_rhs.png, orca_free_vector_some_selections.png, photo (2).JPG, plus_orca.png, proposal_1_logo.png, proposal_1_logo.xcf, proposal_2_logo.png, proposal_2_logo.xcf, proposal_3_logo.png, proposal_3_logo.xcf We need a totem for our t-shirt that is yet to be printed. O'Reilly owns the Clydesdale. We need something else. We could have a fluffy little duck that quacks 'hbase!' 
when you squeeze it and we could order boxes of them from some off-shore sweatshop that subcontracts to a contractor who employs child labor only. Or we could have an Orca (Big!, Fast!, Killer!, and in a poem that Marcy from Salesforce showed me, that was a bit too spiritual for me to be seen quoting here, it had the Orca as the 'Guardian of the Cosmic Memory': i.e. in translation, bigdata). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-4920) We need a mascot, a totem
[ https://issues.apache.org/jira/browse/HBASE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101154#comment-14101154 ] Andrew Purtell commented on HBASE-4920: --- bq. If we can decide on the orca representation, this will move us another step on. It would allow us to deploy the representation now. JM's proposals all have the same stylized Orca representation. lgtm, +1 Let's move forward. As you say, we can tweak the positioning later. We need a mascot, a totem - Key: HBASE-4920 URL: https://issues.apache.org/jira/browse/HBASE-4920 Project: HBase Issue Type: Task Reporter: stack Attachments: Apache_HBase_Orca_Logo_1.jpg, Apache_HBase_Orca_Logo_Mean_version-3.pdf, Apache_HBase_Orca_Logo_Mean_version-4.pdf, Apache_HBase_Orca_Logo_round5.pdf, HBase Orca Logo.jpg, Orca_479990801.jpg, Screen shot 2011-11-30 at 4.06.17 PM.png, apache hbase orca logo_Proof 3.pdf, apache logo_Proof 8.pdf, jumping-orca_rotated.xcf, jumping-orca_rotated_right.png, krake.zip, more_orcas.png, more_orcas2.png, orca_clipart_freevector_lhs.jpeg, orca_free_vector_on_top_66percent_levelled.png, orca_free_vector_sheared_rotated_rhs.png, orca_free_vector_some_selections.png, photo (2).JPG, plus_orca.png, proposal_1_logo.png, proposal_1_logo.xcf, proposal_2_logo.png, proposal_2_logo.xcf, proposal_3_logo.png, proposal_3_logo.xcf We need a totem for our t-shirt that is yet to be printed. O'Reilly owns the Clydesdale. We need something else. We could have a fluffy little duck that quacks 'hbase!' when you squeeze it and we could order boxes of them from some off-shore sweatshop that subcontracts to a contractor who employs child labor only. Or we could have an Orca (Big!, Fast!, Killer!, and in a poem that Marcy from Salesforce showed me, that was a bit too spiritual for me to be seen quoting here, it had the Orca as the 'Guardian of the Cosmic Memory': i.e. in translation, bigdata). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11762) Record the class name of Codec in WAL header
[ https://issues.apache.org/jira/browse/HBASE-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-11762: --- Attachment: 11762-v5.txt Patch v5 addresses Enis' comment about WALCellCodec.create() method Record the class name of Codec in WAL header Key: HBASE-11762 URL: https://issues.apache.org/jira/browse/HBASE-11762 Project: HBase Issue Type: Task Components: wal Reporter: Ted Yu Assignee: Ted Yu Fix For: 1.0.0, 2.0.0, 0.98.6 Attachments: 11762-v1.txt, 11762-v2.txt, 11762-v4.txt, 11762-v5.txt In follow-up discussion to HBASE-11620, Enis brought up this point: Related to this, should not we also write the CellCodec that we use in the WAL header. Right now, the codec comes from the configuration which means that you cannot read back the WAL files if you change the codec. This JIRA is to implement the above suggestion. -- This message was sent by Atlassian JIRA (v6.2#6252)
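The idea of recording the codec class in the WAL header so the reader no longer depends on current configuration can be sketched with a self-contained example. Codec, PlainCodec, writeHeader, and readCodec below are illustrative names, not the actual WALCellCodec API:

```java
public class CodecHeaderDemo {
    // Minimal stand-in for a pluggable codec.
    public interface Codec { String name(); }
    public static class PlainCodec implements Codec {
        public String name() { return "plain"; }
    }

    // Writer side: record the concrete codec class name in the header so the
    // reader does not need to consult configuration.
    public static String writeHeader(Codec codec) {
        return codec.getClass().getName();
    }

    // Reader side: instantiate whatever class the header names, by reflection.
    public static Codec readCodec(String headerClassName) {
        try {
            return (Codec) Class.forName(headerClassName)
                                .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("cannot restore codec " + headerClassName, e);
        }
    }

    public static void main(String[] args) {
        // Even if configuration later points at a different codec, the header
        // still tells us which one wrote this file.
        String header = writeHeader(new PlainCodec());
        Codec restored = readCodec(header);
        if (!"plain".equals(restored.name())) throw new AssertionError();
        System.out.println("restored codec: " + restored.name());
    }
}
```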
[jira] [Commented] (HBASE-4920) We need a mascot, a totem
[ https://issues.apache.org/jira/browse/HBASE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101165#comment-14101165 ] Jean-Marc Spaggiari commented on HBASE-4920: Perfect then. Let's start a vote on the mailing list. 2 options. [ ] You are fine with an Orca [ ] You are not fine with an Orca User mailing list? Or dev only? We need a mascot, a totem - Key: HBASE-4920 URL: https://issues.apache.org/jira/browse/HBASE-4920 Project: HBase Issue Type: Task Reporter: stack Attachments: Apache_HBase_Orca_Logo_1.jpg, Apache_HBase_Orca_Logo_Mean_version-3.pdf, Apache_HBase_Orca_Logo_Mean_version-4.pdf, Apache_HBase_Orca_Logo_round5.pdf, HBase Orca Logo.jpg, Orca_479990801.jpg, Screen shot 2011-11-30 at 4.06.17 PM.png, apache hbase orca logo_Proof 3.pdf, apache logo_Proof 8.pdf, jumping-orca_rotated.xcf, jumping-orca_rotated_right.png, krake.zip, more_orcas.png, more_orcas2.png, orca_clipart_freevector_lhs.jpeg, orca_free_vector_on_top_66percent_levelled.png, orca_free_vector_sheared_rotated_rhs.png, orca_free_vector_some_selections.png, photo (2).JPG, plus_orca.png, proposal_1_logo.png, proposal_1_logo.xcf, proposal_2_logo.png, proposal_2_logo.xcf, proposal_3_logo.png, proposal_3_logo.xcf We need a totem for our t-shirt that is yet to be printed. O'Reilly owns the Clydesdale. We need something else. We could have a fluffy little duck that quacks 'hbase!' when you squeeze it and we could order boxes of them from some off-shore sweatshop that subcontracts to a contractor who employs child labor only. Or we could have an Orca (Big!, Fast!, Killer!, and in a poem that Marcy from Salesforce showed me, that was a bit too spiritual for me to be seen quoting here, it had the Orca as the 'Guardian of the Cosmic Memory': i.e. in translation, bigdata). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11657) Put HTable region methods in an interface
[ https://issues.apache.org/jira/browse/HBASE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101189#comment-14101189 ] Carter commented on HBASE-11657: I mainly wanted to make sure that the same person who made HRL private is okay with whatever we come up with for this interface. Since you are one and the same person, I am less concerned. ;-) I double-checked and there is actually still a problem with the (byte[]/byte[]) -> List<ServerName> lookup. In short, TableInputFormatBase wants the following from the HTable that is being passed to it: # The hostname and port of the regionserver for a row (handled by ServerName) # The name of the table (we can add getTableName to RegionLocator) # The region name itself, which it uses to look up the region size in the RegionSizeCalculator (handled by HRegionInfo) I see the following alternatives: * Make HRL public. It contains ServerName and HRegionInfo, which are both required by the current implementation of TableInputFormatBase. * Return ServerName and region name in some new POJO * Find a new way to do what TableInputFormatBase wants to accomplish Sorry to open up this can of worms, but that's part of the fun of retrofitting an interface. 
Put HTable region methods in an interface - Key: HBASE-11657 URL: https://issues.apache.org/jira/browse/HBASE-11657 Project: HBase Issue Type: Improvement Affects Versions: 0.99.0 Reporter: Carter Assignee: Carter Fix For: 0.99.0 Attachments: HBASE_11657.patch, HBASE_11657_v2.patch, HBASE_11657_v3.patch, HBASE_11657_v3.patch, HBASE_11657_v4.patch Most of the HTable methods are now abstracted by HTableInterface, with the notable exception of the following methods that pertain to region metadata: {code} HRegionLocation getRegionLocation(final String row) HRegionLocation getRegionLocation(final byte [] row) HRegionLocation getRegionLocation(final byte [] row, boolean reload) byte [][] getStartKeys() byte[][] getEndKeys() Pair<byte[][],byte[][]> getStartEndKeys() void clearRegionCache() {code} and a default-scope method which maybe should be bundled with the others: {code} List<RegionLocations> listRegionLocations() {code} Since the consensus seems to be that these would muddy HTableInterface with non-core functionality, where should it go? MapReduce looks up the region boundaries, so it needs to be exposed somewhere. Let me throw out a straw man to start the conversation. I propose: {code} org.apache.hadoop.hbase.client.HRegionInterface {code} Have HTable implement this interface. Also add these methods to HConnection: {code} HRegionInterface getTableRegion(TableName tableName) HRegionInterface getTableRegion(TableName tableName, ExecutorService pool) {code} [~stack], [~ndimiduk], [~enis], thoughts? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11772) Bulk load mvcc and seqId issues with native hfiles
Jerry He created HBASE-11772: Summary: Bulk load mvcc and seqId issues with native hfiles Key: HBASE-11772 URL: https://issues.apache.org/jira/browse/HBASE-11772 Project: HBase Issue Type: Bug Affects Versions: 0.98.5 Reporter: Jerry He There are mvcc and seqId issues when bulk loading native hfiles -- meaning hfiles that are a direct file copy-out from hbase, not from an HFileOutputFormat job. There are differences between these two types of hfiles. Native hfiles have a possibly non-zero MAX_MEMSTORE_TS_KEY value and non-zero mvcc values in cells. Native hfiles also have MAX_SEQ_ID_KEY. Native hfiles do not have BULKLOAD_TIME_KEY. Here are a couple of problems I observed when bulk loading native hfiles. 1. Cells in newly bulk loaded hfiles can be invisible to scans. It is easy to re-create. Bulk load a native hfile that has a larger mvcc value in cells, e.g. 10. If the current readpoint when initiating a scan is less than 10, the cells in the new hfile are skipped, thus becoming invisible. We don't reset the readpoint of a region after a bulk load. 2. The current StoreFile.isBulkLoadResult() is implemented as: {code} return metadataMap.containsKey(BULKLOAD_TIME_KEY) {code} which does not detect bulkloaded native hfiles. 3. Another observed problem is possible data loss during log recovery. It is similar to HBASE-10958 reported by [~jdcryans]. Borrowing the re-create steps from HBASE-10958: 1) Create an empty table 2) Put one row in it (let's say it gets seqid 1) 3) Bulk load one native hfile with a large seqId (e.g. 100). The native hfile can be obtained by copying out from an existing table. 4) Kill the region server that holds the table's region. Scan the table once the region is made available again. The first row, at seqid 1, will be missing since the HFile with seqid 100 makes us believe that everything that came before it was flushed. Problem 3 is probably related to problem 2. We will be ok if we get the appended seqId during bulk load instead of 100 from inside the file. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
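The readpoint filtering behind problem 1 above can be sketched as follows. This is an illustrative sketch only, not the actual HBase StoreFileScanner code; the class and method names here are hypothetical stand-ins:

```java
// Illustrative sketch (hypothetical names, not the real StoreFileScanner)
// of the readpoint rule behind problem 1: a scanner hides any cell whose
// mvcc/memstoreTS is newer than the readpoint taken when the scan started.
public class ReadpointSkip {
    static boolean visible(long cellMvcc, long readpoint) {
        // skipKVsNewerThanReadpoint-style check
        return cellMvcc <= readpoint;
    }

    public static void main(String[] args) {
        long readpoint = 5;            // region readpoint at scan start
        long bulkLoadedCellMvcc = 10;  // mvcc carried inside a native hfile
        // The freshly bulk-loaded cell is skipped even though it is
        // committed data, because the region's readpoint was never advanced.
        System.out.println(visible(bulkLoadedCellMvcc, readpoint)); // prints false
    }
}
```

A bulk load from HFileOutputFormat avoids this only because its cells carry mvcc 0, which is always visible under this rule.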
[jira] [Commented] (HBASE-11772) Bulk load mvcc and seqId issues with native hfiles
[ https://issues.apache.org/jira/browse/HBASE-11772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101202#comment-14101202 ] Jerry He commented on HBASE-11772: -- The issues were observed in the 0.98 stream. There are changes in the master branch, e.g. HBASE-8763 (combine mvcc and seqId), but I suspect the issues still exist there. Bulk load mvcc and seqId issues with native hfiles -- Key: HBASE-11772 URL: https://issues.apache.org/jira/browse/HBASE-11772 Project: HBase Issue Type: Bug Affects Versions: 0.98.5 Reporter: Jerry He -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11728) Data loss while scanning using PREFIX_TREE DATA-BLOCK-ENCODING
[ https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101209#comment-14101209 ] Hudson commented on HBASE-11728: FAILURE: Integrated in HBase-1.0 #108 (See [https://builds.apache.org/job/HBase-1.0/108/]) HBASE-11728 - Data loss while scanning using PREFIX_TREE (ramkrishna: rev f8eb1962dc9e92122d00cccfede819014a1cc8f6) * hbase-server/src/test/java/org/apache/hadoop/hbase/io/encoding/TestPrefixTree.java * hbase-prefix-tree/src/main/java/org/apache/hadoop/hbase/codec/prefixtree/decode/PrefixTreeArrayScanner.java * hbase-prefix-tree/src/main/java/org/apache/hadoop/hbase/codec/prefixtree/PrefixTreeSeeker.java Data loss while scanning using PREFIX_TREE DATA-BLOCK-ENCODING -- Key: HBASE-11728 URL: https://issues.apache.org/jira/browse/HBASE-11728 Project: HBase Issue Type: Bug Components: Scanners Affects Versions: 0.96.1.1, 0.98.4 Environment: ubuntu12 hadoop-2.2.0 Hbase-0.96.1.1 SUN-JDK(1.7.0_06-b24) Reporter: wuchengzhi Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: 29cb562fad564b468ea9d61a2d60e8b0, HBASE-11728.patch, HBASE-11728_1.patch, HBASE-11728_2.patch, HBASE-11728_3.patch, HBASE-11728_4.patch, HFileAnalys.java, TestPrefixTree.java Original Estimate: 72h Remaining Estimate: 72h In the Scan case, I prepared some data as below. Table desc (using the prefix-tree encoding): 'prefix_tree_test', {NAME => 'cf_1', DATA_BLOCK_ENCODING => 'PREFIX_TREE', TTL => '15552000'} and I put 5 rows as: (RowKey, Qualifier, Value) 'a-b-0-0', 'qf_1', 'c1-value' 'a-b-A-1', 'qf_1', 'c1-value' 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value' 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2' 'a-b-B-2-1402397300-1402416535', 'qf_2', 'c2-value-3' So I try to scan the rowkeys between 'a-b-A-1' and 'a-b-A-1:', and I got the correct result: Test 1: Scan scan = new Scan(); scan.setStartRow("a-b-A-1".getBytes()); scan.setStopRow("a-b-A-1:".getBytes()); -- 'a-b-A-1', 'qf_1', 'c1-value' 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value' 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2' And then I try next, a scan with addColumn. Test 2: Scan scan = new Scan(); scan.addColumn(Bytes.toBytes("cf_1"), Bytes.toBytes("qf_2")); scan.setStartRow("a-b-A-1".getBytes()); scan.setStopRow("a-b-A-1:".getBytes()); -- expect: 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value' 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2' but actually I got nothing. Then I update the addColumn to scan.addColumn(Bytes.toBytes("cf_1"), Bytes.toBytes("qf_1")); and I got the expected result 'a-b-A-1', 'qf_1', 'c1-value' as well. Then I do more testing... I update the case to make the startRow greater than 'a-b-A-1'. Test 3: Scan scan = new Scan(); scan.setStartRow("a-b-A-1-".getBytes()); scan.setStopRow("a-b-A-1:".getBytes()); -- expect: 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value' 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2' but actually I got nothing again. I modify the start row to be greater than 'a-b-A-1-1402329600-1402396277': Scan scan = new Scan(); scan.setStartRow("a-b-A-1-140239".getBytes()); scan.setStopRow("a-b-A-1:".getBytes()); and I got the expected row as well: 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2' So, I think it may be a bug in the prefix-tree encoding. It happens after the data is flushed to the storefile, and it's ok when the data is in the mem-store. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11772) Bulk load mvcc and seqId issues with native hfiles
[ https://issues.apache.org/jira/browse/HBASE-11772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101214#comment-14101214 ] Jerry He commented on HBASE-11772: -- Here is the proposed fix: 1) Better detection of bulk loaded files. We can use the loaded file name containing '_SeqId_', since we already use it as a marker to get the load-time seqId. 2) Regard bulk loaded files as always having mvcc 0. Don't call StoreFileScanner.skipKVsNewerThanReadpoint() during a scan if it is a bulk loaded file, whether or not it has mvcc values in the file. 3) Problem 3 will probably be fixed by 1). Bulk load mvcc and seqId issues with native hfiles -- Key: HBASE-11772 URL: https://issues.apache.org/jira/browse/HBASE-11772 Project: HBase Issue Type: Bug Affects Versions: 0.98.5 Reporter: Jerry He -- This message was sent by Atlassian JIRA (v6.2#6252)
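The detection Jerry He proposes in point 1) can be sketched roughly as follows. This is a hedged sketch, not the real StoreFile internals: the class name, metadata map shape, and constant value here are hypothetical stand-ins; only the two markers themselves (the BULKLOAD_TIME_KEY metadata entry and the '_SeqId_' token in the renamed file name) come from the discussion above:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the improved bulk-load detection proposed above.
// HFileOutputFormat-produced files carry a BULKLOAD_TIME_KEY metadata entry;
// native hfiles lack it, but their file name gains a "_SeqId_" token when
// they are renamed into the region during the bulk load.
public class BulkLoadCheck {
    static final String BULKLOAD_TIME_KEY = "BULKLOAD_TIMESTAMP"; // stand-in value

    // Detects both kinds of bulk-loaded files: HFileOutputFormat output
    // (metadata marker) and copied-out native hfiles (file-name marker).
    public static boolean isBulkLoadResult(Map<String, byte[]> metadataMap, String fileName) {
        return metadataMap.containsKey(BULKLOAD_TIME_KEY)
                || fileName.contains("_SeqId_");
    }

    public static void main(String[] args) {
        Map<String, byte[]> emptyMeta = new HashMap<>();
        // A native hfile renamed during bulk load: detected by file name.
        System.out.println(isBulkLoadResult(emptyMeta, "f2b4a8c_SeqId_100_"));
        // A flush-produced file with no markers: not a bulk load.
        System.out.println(isBulkLoadResult(emptyMeta, "f2b4a8c"));
    }
}
```

With this check in place, point 2) follows: any file the check flags can be treated as mvcc 0 regardless of what its cells carry.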
[jira] [Updated] (HBASE-10092) Move up on to log4j2
[ https://issues.apache.org/jira/browse/HBASE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-10092: Fix Version/s: 2.0.0 Move up on to log4j2 Key: HBASE-10092 URL: https://issues.apache.org/jira/browse/HBASE-10092 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: Alex Newman Fix For: 2.0.0 Attachments: 10092.txt, 10092v2.txt, HBASE-10092.patch Allows logging with less friction. See http://logging.apache.org/log4j/2.x/ This rather radical transition can be done w/ minor change given they have an adapter for apache's logging, the one we use. They also have an adapter for slf4j, so we can likely remove at least some of the 4 versions of this module our dependencies make use of. I made a start in the attached patch but am currently stuck in maven dependency-resolution hell courtesy of our slf4j. Fixing will take some concentration and a good net connection, an item I currently lack. Other TODOs: we will need to fix our little log-level-setting jsp page -- will likely have to undo our use of hadoop's tool here -- and the config system changes a little. I will return to this project soon. Will bring numbers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11734) Document changed behavior of hbase.hstore.time.to.purge.deletes
[ https://issues.apache.org/jira/browse/HBASE-11734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101252#comment-14101252 ] Jonathan Hsieh commented on HBASE-11734: Thanks, Misty. Minor nit fix: are purge - are *purged* {quote} + <description>The amount of time to delay purging of delete markers with future timestamps. If + unset, or set to 0, all delete markers, including those with future timestamps, are purge + during the next major compaction. Otherwise, a delete marker is kept until the major compaction {quote} I fixed it when I committed to master and branch-1. Document changed behavior of hbase.hstore.time.to.purge.deletes --- Key: HBASE-11734 URL: https://issues.apache.org/jira/browse/HBASE-11734 Project: HBase Issue Type: Sub-task Components: documentation Reporter: Misty Stanley-Jones Assignee: Misty Stanley-Jones Fix For: 0.99.0, 0.98.2, 0.96.3 Attachments: HBASE-11734.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11734) Document changed behavior of hbase.hstore.time.to.purge.deletes
[ https://issues.apache.org/jira/browse/HBASE-11734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-11734: --- Attachment: hbase-11734.v2.branch1.patch hbase-11734.v2.patch I've committed the v2 versions of the patch. Document changed behavior of hbase.hstore.time.to.purge.deletes --- Key: HBASE-11734 URL: https://issues.apache.org/jira/browse/HBASE-11734 Project: HBase Issue Type: Sub-task Components: documentation Reporter: Misty Stanley-Jones Assignee: Misty Stanley-Jones Fix For: 0.99.0, 2.0.0 Attachments: HBASE-11734.patch, hbase-11734.v2.branch1.patch, hbase-11734.v2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11734) Document changed behavior of hbase.hstore.time.to.purge.deletes
[ https://issues.apache.org/jira/browse/HBASE-11734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-11734: --- Fix Version/s: (was: 0.96.3) (was: 0.98.2) 2.0.0 Document changed behavior of hbase.hstore.time.to.purge.deletes --- Key: HBASE-11734 URL: https://issues.apache.org/jira/browse/HBASE-11734 Project: HBase Issue Type: Sub-task Components: documentation Reporter: Misty Stanley-Jones Assignee: Misty Stanley-Jones Fix For: 0.99.0, 2.0.0 Attachments: HBASE-11734.patch, hbase-11734.v2.branch1.patch, hbase-11734.v2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11734) Document changed behavior of hbase.hstore.time.to.purge.deletes
[ https://issues.apache.org/jira/browse/HBASE-11734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-11734: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Document changed behavior of hbase.hstore.time.to.purge.deletes --- Key: HBASE-11734 URL: https://issues.apache.org/jira/browse/HBASE-11734 Project: HBase Issue Type: Sub-task Components: documentation Reporter: Misty Stanley-Jones Assignee: Misty Stanley-Jones Fix For: 0.99.0, 2.0.0 Attachments: HBASE-11734.patch, hbase-11734.v2.branch1.patch, hbase-11734.v2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-4920) We need a mascot, a totem
[ https://issues.apache.org/jira/browse/HBASE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101276#comment-14101276 ] stack commented on HBASE-4920: -- [~jmspaggi] We've already run the orca vote and that vote passed. See http://search-hadoop.com/m/DHED4yIYZl1 If you check out the thread it seemed to want to move naturally to the next stage, the vote on the orca representation. We need a mascot, a totem - Key: HBASE-4920 URL: https://issues.apache.org/jira/browse/HBASE-4920 Project: HBase Issue Type: Task Reporter: stack -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-4920) We need a mascot, a totem
[ https://issues.apache.org/jira/browse/HBASE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101299#comment-14101299 ] Jean-Marc Spaggiari commented on HBASE-4920: From the above comments, it sounds like the consensus was page 1 of the PDF. Do we want to add http://www.vectorfree.com/jumping-orca into the vote too? Or stay with page 1? Like, do you agree on this orca (page 1) as a logo, yes/no? Or between those 4 orcas, rate them 1 to 4 (and then we compile the results)? Is the orca on page 1 of the PDF free of rights? We need a mascot, a totem - Key: HBASE-4920 URL: https://issues.apache.org/jira/browse/HBASE-4920 Project: HBase Issue Type: Task Reporter: stack -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11773) Wrong field used for protobuf construction in RegionStates.
Andrey Stepachev created HBASE-11773: Summary: Wrong field used for protobuf construction in RegionStates. Key: HBASE-11773 URL: https://issues.apache.org/jira/browse/HBASE-11773 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Andrey Stepachev Assignee: Andrey Stepachev The protobuf-to-Java-POJO converter uses the wrong field when constructing the converted enum (the default value of the protobuf message is used instead). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11682) Explain hotspotting
[ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101308#comment-14101308 ] Jonathan Hsieh commented on HBASE-11682: {code} + <para>Salting in this sense has nothing to do with cryptography, but refers to adding random +data to the start of a row key. In this case, salting refers to adding a prefix to the row +key to cause it to sort differently than it otherwise would. Salting can be helpful if you +have a few keys that come up over and over, along with other rows that don't fit those keys. +In that case, the regions holding rows with the hot keys would be overloaded, compared to +the other regions. Salting completely removes ordering, so is often a poorer choice than +hashing. Using totally random row keys for data which is accessed sequentially would remove +the benefit of HBase's row-sorting algorithm and cause very poor performance, as each get or +scan would need to query all regions.</para> {code} I don't think this salting example is correct about the ramifications. Both Nick and I agree that salting is putting some random value in front of the actual value. This means instead of one sorted list of entries, we'd have n sorted lists of entries if the cardinality of the salt is n. Example: naively we have rowkeys like this: foo0001 foo0002 foo0003 foo0004 If we use a 4-way salt (a,b,c,d), we could end up with the data resorted like this: a-foo0003 b-foo0001 c-foo0004 d-foo0002 Let's say we add some new values to row foo0003. It could get salted with a new salt, let's say 'c'. a-foo0003 b-foo0001 *c-foo0003* c-foo0004 d-foo0002 To read, we could still get things back in the original order, but we'd have to have a reader starting from each salt in parallel to get the rows back in order (and likely need to do some coalescing of foo0003 to combine the a-foo0003 and c-foo0003 rows back into one).
The effect here in this situation is that we could be writing with 4x the throughput now, since we would be on 4 different machines (assuming that the a, b, c, d prefixes are balanced onto different machines). Nick's point of view (please correct me if I am wrong) says that you could salt the original row key with a one-way hash so that foo0003 would always get salted with 'a'. This would spread rowkeys that are lexicographically close (foo0001 and foo0002) to different machines, which could help reduce contention and increase overall throughput, but would never allow a single row to have 4x the throughput like the other approach. {code} + <para>Hashing refers to applying a random one-way function to the row key, such that a +particular row always gets the same arbitrary value applied. This preserves the sort order +so that scans are effective, but spreads out load across a region. One example where hashing +is the right strategy would be if for some reason, a large proportion of rows started with +the same letter. Normally, these would all be sorted into the same region. You can apply a +hash to artificially differentiate them and spread them out.</para> {code} Hashing actually totally trashes the sort order -- in fact the goal of hashing is to evenly disperse entries that are near each other lexicographically as much as possible. Explain hotspotting --- Key: HBASE-11682 URL: https://issues.apache.org/jira/browse/HBASE-11682 Project: HBase Issue Type: Task Components: documentation Reporter: Misty Stanley-Jones Assignee: Misty Stanley-Jones Attachments: HBASE-11682-1.patch, HBASE-11682.patch, HBASE-11682.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
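The two schemes being contrasted in the comment above can be sketched as follows. This is an illustrative sketch, not HBase code; the 4-letter salt alphabet and the use of `hashCode` as the one-way function are arbitrary choices for the example:

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch of the two row-key prefixing schemes discussed above.
public class KeyPrefixing {
    static final String[] SALTS = {"a", "b", "c", "d"};

    // Salting: a random prefix chosen per write. The same row key can land
    // under different salts on different writes, so readers must merge all
    // 4 salt buckets in parallel, but a single hot row's writes can spread
    // over 4 regions.
    public static String salted(String rowKey) {
        int i = ThreadLocalRandom.current().nextInt(SALTS.length);
        return SALTS[i] + "-" + rowKey;
    }

    // Hash prefixing: a deterministic one-way prefix. The same row key
    // always maps to the same bucket, so a single Get still works, but a
    // single hot row cannot spread its writes across buckets, and range
    // scans over the original key order are no longer contiguous.
    public static String hashPrefixed(String rowKey) {
        int i = Math.floorMod(rowKey.hashCode(), SALTS.length);
        return SALTS[i] + "-" + rowKey;
    }

    public static void main(String[] args) {
        // foo0003 always gets the same hash prefix...
        System.out.println(hashPrefixed("foo0003").equals(hashPrefixed("foo0003"))); // prints true
        // ...while its salted prefix varies from write to write.
        System.out.println(salted("foo0003"));
    }
}
```

The trade-off Jonathan describes falls directly out of the two methods: `salted` buys per-row write parallelism at the cost of parallel merged reads, while `hashPrefixed` keeps single-row reads cheap but caps a single row's throughput at one bucket.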
[jira] [Updated] (HBASE-11773) Wrong field used for protobuf construction in RegionStates.
[ https://issues.apache.org/jira/browse/HBASE-11773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Stepachev updated HBASE-11773: - Affects Version/s: 2.0.0 1.0.0 Wrong field used for protobuf construction in RegionStates. --- Key: HBASE-11773 URL: https://issues.apache.org/jira/browse/HBASE-11773 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 1.0.0, 2.0.0 Reporter: Andrey Stepachev Assignee: Andrey Stepachev -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11739) Document blockCache contents report in the UI
[ https://issues.apache.org/jira/browse/HBASE-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-11739: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Nice. Committed. Document blockCache contents report in the UI - Key: HBASE-11739 URL: https://issues.apache.org/jira/browse/HBASE-11739 Project: HBase Issue Type: Sub-task Components: documentation Reporter: Misty Stanley-Jones Assignee: Misty Stanley-Jones Fix For: 0.99.0 Attachments: HBASE-11739.patch, bc_basic.png, bc_basic.png, bc_config.png, bc_l1.png, bc_l2_buckets.png, bc_stats.png -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101323#comment-14101323 ] stack commented on HBASE-11747: --- Good one. Every RS sending 100MB of 'status' to the master every second or so is just obnoxious, especially so when much of this info is being duplicated on our metrics 'channel'. Thanks for bringing this one up, Virag. We need a bit of fixup in here. ClusterStatus is too bulky --- Key: HBASE-11747 URL: https://issues.apache.org/jira/browse/HBASE-11747 Project: HBase Issue Type: Sub-task Reporter: Virag Kothari Attachments: exceptiontrace Following exception on 0.98 with 1M regions on a cluster with 160 region servers: {code} Caused by: java.io.IOException: Call to regionserverhost:port failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1482) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1454) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.getClusterStatus(MasterProtos.java:42555) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.getClusterStatus(HConnectionManager.java:2132) at org.apache.hadoop.hbase.client.HBaseAdmin$16.call(HBaseAdmin.java:2166) at org.apache.hadoop.hbase.client.HBaseAdmin$16.call(HBaseAdmin.java:2162) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114) ... 43 more Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. 
Use CodedInputStream.setSizeLimit() to increase the size limit. at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11773) Wrong field used for protobuf construction in RegionStates.
[ https://issues.apache.org/jira/browse/HBASE-11773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Stepachev updated HBASE-11773: - Attachment: HBASE-11773.patch Wrong field used for protobuf construction in RegionStates. --- Key: HBASE-11773 URL: https://issues.apache.org/jira/browse/HBASE-11773 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 1.0.0, 2.0.0 Reporter: Andrey Stepachev Assignee: Andrey Stepachev Attachments: HBASE-11773.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11232) Region fail to release the updatelock for illegal CF in multi row mutations
[ https://issues.apache.org/jira/browse/HBASE-11232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-11232: -- Fix Version/s: 0.94.23 Region fail to release the updatelock for illegal CF in multi row mutations --- Key: HBASE-11232 URL: https://issues.apache.org/jira/browse/HBASE-11232 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.19 Reporter: Liu Shaohui Assignee: Liu Shaohui Fix For: 0.94.23 Attachments: HBASE-11232-0.94.diff The failback code in processRowsWithLocks did not check the column family. If there is an illegal CF in the mutation, it will throw a NullPointerException and the update lock will not be released, so the region cannot be flushed or compacted. HRegion #4946 {code} if (!mutations.isEmpty() && !walSyncSuccessful) { LOG.warn("Wal sync failed. Roll back " + mutations.size() + " memstore keyvalues for row(s): " + processor.getRowsToLock().iterator().next() + "..."); for (KeyValue kv : mutations) { stores.get(kv.getFamily()).rollback(kv); } } // 11. Roll mvcc forward if (writeEntry != null) { mvcc.completeMemstoreInsert(writeEntry); writeEntry = null; } if (locked) { this.updatesLock.readLock().unlock(); locked = false; } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
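The NullPointerException described above comes from the unguarded map lookup `stores.get(kv.getFamily()).rollback(kv)` in the rollback loop. A minimal, self-contained illustration of the failure mode and the guard follows; the `Store` interface and method names here are stand-ins, not the real HBase classes:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal illustration of the rollback NPE described above: when a mutation
// names a column family the region does not have, stores.get() returns null
// and the NullPointerException escapes before updatesLock is released.
public class RollbackGuard {
    interface Store { void rollback(String kv); }

    // Unguarded lookup, as in the reported code: throws
    // NullPointerException for an unknown family.
    static void rollbackUnguarded(Map<String, Store> stores, String family, String kv) {
        stores.get(family).rollback(kv);
    }

    // Guarded lookup: skips mutations whose family is unknown, so the code
    // that releases the update lock is still reached normally.
    static boolean rollbackGuarded(Map<String, Store> stores, String family, String kv) {
        Store s = stores.get(family);
        if (s == null) {
            return false; // illegal CF; nothing to roll back
        }
        s.rollback(kv);
        return true;
    }

    public static void main(String[] args) {
        Map<String, Store> stores = new HashMap<>();
        stores.put("cf1", kv -> {});
        System.out.println(rollbackGuarded(stores, "bogus_cf", "kv1")); // prints false, no NPE
    }
}
```

In the real fix the more important point is structural: validating the CF up front (or guarding the lookup) keeps the failure path from bypassing the `updatesLock.readLock().unlock()` call.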
[jira] [Updated] (HBASE-11773) Wrong field used for protobuf construction in RegionStates.
[ https://issues.apache.org/jira/browse/HBASE-11773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Stepachev updated HBASE-11773: - Status: Patch Available (was: Open) Wrong field used for protobuf construction in RegionStates. --- Key: HBASE-11773 URL: https://issues.apache.org/jira/browse/HBASE-11773 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 1.0.0, 2.0.0 Reporter: Andrey Stepachev Assignee: Andrey Stepachev Attachments: HBASE-11773.patch -- This message was sent by Atlassian JIRA (v6.2#6252)