[jira] [Updated] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb
[ https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5869: - Attachment: v4.txt v4 I still have to convert the tests. This is taking a while. We serialize and deserialize from zk all over our codebase. Takes a while to chase down all cases. Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb - Key: HBASE-5869 URL: https://issues.apache.org/jira/browse/HBASE-5869 Project: HBase Issue Type: Task Reporter: stack Attachments: firstcut.txt, secondcut.txt, v4.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5882) Process RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor
ramkrishna.s.vasudevan created HBASE-5882: - Summary: Process RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor Key: HBASE-5882 URL: https://issues.apache.org/jira/browse/HBASE-5882 Project: HBase Issue Type: Improvement Affects Versions: 0.92.1, 0.90.6 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.90.7, 0.96.0, 0.94.1 Currently on master restart, when processRIT runs, any region found on a dead server is skipped for a new assignment so that the timeout monitor can take care of it. This case is most prominent when the node is found in RS_ZK_REGION_OPENING state. I think we can handle this by triggering a new assignment with a new plan.
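The decision the issue proposes can be sketched as follows. This is a minimal, hypothetical illustration of the rule, not the actual AssignmentManager code; the enum, method, and parameter names here are made up for the sketch:

```java
import java.util.Set;

// Sketch: instead of leaving a region stuck in RS_ZK_REGION_OPENING on a
// dead server for the timeout monitor, force a fresh assignment plan.
class RitRecoverySketch {
    enum ZkState { RS_ZK_REGION_OPENING, RS_ZK_REGION_OPENED, RS_ZK_REGION_CLOSING }

    /** Returns true when the master should trigger a new assignment plan. */
    static boolean shouldForceNewAssignment(ZkState state, String server, Set<String> deadServers) {
        // A region stuck OPENING on a dead server will never finish opening;
        // waiting for the timeout monitor only delays availability.
        return state == ZkState.RS_ZK_REGION_OPENING && deadServers.contains(server);
    }
}
```

Under this sketch, only the OPENING-on-dead-server case bypasses the timeout monitor; other transition states keep the existing behavior.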
[jira] [Commented] (HBASE-5864) Error while reading from hfile in 0.94
[ https://issues.apache.org/jira/browse/HBASE-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262400#comment-13262400 ] Lars Hofhansl commented on HBASE-5864: -- @Ram: You are right, it is a small change. Just wondering whether we actually need the part that changes public DataInputStream nextBlockAsStream(BlockType blockType) to public HFileBlock nextBlockWithBlockType(BlockType blockType). Error while reading from hfile in 0.94 -- Key: HBASE-5864 URL: https://issues.apache.org/jira/browse/HBASE-5864 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0 Attachments: HBASE-5864_1.patch, HBASE-5864_2.patch, HBASE-5864_3.patch, HBASE-5864_test.patch Got the following stacktrace during region split. {noformat} 2012-04-24 16:05:42,168 WARN org.apache.hadoop.hbase.regionserver.Store: Failed getting store size for value java.io.IOException: Requested block is out of range: 2906737606134037404, lastDataBlockOffset: 84764558 at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:278) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:285) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:402) at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1638) at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1943) at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:77) at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:4921) at org.apache.hadoop.hbase.regionserver.HRegionServer.splitRegion(HRegionServer.java:2901) {noformat}
[jira] [Commented] (HBASE-5864) Error while reading from hfile in 0.94
[ https://issues.apache.org/jira/browse/HBASE-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262401#comment-13262401 ] Lars Hofhansl commented on HBASE-5864: -- Ah OK. Never mind, you need the Block to the checksumBytes.
[jira] [Issue Comment Edited] (HBASE-5864) Error while reading from hfile in 0.94
[ https://issues.apache.org/jira/browse/HBASE-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262401#comment-13262401 ] Lars Hofhansl edited comment on HBASE-5864 at 4/26/12 6:18 AM: --- Ah OK. Never mind, you need the Block to get the checksumBytes. was (Author: lhofhansl): Ah OK. Never mind, you need the Block to the checksumBytes.
[jira] [Commented] (HBASE-5864) Error while reading from hfile in 0.94
[ https://issues.apache.org/jira/browse/HBASE-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262403#comment-13262403 ] Lars Hofhansl commented on HBASE-5864: -- I think I get the change now. +1
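For readers following the stack trace above: the "Requested block is out of range" IOException is a sanity check on a block offset read from the index against the last known data-block offset. The following is a self-contained sketch of that kind of check, not HFileReaderV2's exact code; the method and parameter names are illustrative:

```java
import java.io.IOException;

// Sketch of the range check behind the exception in the stack trace: a
// corrupted or mis-read index offset (such as 2906737606134037404 above)
// fails validation against lastDataBlockOffset and aborts the read.
class BlockRangeCheckSketch {
    static void checkBlockOffset(long offset, long lastDataBlockOffset) throws IOException {
        if (offset < 0 || offset > lastDataBlockOffset) {
            throw new IOException("Requested block is out of range: " + offset
                + ", lastDataBlockOffset: " + lastDataBlockOffset);
        }
    }
}
```

The bug discussed in this thread is about how such a bogus offset is produced in the first place, not about the check itself.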
[jira] [Commented] (HBASE-5161) Compaction algorithm should prioritize reference files
[ https://issues.apache.org/jira/browse/HBASE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262407#comment-13262407 ] ramkrishna.s.vasudevan commented on HBASE-5161: --- @Stack and @J-D We seem to have run into the same problem. We had some 32 reference files created, out of which one was never selected in further compaction cycles. We even tried to stop the writes for 2 to 3 hours, but the compaction still did not pick up the reference file. Will dig into this more. Compaction algorithm should prioritize reference files -- Key: HBASE-5161 URL: https://issues.apache.org/jira/browse/HBASE-5161 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.92.1, 0.94.0 I got myself into a state where my table was un-splittable as long as the insert load was coming in. Emergency flushes because of the low memory barrier don't check the number of store files, so it never blocks, to a point where I had in one case 45 store files, and the compactions were almost never done on the reference files (had 15 of them, went down by one in 20 minutes). Since you can't split regions with reference files, that region couldn't split and was doomed to just get more store files until the load stopped. Marking this as a minor issue; what we really need is a better pushback mechanism, but not prioritizing reference files seems wrong.
[jira] [Issue Comment Edited] (HBASE-5161) Compaction algorithm should prioritize reference files
[ https://issues.apache.org/jira/browse/HBASE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262407#comment-13262407 ] ramkrishna.s.vasudevan edited comment on HBASE-5161 at 4/26/12 6:36 AM: @Stack and @J-D We seem to have run into the same problem. We had some 32 reference files created, out of which one was never selected in further compaction cycles. We even tried to stop the writes for 2 to 3 hours, but the compaction still did not pick up the reference file. The region grew up to 400 GB. {code} ./hbase-root-regionserver-HOST-192-168-47-205.log:2012-04-26 09:44:15,835 DEBUG org.apache.hadoop.hbase.regionserver.Store: hdfs://10.18.40.217:9000/hbase/ufdr/ce5c144a1714df08db1132238a749116/value/cde90029ecb74ef791500ccd3a1e8908.755d1cf6b960c02cc72c1dd83551df82-hdfs://10.18.40.217:9000/hbase/ufdr/755d1cf6b960c02cc72c1dd83551df82/value/cde90029ecb74ef791500ccd3a1e8908-top is not splittable {code} We get the above logs for almost 2 to 3 hours. The pair of this reference file (its bottom half) is also not compacted. Will dig in more to find any other reason for it not getting picked up. was (Author: ram_krish): @Stack and @J-D We seem to end up in the same problem. We had some 32 reference files created out of which one was never selected in further compaction cycles. We even tried to stop the writes for 2 to 3 hours but still the compaction did not pick up the reference file. Will dig in more into this.
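The prioritization this issue asks for can be sketched as a candidate-ordering rule: reference files (the half-file pointers a split leaves behind) go first, because a region cannot split again until its references are compacted away. This is a hypothetical illustration, not HBase's compaction selection code; the StoreFileInfo record is invented for the sketch:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: order compaction candidates so reference files are considered
// before plain store files; within each group, smallest first as a
// stand-in for the usual size-based selection.
class CompactionPrioritySketch {
    record StoreFileInfo(String name, boolean isReference, long size) {}

    static List<StoreFileInfo> orderCandidates(List<StoreFileInfo> files) {
        return files.stream()
            .sorted(Comparator.comparing((StoreFileInfo f) -> !f.isReference())
                .thenComparingLong(StoreFileInfo::size))
            .collect(Collectors.toList());
    }
}
```

The thread above suggests the real bug may instead be in why a particular reference file is never selected at all, which an ordering rule alone would not fix.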
[jira] [Commented] (HBASE-2214) Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly
[ https://issues.apache.org/jira/browse/HBASE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262456#comment-13262456 ] jirapos...@reviews.apache.org commented on HBASE-2214: --
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4726/ ---
(Updated 2012-04-26 08:18:40.576784)
Review request for hbase and Ted Yu.
Changes
---
Uploaded v5 of patch. The test still works.
Summary
---
HBASE-2214 per scan max buffersize. This addresses bug HBASE-2214. https://issues.apache.org/jira/browse/HBASE-2214
Diffs (updated)
---
/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java 1330680
/src/main/java/org/apache/hadoop/hbase/client/Scan.java 1330680
/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java 1330680
/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java 1330680
/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java 1330680
/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java 1330680
/src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java 1330680
/src/main/java/org/apache/hadoop/hbase/protobuf/generated/RPCProtos.java 1330680
/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java 1330680
/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1330680
/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1330680
/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScanner.java 1330680
/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServer.java 1330680
/src/main/protobuf/Client.proto 1330680
/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java 1330680
Diff: https://reviews.apache.org/r/4726/diff
Testing
---
It works when running this test:
{code}
new HBaseTestingUtility(conf).startMiniCluster();

HBaseAdmin admin = new HBaseAdmin(conf);
if (!admin.tableExists("test")) {
  HTableDescriptor tableDesc = new HTableDescriptor("test");
  tableDesc.addFamily(new HColumnDescriptor("fam"));
  admin.createTable(tableDesc);
}

HTable table = new HTable(conf, "test");
Put put;

put = new Put(Bytes.toBytes("row1"));
put.add(Bytes.toBytes("fam"), Bytes.toBytes("qual1"), Bytes.toBytes("val1"));
table.put(put);

put = new Put(Bytes.toBytes("row2"));
put.add(Bytes.toBytes("fam"), Bytes.toBytes("qual2"), Bytes.toBytes("val2"));
table.put(put);

put = new Put(Bytes.toBytes("row3"));
put.add(Bytes.toBytes("fam"), Bytes.toBytes("qual3"), Bytes.toBytes("val3"));
table.put(put);

table.flushCommits();
{
  System.out.println("returns all rows at once because of the caching");
  Scan scan = new Scan();
  scan.setCaching(100);
  ResultScanner scanner = table.getScanner(scan);
  scanner.next(100);
}
{
  System.out.println("returns one row at a time because of the maxResultSize");
  Scan scan = new Scan();
  scan.setCaching(100);
  scan.setMaxResultSize(1);
  ResultScanner scanner = table.getScanner(scan);
  scanner.next(100);
}
{code}
See output:
{noformat}
returns all rows at once because of the caching
2012-04-25 22:18:47,494 DEBUG [main] client.ClientScanner(94): Creating scanner over test starting at key ''
2012-04-25 22:18:47,494 DEBUG [main] client.ClientScanner(206): Advancing internal scanner to startKey at ''
2012-04-25 22:18:47,499 DEBUG [main] client.ClientScanner(323): Rows returned 3
2012-04-25 22:18:47,502 DEBUG [main] client.ClientScanner(193): Finished with scanning at {NAME => 'test,,1335385126388.ed23a82f3d6ca2eab571918843796259.', STARTKEY => '', ENDKEY => '', ENCODED => ed23a82f3d6ca2eab571918843796259,}
returns one row at a time because of the maxResultSize
2012-04-25 22:18:47,504 DEBUG [main] client.ClientScanner(94): Creating scanner over test starting at key ''
2012-04-25 22:18:47,505 DEBUG [main] client.ClientScanner(206): Advancing internal scanner to startKey at ''
2012-04-25 22:18:47,514 DEBUG [main] client.ClientScanner(323): Rows returned 1
2012-04-25 22:18:47,517 DEBUG [main] client.ClientScanner(323): Rows returned 1
2012-04-25 22:18:47,522 DEBUG [main] client.ClientScanner(323): Rows returned 1
{noformat}
Thanks, ferdy
Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly - Key: HBASE-2214 URL: https://issues.apache.org/jira/browse/HBASE-2214 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Ferdy Galema Attachments:
[jira] [Updated] (HBASE-2214) Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly
[ https://issues.apache.org/jira/browse/HBASE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated HBASE-2214: Status: Open (was: Patch Available) Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly - Key: HBASE-2214 URL: https://issues.apache.org/jira/browse/HBASE-2214 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Ferdy Galema Attachments: HBASE-2214-0.94.txt, HBASE-2214-v4.txt, HBASE-2214-v5.txt, HBASE-2214_with_broken_TestShell.txt The notion that you set size rather than row count, specifying how many rows a scanner should return in each cycle, was raised over in hbase-1966. It's a good one, making hbase regular though the data under it may vary. HBase-1966 was committed, but the patch was constrained by the fact that it needed to not change the RPC interface. This issue is about doing hbase-1966 for 0.21 in a clean, unconstrained way.
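The batching semantics under discussion (caching as a row-count cap, maxResultSize as a byte cap, whichever hits first) can be sketched as below. This is a standalone illustration under stated assumptions, not the patch's actual server loop; row sizes are plain longs here, whereas the real code measures serialized KeyValues:

```java
// Sketch: how many of the given rows fit into one client batch when both
// a row-count limit (caching) and a byte limit (maxResultSize) apply.
class MaxResultSizeSketch {
    static int rowsPerBatch(long[] rowSizes, int caching, long maxResultSize) {
        long accumulated = 0;
        int n = 0;
        for (long size : rowSizes) {
            if (n >= caching) break;          // row-count cap reached
            accumulated += size;
            n++;                              // the row that crosses the byte limit is still returned
            if (maxResultSize > 0 && accumulated >= maxResultSize) break;
        }
        return n;
    }
}
```

With maxResultSize = 1 every row crosses the byte limit, which matches the "returns one row at a time" behavior in the test log above; a non-positive maxResultSize is treated as unset.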
[jira] [Commented] (HBASE-2214) Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly
[ https://issues.apache.org/jira/browse/HBASE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262458#comment-13262458 ] jirapos...@reviews.apache.org commented on HBASE-2214: -- bq. On 2012-04-25 21:02:15, Ted Yu wrote: bq. /src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java, line 323 bq. https://reviews.apache.org/r/4726/diff/3/?file=104298#file104298line323 bq. bq. This should be removed. Ok done. - ferdy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4726/#review7242 ---
[jira] [Updated] (HBASE-2214) Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly
[ https://issues.apache.org/jira/browse/HBASE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated HBASE-2214: Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-2214) Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly
[ https://issues.apache.org/jira/browse/HBASE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated HBASE-2214: Attachment: HBASE-2214-v5.txt
[jira] [Commented] (HBASE-2214) Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly
[ https://issues.apache.org/jira/browse/HBASE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262459#comment-13262459 ] jirapos...@reviews.apache.org commented on HBASE-2214: -- bq. On 2012-04-26 03:10:35, Jimmy Xiang wrote: bq. /src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java, line 385 bq. https://reviews.apache.org/r/4726/diff/3/?file=104301#file104301line385 bq. bq. Can we set it only if maxResultSize is > 0? Done. I've changed it everywhere else too so that maxResultSize only works when > 0. (Setting it to 0 is bogus anyway; it might cause infinite loops.) bq. On 2012-04-26 03:10:35, Jimmy Xiang wrote: bq. /src/main/protobuf/Client.proto, line 197 bq. https://reviews.apache.org/r/4726/diff/3/?file=104311#file104311line197 bq. bq. Can we use uint64, without a default? So if it is not specified, we take it as -1. Done. (Including regenerating the protobuf sources.) I already noticed that int64 was not used anywhere else in the proto file. Just out of curiosity, are there good reasons to avoid int64? - ferdy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4726/#review7249 ---
[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262462#comment-13262462 ] nkeywal commented on HBASE-5877: bq. Can we mark the failure and make this RegionMovedException behave the same as NotServingRegionException ? Done. bq. For updateCachedLocations(), please put explanation for parameter on the same line as the parameter: Done. bq. 'Failed all' - 'Failed call' It's an existing comment that we can find again later in the code. It really means failed all: all the queries on this server failed. I don't mind changing it to something better, but I think we should keep the all. bq. 'which the server' - 'which the region' Done. bq. Please increase the VERSION of HRegionInterface Done. bq. How is the server removed from cache since I see 'continue' above ? That's what makes this code complex and difficult to change: the error is actually managed later, when we don't have the real exception anymore. bq. For ServerManager.sendRegionClose(), please add javadoc for destServerName param. Done. bq. Is it possible that destServerName is null ? Safety checks added. bq. Please change the above to debug log. Why is the above fatal (regionResult != null) ? Step 4 appears in a comment below the above code. Should the above say step 3 ? Bad logs fixed. When a query fails because the region has moved, let the regionserver return the new address to the client -- Key: HBASE-5877 URL: https://issues.apache.org/jira/browse/HBASE-5877 Project: HBase Issue Type: Improvement Components: client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5877.v1.patch This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load. Note that a region is not immediately opened after a close. So: - it seems preferable to wait before retrying on the other server. 
An optimisation would be to have a heuristic depending on when the region was closed. - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help. The implementation in the first patch does: - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds. - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address. - the client analyses the exceptions and updates its cache accordingly... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
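The moved-region bookkeeping described in the patch summary above could look roughly like the following sketch. The class and method names here are hypothetical illustrations, not the actual HBASE-5877 patch: the regionserver records the destination when it closes a region, keeps each entry for about 100 seconds, and looks the entry up when a stale client queries the moved region.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a moved-regions cache; illustrative names only.
class MovedRegionsTracker {
    private static final long TTL_MS = 100_000; // entries kept ~100 seconds

    private static final class Entry {
        final String newAddress;
        final long expireAt;
        Entry(String newAddress, long expireAt) {
            this.newAddress = newAddress;
            this.expireAt = expireAt;
        }
    }

    private final Map<String, Entry> moved = new ConcurrentHashMap<>();

    /** Called by the closing regionserver once it knows the destination. */
    void regionMoved(String encodedRegionName, String newAddress) {
        moved.put(encodedRegionName,
            new Entry(newAddress, System.currentTimeMillis() + TTL_MS));
    }

    /** Returns the new address if the region moved recently, else null. */
    String lookupNewAddress(String encodedRegionName) {
        Entry e = moved.get(encodedRegionName);
        if (e == null) return null;
        if (System.currentTimeMillis() > e.expireAt) {
            moved.remove(encodedRegionName); // lazy expiry of stale entries
            return null;
        }
        return e.newAddress;
    }
}
```

On a hit, the regionserver would raise the "region moved" exception carrying the stored address, letting the client update its cache without a round trip to the master.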
[jira] [Updated] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-5611: Attachment: HBASE-5611-94.patch HBASE-5611-trunk-v2.patch Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size Key: HBASE-5611 URL: https://issues.apache.org/jira/browse/HBASE-5611 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Jean-Daniel Cryans Assignee: Jieshan Bean Priority: Critical Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, HBASE-5611-trunk.patch This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think it's still possible to hit it if a region fails to open for more obscure reasons like HDFS errors. Consider a region that just went through distributed splitting and that's now being opened by a new RS. The first thing it does is to read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size but they are dropped when the {{HRegion}} gets cleaned up. 
It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and that none of them have edits: {noformat} 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223) at java.lang.Thread.run(Thread.java:662) {noformat} The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier although in fact I'm at 0. It happened yesterday and it's still printing this out. To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
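The fix the description asks for boils down to accounting that can be rolled back: replayed edits increment a global counter, and a failed region open must decrement it by the same amount. A minimal sketch, assuming a single atomic counter for the global MemStore size (illustrative names, not the actual HBASE-5611 patch):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of rollback-able global MemStore accounting; hypothetical names,
// not the actual HBASE-5611 patch.
class GlobalMemStoreAccounting {
    private final AtomicLong globalMemStoreSize = new AtomicLong(0);

    /** Edits replayed into a region's MemStore add to the global total. */
    long addRegionReplayedEdits(long bytes) {
        return globalMemStoreSize.addAndGet(bytes);
    }

    /**
     * If the region fails to open (timeout, HDFS error, ...), subtract
     * whatever it had accumulated, so the flusher's view matches reality.
     */
    long rollbackRegion(long regionMemStoreSize) {
        return globalMemStoreSize.addAndGet(-regionMemStoreSize);
    }

    long get() {
        return globalMemStoreSize.get();
    }
}
```

Without the rollback step, the counter stays above the low-water mark even though no region actually holds edits, which is exactly the `IllegalStateException` loop shown in the log above.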
[jira] [Commented] (HBASE-2214) Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly
[ https://issues.apache.org/jira/browse/HBASE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262486#comment-13262486 ] Hadoop QA commented on HBASE-2214: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12524407/HBASE-2214-v5.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.master.TestSplitLogManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1654//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1654//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1654//console This message is automatically generated. Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly - Key: HBASE-2214 URL: https://issues.apache.org/jira/browse/HBASE-2214 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Ferdy Galema Attachments: HBASE-2214-0.94.txt, HBASE-2214-v4.txt, HBASE-2214-v5.txt, HBASE-2214_with_broken_TestShell.txt The notion that you set size rather than row count specifying how many rows a scanner should return in each cycle was raised over in HBASE-1996. It's a good one, making HBase regular though the data under it may vary. HBASE-1996 was committed but the patch was constrained by the fact that it needed to not change the RPC interface.
This issue is about doing HBASE-1996 for 0.21 in a clean, unconstrained way. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
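The size-based batching notion above can be sketched as follows: instead of cutting each scanner RPC batch at a fixed row count, cut it once the accumulated byte size reaches a budget, always returning at least one row so progress is made. This is a hypothetical illustration of the idea, not the HBASE-2214 patch:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of size-bounded scanner batching; hypothetical types,
// not the actual HBASE-2214 implementation.
class Row {
    final byte[] bytes;
    Row(byte[] bytes) { this.bytes = bytes; }
}

class SizeBoundedBatcher {
    /** Take rows from 'pending' until maxResultSize bytes are accumulated. */
    static List<Row> nextBatch(List<Row> pending, long maxResultSize) {
        List<Row> batch = new ArrayList<>();
        long size = 0;
        for (Row r : pending) {
            batch.add(r);
            size += r.bytes.length;
            if (size >= maxResultSize) break; // at least one row is always returned
        }
        return batch;
    }
}
```

The point of keying on bytes is exactly what the description says: HBase's memory and network cost per batch stays regular even when row sizes under the table vary widely.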
[jira] [Updated] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-5611: Attachment: (was: HBASE-5611-94.patch) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size Key: HBASE-5611 URL: https://issues.apache.org/jira/browse/HBASE-5611 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Jean-Daniel Cryans Assignee: Jieshan Bean Priority: Critical Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5611-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-5611: Attachment: (was: HBASE-5611-trunk-v2.patch) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size Key: HBASE-5611 URL: https://issues.apache.org/jira/browse/HBASE-5611 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Jean-Daniel Cryans Assignee: Jieshan Bean Priority: Critical Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5611-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-5611: Attachment: HBASE-5611-trunk-v2.patch HBASE-5611-94.patch Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size Key: HBASE-5611 URL: https://issues.apache.org/jira/browse/HBASE-5611 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Jean-Daniel Cryans Assignee: Jieshan Bean Priority: Critical Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, HBASE-5611-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3529) Add search to HBase
[ https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262494#comment-13262494 ] Martin Alig commented on HBASE-3529: @Jason: Are you still working on this issue? Add search to HBase --- Key: HBASE-3529 URL: https://issues.apache.org/jira/browse/HBASE-3529 Project: HBase Issue Type: Improvement Affects Versions: 0.90.0 Reporter: Jason Rutherglen Attachments: HBASE-3529.patch, HDFS-APPEND-0.20-LOCAL-FILE.patch Using the Apache Lucene library we can add freetext search to HBase. The advantages of this are: * HBase is highly scalable and distributed * HBase is realtime * Lucene is a fast inverted index and will soon be realtime (see LUCENE-2312) * Lucene offers many types of queries not currently available in HBase (eg, AND, OR, NOT, phrase, etc) * It's easier to build scalable realtime systems on top of an already architecturally sound, scalable realtime data system, eg, HBase. * Scaling realtime search will be as simple as scaling HBase. Phase 1 - Indexing: * Integrate Lucene into HBase such that an index mirrors a given region. This means cascading add, update, and deletes between a Lucene index and an HBase region (and vice versa). * Define meta-data to mark a region as indexed, and use a Solr schema to allow the user to define the fields and analyzers. * Integrate with the HLog to ensure that index recovery can occur properly (eg, on region server failure) * Mirror region splits with indexes (use Lucene's IndexSplitter?) * When a region is written to HDFS, also write the corresponding Lucene index to HDFS. * A row key will be the ID of a given Lucene document. The Lucene docstore will explicitly not be used because the document/row data is stored in HBase. We will need to solve what the best data structure for efficiently mapping a docid -> row key is. It could be a docstore, field cache, column stride fields, or some other mechanism. 
* Write unit tests for the above Phase 2 - Queries: * Enable distributed Lucene queries * Regions that have Lucene indexes are inherently available and may be searched on, meaning there's no need for a separate search related system in Zookeeper. * Integrate search with HBase's RPC mechanism -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5816) Balancer and ServerShutdownHandler concurrently reassigning the same region
[ https://issues.apache.org/jira/browse/HBASE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262499#comment-13262499 ] ramkrishna.s.vasudevan commented on HBASE-5816: --- @Maryann If you are planning to work on this pls go ahead. :) Balancer and ServerShutdownHandler concurrently reassigning the same region --- Key: HBASE-5816 URL: https://issues.apache.org/jira/browse/HBASE-5816 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Maryann Xue Assignee: ramkrishna.s.vasudevan Priority: Critical Attachments: HBASE-5816.patch The first assign thread exits with success after updating the RegionState to PENDING_OPEN, while the second assign follows immediately into assign and fails the RegionState check in setOfflineInZooKeeper(). This causes the master to abort. In the below case, the two concurrent assigns occurred when AM tried to assign a region to a dying/dead RS, and meanwhile the ShutdownServerHandler tried to assign this region (from the region plan) spontaneously. 2012-04-17 05:44:57,648 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253 2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. (offlining) 2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=hadoop05.sh.intel.com,60020,1334544902186, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 
2012-04-17 05:44:57,666 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/fe38fe31caf40b6e607a3e6bbed6404b (region=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., server=hadoop05.sh.intel.com,60020,1334544902186, state=RS_ZK_REGION_CLOSING) 2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=CLOSED, ts=1334612697672, server=hadoop05.sh.intel.com,60020,1334544902186 2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x236b912e9b3000e Creating (or updating) unassigned node for fe38fe31caf40b6e607a3e6bbed6404b with OFFLINE state 2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.; plan=hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253 2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to xmlqa-clv16.sh.intel.com,60020,1334612497253 2012-04-17 05:54:19,159 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=PENDING_OPEN, ts=1334613179096, server=xmlqa-clv16.sh.intel.com,60020,1334612497253 2012-04-17 05:54:59,033 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 
to serverName=xmlqa-clv16.sh.intel.com,60020,1334612497253, load=(requests=0, regions=0, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0 java.net.SocketTimeoutException: Call to /10.239.47.87:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 12 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.89:41302 remote=/10.239.47.87:60020] at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:805) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:778) at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:283) at $Proxy7.openRegion(Unknown Source) at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:573) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1127) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:912) at
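The collision described in this issue comes down to two assigners (the balancer's assign and the ServerShutdownHandler's) racing on the same RegionState: the first one wins the transition to PENDING_OPEN, and the second then fails the state check in setOfflineInZooKeeper() and aborts the master. A minimal sketch of the guard idea, with hypothetical types rather than actual AssignmentManager code: the transition must be atomic, and a losing assigner should back off instead of aborting.

```java
// Hypothetical sketch of atomic check-and-transition on a region's state;
// illustrative only, not the actual HBASE-5816 fix.
enum State { OFFLINE, CLOSED, PENDING_OPEN, OPEN }

class RegionStateGuard {
    private State state = State.OFFLINE;

    /**
     * Returns true only for the single caller that wins the transition to
     * PENDING_OPEN. A concurrent second assigner gets false and should
     * simply skip the assignment rather than abort the master.
     */
    synchronized boolean tryBeginAssign() {
        if (state != State.OFFLINE && state != State.CLOSED) {
            return false; // another assign is already in flight
        }
        state = State.PENDING_OPEN;
        return true;
    }
}
```

With such a guard, the ServerShutdownHandler and the balancer can both attempt the assignment safely: exactly one proceeds, and the other observes the in-flight state and stands down.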
[jira] [Created] (HBASE-5883) Backup master is going down due to connection refused exception
Gopinathan A created HBASE-5883: --- Summary: Backup master is going down due to connection refused exception Key: HBASE-5883 URL: https://issues.apache.org/jira/browse/HBASE-5883 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1, 0.90.6, 0.94.0 Reporter: Gopinathan A The active master node's network was down for some time (this node contains Master, DN, ZK, RS). Here the backup node got the notification and started to become active. Immediately the backup node got aborted with the below exception. {noformat} 2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms 2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: [] 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. 
java.io.IOException: java.net.ConnectException: Connection refused at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy13.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220) at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569) at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369) at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353) at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660) at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363) at java.lang.Thread.run(Thread.java:662) Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362) ... 20 more 2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting 2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean reassigned HBASE-5883: --- Assignee: Jieshan Bean Backup master is going down due to connection refused exception --- Key: HBASE-5883 URL: https://issues.apache.org/jira/browse/HBASE-5883 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Gopinathan A Assignee: Jieshan Bean -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262515#comment-13262515 ] Jieshan Bean commented on HBASE-5883: - From the below log: {noformat} 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.io.IOException: java.net.ConnectException: Connection refused {noformat} We can deduce ConnectException was packaged as a IOException, likes below: new IOException(new ConnecException(Connection refused)); or something likes: new IOException(connectException.toString()); If so, this exception is not handled from the code. Backup master is going down due to connection refused exception --- Key: HBASE-5883 URL: https://issues.apache.org/jira/browse/HBASE-5883 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Gopinathan A Assignee: Jieshan Bean The active master node network was down for some time (This node contains Master,DN,ZK,RS). Here backup node got notification, and started to became active. Immedietly backup node got aborted with the below exception. {noformat} 2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms 2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: [] 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. 
java.io.IOException: java.net.ConnectException: Connection refused at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy13.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220) at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569) at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369) at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353) at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660) at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363) at java.lang.Thread.run(Thread.java:662) Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362) ... 20 more 2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting 2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see:
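A small sketch of the kind of handling Jieshan describes: since the ConnectException may arrive either as a nested cause or flattened into the IOException's message text, a handler has to check both. The class and method names here are hypothetical illustrations, not HBase code.

```java
import java.io.IOException;
import java.net.ConnectException;

public class CauseUnwrap {
    // Walk the cause chain so a ConnectException wrapped in an IOException
    // (as in the log above) can be treated as retryable instead of fatal.
    static boolean isConnectionRefused(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof ConnectException) {
                return true;
            }
            // Some call sites wrap only the text, e.g. new IOException(ce.toString()),
            // so also check the message as a fallback.
            String msg = cur.getMessage();
            if (msg != null && msg.contains("Connection refused")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        IOException wrapped = new IOException(new ConnectException("Connection refused"));
        IOException flattened = new IOException("java.net.ConnectException: Connection refused");
        System.out.println(isConnectionRefused(wrapped));   // true
        System.out.println(isConnectionRefused(flattened)); // true
    }
}
```

A master that recognizes the refused connection this way could back off and retry the root location check rather than calling abort.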
[jira] [Reopened] (HBASE-5161) Compaction algorithm should prioritize reference files
[ https://issues.apache.org/jira/browse/HBASE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reopened HBASE-5161: --- Compaction algorithm should prioritize reference files -- Key: HBASE-5161 URL: https://issues.apache.org/jira/browse/HBASE-5161 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.92.1, 0.94.0 I got myself into a state where my table was un-splittable as long as the insert load was coming in. Emergency flushes because of the low memory barrier don't check the number of store files so it never blocks, to a point where I had in one case 45 store files and the compactions were almost never done on the reference files (had 15 of them, went down by one in 20 minutes). Since you can't split regions with reference files, that region couldn't split and was doomed to just get more store files until the load stopped. Marking this as a minor issue, what we really need is a better pushback mechanism but not prioritizing reference files seems wrong. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5161) Compaction algorithm should prioritize reference files
[ https://issues.apache.org/jira/browse/HBASE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262537#comment-13262537 ] ramkrishna.s.vasudevan commented on HBASE-5161: --- I did not want to open a new issue, so I thought I could reopen this one. Correct me if I am wrong. I have a high-load write operation going on and region splits keep happening. When the load on one of my regions becomes too heavy, the split starts and creates a lot of reference files, more than my maxFilesToCompact. So suppose my 'hbase.hstore.compaction.max' is 20 and the reference files that are created number 32*2 (64 files). In compaction selection: {code} if (compactSelection.getFilesToCompact().size() > this.maxFilesToCompact) {
  int pastMax = compactSelection.getFilesToCompact().size() - this.maxFilesToCompact;
  compactSelection.clearSubList(0, pastMax);
} {code} The filesToCompact list is ordered by seq id. In this case the set of files from 0 to pastMax, i.e. the reference files with the lower seq ids, is not considered for compaction. Meanwhile more store files are created and once again the earlier ones are skipped. Those being reference files, the split never happens. The region grew up to 400G. Compaction algorithm should prioritize reference files -- Key: HBASE-5161 URL: https://issues.apache.org/jira/browse/HBASE-5161 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.92.1, 0.94.0 I got myself into a state where my table was un-splittable as long as the insert load was coming in. Emergency flushes because of the low memory barrier don't check the number of store files so it never blocks, to a point where I had in one case 45 store files and the compactions were almost never done on the reference files (had 15 of them, went down by one in 20 minutes). 
Since you can't split regions with reference files, that region couldn't split and was doomed to just get more store files until the load stopped. Marking this as a minor issue, what we really need is a better pushback mechanism but not prioritizing reference files seems wrong. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
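The selection behaviour ramkrishna describes can be illustrated with a standalone sketch (hypothetical class and method names, not the actual HBase CompactSelection code): files are ordered oldest first by sequence id, and everything past hbase.hstore.compaction.max is cleared from the front of the list, which is exactly where a split's reference files sit.

```java
import java.util.ArrayList;
import java.util.List;

public class CompactTrim {
    // Mimics the quoted trimming step: drop the OLDEST files (lowest seq ids,
    // i.e. the front of the list) whenever the backlog exceeds the limit.
    static List<String> select(List<String> filesOldestFirst, int maxFilesToCompact) {
        List<String> selection = new ArrayList<>(filesOldestFirst);
        if (selection.size() > maxFilesToCompact) {
            int pastMax = selection.size() - maxFilesToCompact;
            // clearSubList(0, pastMax) in the original: removes from the front.
            selection.subList(0, pastMax).clear();
        }
        return selection;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 6; i++) files.add("ref-" + i);    // split reference files, oldest
        for (int i = 0; i < 20; i++) files.add("store-" + i); // newer flush files
        List<String> picked = select(files, 20);
        // All 6 reference files were trimmed; only newer store files remain.
        System.out.println(picked.size() + " " + picked.get(0)); // 20 store-0
    }
}
```

Under sustained writes new flush files keep arriving faster than the backlog shrinks, so the reference files at the front of the list are trimmed on every selection and never compacted away, which is the vicious circle in the report.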
[jira] [Issue Comment Edited] (HBASE-5161) Compaction algorithm should prioritize reference files
[ https://issues.apache.org/jira/browse/HBASE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262537#comment-13262537 ] ramkrishna.s.vasudevan edited comment on HBASE-5161 at 4/26/12 11:53 AM: - I did not want to open a new issue, so I thought I could reopen this one. Correct me if I am wrong. I have a high-load write operation going on and region splits keep happening. When the load on one of my regions becomes too heavy, the split starts and creates a lot of reference files, more than my maxFilesToCompact. So suppose my 'hbase.hstore.compaction.max' is 20 and the reference files that are created number 32*2 (64 files). In compaction selection: {code} if (compactSelection.getFilesToCompact().size() > this.maxFilesToCompact) {
  int pastMax = compactSelection.getFilesToCompact().size() - this.maxFilesToCompact;
  compactSelection.clearSubList(0, pastMax);
} {code} The filesToCompact list is ordered by seq id. In this case the set of files from 0 to pastMax, i.e. the reference files with the lower seq ids, is not considered for compaction. Meanwhile more store files are created and once again the earlier ones are skipped. Those being reference files, the split never happens. The region grew up to 400G. Note that we even tried to stop the writes for 2 to 3 hours, but still the compaction did not pick up the reference files. was (Author: ram_krish): I did not want to open a new issue, so I thought I could reopen this one. Correct me if I am wrong. I have a high-load write operation going on and region splits keep happening. When the load on one of my regions becomes too heavy, the split starts and creates a lot of reference files, more than my maxFilesToCompact. So suppose my 'hbase.hstore.compaction.max' is 20 and the reference files that are created number 32*2 (64 files). 
In compaction selection: {code} if (compactSelection.getFilesToCompact().size() > this.maxFilesToCompact) {
  int pastMax = compactSelection.getFilesToCompact().size() - this.maxFilesToCompact;
  compactSelection.clearSubList(0, pastMax);
} {code} The filesToCompact list is ordered by seq id. In this case the set of files from 0 to pastMax, i.e. the reference files with the lower seq ids, is not considered for compaction. Meanwhile more store files are created and once again the earlier ones are skipped. Those being reference files, the split never happens. The region grew up to 400G. Compaction algorithm should prioritize reference files -- Key: HBASE-5161 URL: https://issues.apache.org/jira/browse/HBASE-5161 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.92.1, 0.94.0 I got myself into a state where my table was un-splittable as long as the insert load was coming in. Emergency flushes because of the low memory barrier don't check the number of store files so it never blocks, to a point where I had in one case 45 store files and the compactions were almost never done on the reference files (had 15 of them, went down by one in 20 minutes). Since you can't split regions with reference files, that region couldn't split and was doomed to just get more store files until the load stopped. Marking this as a minor issue, what we really need is a better pushback mechanism but not prioritizing reference files seems wrong. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5161) Compaction algorithm should prioritize reference files
[ https://issues.apache.org/jira/browse/HBASE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262638#comment-13262638 ] Jieshan Bean commented on HBASE-5161: - It's a really special scenario. High input pressure and splitting running in parallel caused this vicious circle. We should check whether the compaction is still running; compacting a 400G region will take a long time. Compaction algorithm should prioritize reference files -- Key: HBASE-5161 URL: https://issues.apache.org/jira/browse/HBASE-5161 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.92.1, 0.94.0 I got myself into a state where my table was un-splittable as long as the insert load was coming in. Emergency flushes because of the low memory barrier don't check the number of store files so it never blocks, to a point where I had in one case 45 store files and the compactions were almost never done on the reference files (had 15 of them, went down by one in 20 minutes). Since you can't split regions with reference files, that region couldn't split and was doomed to just get more store files until the load stopped. Marking this as a minor issue, what we really need is a better pushback mechanism but not prioritizing reference files seems wrong. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262648#comment-13262648 ] Zhihong Yu commented on HBASE-5611: --- @Jieshan: When you have multiple patches for different branches, attach patch for trunk apart from the other patches. Otherwise Hadoop QA may pick up the wrong patch. Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size Key: HBASE-5611 URL: https://issues.apache.org/jira/browse/HBASE-5611 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Jean-Daniel Cryans Assignee: Jieshan Bean Priority: Critical Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, HBASE-5611-trunk.patch This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think it's still possible to hit it if a region fails to open for more obscure reasons like HDFS errors. Consider a region that just went through distributed splitting and that's now being opened by a new RS. The first thing it does is to read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size but they are dropped when the {{HRegion}} gets cleaned up. 
It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and that none of them have edits: {noformat} 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223) at java.lang.Thread.run(Thread.java:662) {noformat} The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier although in fact I'm at 0. It happened yesterday and it still printing this out. To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
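The fix the description asks for amounts to keeping the global counter in sync when an open fails: whatever a region added to the global MemStore size during edit replay must be subtracted back out if the region never comes online. A minimal sketch with hypothetical names (this does not claim to be the actual HBase accounting class):

```java
import java.util.concurrent.atomic.AtomicLong;

public class GlobalMemStoreAccounting {
    // Global counter the flusher consults; replayed edits bump it, so a failed
    // open must roll its region's contribution back out, otherwise the flusher
    // chases memory that no region actually holds (the null-region loop above).
    private final AtomicLong globalMemStoreSize = new AtomicLong();

    long addRegionDelta(long delta) {              // called as edits are replayed
        return globalMemStoreSize.addAndGet(delta);
    }

    void rollbackRegion(long regionMemStoreSize) { // called when the open fails
        globalMemStoreSize.addAndGet(-regionMemStoreSize);
    }

    long size() { return globalMemStoreSize.get(); }

    public static void main(String[] args) {
        GlobalMemStoreAccounting acct = new GlobalMemStoreAccounting();
        acct.addRegionDelta(1024);  // replay edits for region A
        acct.addRegionDelta(2048);  // replay edits for region B
        acct.rollbackRegion(1024);  // region A failed to open; undo its share
        System.out.println(acct.size()); // 2048
    }
}
```

Without the rollback step the counter would read 3072 while only 2048 bytes are actually held, which is exactly the phantom pressure that keeps the flusher looping.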
[jira] [Updated] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5611: -- Hadoop Flags: Reviewed Status: Patch Available (was: Open) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size Key: HBASE-5611 URL: https://issues.apache.org/jira/browse/HBASE-5611 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Jean-Daniel Cryans Assignee: Jieshan Bean Priority: Critical Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, HBASE-5611-trunk.patch This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think it's still possible to hit it if a region fails to open for more obscure reasons like HDFS errors. Consider a region that just went through distributed splitting and that's now being opened by a new RS. The first thing it does is to read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size but they are dropped when the {{HRegion}} gets cleaned up. 
It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and that none of them have edits: {noformat} 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223) at java.lang.Thread.run(Thread.java:662) {noformat} The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier although in fact I'm at 0. It happened yesterday and it still printing this out. To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262651#comment-13262651 ] Jieshan Bean commented on HBASE-5611: - Thank you, Ted. I will take care of it. Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size Key: HBASE-5611 URL: https://issues.apache.org/jira/browse/HBASE-5611 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Jean-Daniel Cryans Assignee: Jieshan Bean Priority: Critical Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, HBASE-5611-trunk.patch This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think it's still possible to hit it if a region fails to open for more obscure reasons like HDFS errors. Consider a region that just went through distributed splitting and that's now being opened by a new RS. The first thing it does is to read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size but they are dropped when the {{HRegion}} gets cleaned up. 
It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and that none of them have edits: {noformat} 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223) at java.lang.Thread.run(Thread.java:662) {noformat} The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier although in fact I'm at 0. It happened yesterday and it still printing this out. To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262652#comment-13262652 ] Zhihong Yu commented on HBASE-5611: --- Patch v2 looks good in general. Comment on formatting: {code} + * @param regionName + * region name. {code} The line length is 100 chars. Please put javadoc for param on the same line as param name. You can wait for Hadoop QA result to come back before attaching new patches. Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size Key: HBASE-5611 URL: https://issues.apache.org/jira/browse/HBASE-5611 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Jean-Daniel Cryans Assignee: Jieshan Bean Priority: Critical Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, HBASE-5611-trunk.patch This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think it's still possible to hit it if a region fails to open for more obscure reasons like HDFS errors. Consider a region that just went through distributed splitting and that's now being opened by a new RS. The first thing it does is to read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size but they are dropped when the {{HRegion}} gets cleaned up. 
It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and that none of them have edits: {noformat} 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223) at java.lang.Thread.run(Thread.java:662) {noformat} The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier although in fact I'm at 0. It happened yesterday and it still printing this out. To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262656#comment-13262656 ] Jieshan Bean commented on HBASE-5611: - I use the formatter from HBASE-3678. Every time, it formats my code like that. I will change it after the QA result. Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size Key: HBASE-5611 URL: https://issues.apache.org/jira/browse/HBASE-5611 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Jean-Daniel Cryans Assignee: Jieshan Bean Priority: Critical Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, HBASE-5611-trunk.patch This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think it's still possible to hit it if a region fails to open for more obscure reasons like HDFS errors. Consider a region that just went through distributed splitting and that's now being opened by a new RS. The first thing it does is to read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size but they are dropped when the {{HRegion}} gets cleaned up. 
It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and that none of them have edits: {noformat} 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223) at java.lang.Thread.run(Thread.java:662) {noformat} The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier although in fact I'm at 0. It happened yesterday and it still printing this out. To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262652#comment-13262652 ] Zhihong Yu edited comment on HBASE-5611 at 4/26/12 2:58 PM: Patch v2 looks good in general. Comment on formatting: {code} + * @param regionName + * region name. {code} The line length limit is 100 chars. Please put javadoc for param on the same line as param name. You can wait for Hadoop QA result to come back before attaching new patches. was (Author: zhi...@ebaysf.com): Patch v2 looks good in general. Comment on formatting: {code} + * @param regionName + * region name. {code} The line length is 100 chars. Please put javadoc for param on the same line as param name. You can wait for Hadoop QA result to come back before attaching new patches. Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size Key: HBASE-5611 URL: https://issues.apache.org/jira/browse/HBASE-5611 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Jean-Daniel Cryans Assignee: Jieshan Bean Priority: Critical Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, HBASE-5611-trunk.patch This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think it's still possible to hit it if a region fails to open for more obscure reasons like HDFS errors. Consider a region that just went through distributed splitting and that's now being opened by a new RS. The first thing it does is to read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size but they are dropped when the {{HRegion}} gets cleaned up. 
It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and that none of them have edits: {noformat} 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223) at java.lang.Thread.run(Thread.java:662) {noformat} The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier although in fact I'm at 0. It happened yesterday and it still printing this out. To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262658#comment-13262658 ] ramkrishna.s.vasudevan commented on HBASE-5875: --- I would like to get some suggestions on this: {code} boolean rit = this.assignmentManager
    .processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
ServerName currentRootServer = null;
if (!catalogTracker.verifyRootRegionLocation(timeout)) {
  currentRootServer = this.catalogTracker.getRootLocation(); {code} Consider the case where my ROOT node is found in RIT, so processRIT will trigger the assignment. It so happened that when I tried verifyRootRegionLocation, the root znode was created but the OpenRegionHandler had not yet added the ROOT region to its memory (a very rare corner case; this happened once while testing). So verifyRootRegionLocation returns false and hence the master treats the server as one to be expired. We remove a normal, active RS from the master's memory thinking it is dead, so I lose a RS from the master's list of online servers. How can we handle this scenario? Can we retry verifyRootRegionLocation if it returns false and the boolean variable 'rit' is true? Or can we update the root region znode on the RS side after updating the online server list? Suggestions welcome... Process RIT and Master restart may remove an online server considering it as a dead server -- Key: HBASE-5875 URL: https://issues.apache.org/jira/browse/HBASE-5875 Project: HBase Issue Type: Bug Affects Versions: 0.92.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.1 If on master restart it finds the ROOT/META to be in RIT state, master tries to assign the ROOT region through ProcessRIT. Master will trigger the assignment and next will try to verify the Root Region Location. Root region location verification is done by seeing if the RS has the region in its online list. 
If the master-triggered assignment has not yet completed on the RS, then the root region location verification will fail. Because it failed, {code} splitLogAndExpireIfOnline(currentRootServer); {code} does the log split and also removes the server from the online server list. Ideally there is nothing to do in the log split, as no region server was restarted. So, though the server is online, the master just invalidates the region server. In a special case, if I have only one RS, my cluster becomes non-operative. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
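Ramkrishna's first suggestion, retrying verifyRootRegionLocation when 'rit' is true, can be sketched as follows. The interface and method names are stand-ins for illustration, not the actual CatalogTracker API.

```java
public class RootLocationRetry {
    // Stand-in for CatalogTracker.verifyRootRegionLocation(timeout).
    interface RootVerifier { boolean verify(long timeoutMs) throws InterruptedException; }

    // If processRIT just triggered an assignment (rit == true), re-check the
    // root location a few times before concluding the hosting server is dead.
    static boolean verifyWithRetry(RootVerifier tracker, boolean rit,
                                   int retries, long timeoutMs) throws InterruptedException {
        int attempts = rit ? retries : 1;  // only retry when we triggered the assignment
        for (int i = 0; i < attempts; i++) {
            if (tracker.verify(timeoutMs)) {
                return true;               // region server has the region online now
            }
            Thread.sleep(10);              // give OpenRegionHandler time to register it
        }
        return false;                      // genuinely unreachable: expire the server
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a server that only reports the region online on the third check.
        final int[] calls = {0};
        RootVerifier slowOpen = t -> ++calls[0] >= 3;
        System.out.println(verifyWithRetry(slowOpen, true, 5, 1000));  // true
        calls[0] = 0;
        System.out.println(verifyWithRetry(slowOpen, false, 5, 1000)); // false
    }
}
```

Gating the retry on 'rit' keeps the normal failure path fast: a server that is actually dead still fails on the first check, while the narrow race (znode created, region not yet in the RS's online set) gets a brief grace period instead of an unnecessary expiry.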
[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262673#comment-13262673 ] Hadoop QA commented on HBASE-5611: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12524416/HBASE-5611-trunk-v2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1656//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1656//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1656//console This message is automatically generated. Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size Key: HBASE-5611 URL: https://issues.apache.org/jira/browse/HBASE-5611 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Jean-Daniel Cryans Assignee: Jieshan Bean Priority: Critical Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, HBASE-5611-trunk.patch This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think it's still possible to hit it if a region fails to open for more obscure reasons like HDFS errors. 
Consider a region that just went through distributed splitting and that's now being opened by a new RS. The first thing it does is to read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size, but they are dropped when the {{HRegion}} gets cleaned up. It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and finds that none of them have edits: {noformat} 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223) at java.lang.Thread.run(Thread.java:662) {noformat} The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier, although in fact I'm at 0. It happened yesterday and it is still printing this out. To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
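The fix direction named in the last sentence (decrease the global {{MemStore}} size when the region can't open) can be sketched with a plain atomic counter. The class and method names below are illustrative, not the actual region server accounting API:

```java
import java.util.concurrent.atomic.AtomicLong;

public class GlobalMemStoreAccounting {
  final AtomicLong globalMemStoreSize = new AtomicLong();

  // Replaying recovered edits grows the global counter along with the
  // region's own MemStore size.
  long addRegionReplaySize(long bytes) {
    return globalMemStoreSize.addAndGet(bytes);
  }

  // The missing step this issue describes: when the region fails to open and
  // its HRegion is cleaned up, its replayed bytes must be subtracted back,
  // otherwise the flusher chases memory that no open region actually holds.
  long rollbackRegionReplaySize(long bytes) {
    return globalMemStoreSize.addAndGet(-bytes);
  }

  public static void main(String[] args) {
    GlobalMemStoreAccounting acct = new GlobalMemStoreAccounting();
    acct.addRegionReplaySize(64 * 1024 * 1024);       // region replays 64 MB of edits
    acct.rollbackRegionReplaySize(64 * 1024 * 1024);  // open fails: roll it back
    System.out.println(acct.globalMemStoreSize.get()); // 0
  }
}
```

Without the rollback call, the counter stays at 64 MB forever, which is exactly the state that makes flushOneForGlobalPressure fail its precondition above.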
[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262698#comment-13262698 ] Zhihong Yu commented on HBASE-5875: --- bq. Or can we update the root region node in the RS side after updating the online server list? Let's try this approach first. The other approach would involve retry count, sleep interval, etc. Process RIT and Master restart may remove an online server considering it as a dead server -- Key: HBASE-5875 URL: https://issues.apache.org/jira/browse/HBASE-5875 Project: HBase Issue Type: Bug Affects Versions: 0.92.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.1 If on master restart the master finds ROOT/META in RIT state, it tries to assign the ROOT region through ProcessRIT. The master triggers the assignment and then tries to verify the root region location. Root region location verification is done by checking whether the RS has the region in its online list. If the master-triggered assignment has not yet completed on the RS, the root region location verification will fail. Because it failed, {code} splitLogAndExpireIfOnline(currentRootServer); {code} does the log split and also removes the server from the online server list. Ideally there is nothing for splitlog to do here, as no region server was restarted. So even though the server is online, the master just invalidates it. In the special case where I have only one RS, my cluster becomes non-operative. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262702#comment-13262702 ] Zhihong Yu commented on HBASE-5611: --- Tests were clear. @Jieshan: Please address formatting and prepare patches for each branch. We should also run test suite for 0.90 and 0.92 once patches are available. Good job. Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size Key: HBASE-5611 URL: https://issues.apache.org/jira/browse/HBASE-5611 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Jean-Daniel Cryans Assignee: Jieshan Bean Priority: Critical Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, HBASE-5611-trunk.patch This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think it's still possible to hit it if a region fails to open for more obscure reasons like HDFS errors. Consider a region that just went through distributed splitting and that's now being opened by a new RS. The first thing it does is to read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size but they are dropped when the {{HRegion}} gets cleaned up. 
It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and finds that none of them have edits: {noformat} 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223) at java.lang.Thread.run(Thread.java:662) {noformat} The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier, although in fact I'm at 0. It happened yesterday and it is still printing this out. To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262710#comment-13262710 ] Zhihong Yu commented on HBASE-5877: --- @N: I think the following validation in a real cluster would illustrate the benefit of this feature: For a given table, select a region server and note the row key ranges hosted by the region server. Direct client load to this server. Kill the server at time T. The difference in client response to region migration around time T with and without the patch would be interesting. When a query fails because the region has moved, let the regionserver return the new address to the client -- Key: HBASE-5877 URL: https://issues.apache.org/jira/browse/HBASE-5877 Project: HBase Issue Type: Improvement Components: client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5877.v1.patch This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load. Note that a region is not immediately opened after a close. So: - it seems preferable to wait before retrying on the other server. An optimisation would be to have a heuristic depending on when the region was closed. - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help. The implementation in the first patch does: - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds. - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address. - the client analyses the exceptions and updates its cache accordingly... -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5877: -- Comment: was deleted (was: @N: I think the following validation in real cluster would illustrate the benefit of this feature: For given table, select a region server and note the row key ranges hosted by the region server. Direct client load to this server. Kill the server at time T. Difference in client response to region migration around time T with and without the patch would be interesting.) When a query fails because the region has moved, let the regionserver return the new address to the client -- Key: HBASE-5877 URL: https://issues.apache.org/jira/browse/HBASE-5877 Project: HBase Issue Type: Improvement Components: client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5877.v1.patch This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load. Note that a region is not immediately opened after a close. So: - it seems preferable to wait before retrying on the other server. An optimisation would be to have a heuristic depending on when the region was closed. - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help. The implementation in the first patch does: - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds. - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address. - the client analyses the exceptions and updates its cache accordingly... -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262714#comment-13262714 ] Zhihong Yu commented on HBASE-5877: --- @N: I think the following validation in a real cluster would illustrate the benefit of this feature: For a given table, select a region server and note the row key range hosted by one region on the region server. Direct client load to this region. Issue the following command in the shell: {code} hbase move 'ENCODED_REGIONNAME', 'SERVER_NAME' {code} at time T. The difference in client response to region migration around time T with and without the patch would be interesting. When a query fails because the region has moved, let the regionserver return the new address to the client -- Key: HBASE-5877 URL: https://issues.apache.org/jira/browse/HBASE-5877 Project: HBase Issue Type: Improvement Components: client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5877.v1.patch This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load. Note that a region is not immediately opened after a close. So: - it seems preferable to wait before retrying on the other server. An optimisation would be to have a heuristic depending on when the region was closed. - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help. The implementation in the first patch does: - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds. - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address. - the client analyses the exceptions and updates its cache accordingly... 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262717#comment-13262717 ] nkeywal commented on HBASE-5877: Note that I'm currently rewriting the patch, as it conflicts with the protobuf stuff that was committed recently... But the logic hasn't changed. @ted What we're saving in the current implementation is a call to the master. It can be interesting in itself if the region that moves is used by a lot of clients. We could do better by letting the client know that the region is now fully available somewhere else and that there is no need to wait before retrying. But right now the region server only knows that the region is closed and moved to another server. It doesn't know if the region is opened yet. We could have this by adding the info in zk, but it would increase the zk load... When a query fails because the region has moved, let the regionserver return the new address to the client -- Key: HBASE-5877 URL: https://issues.apache.org/jira/browse/HBASE-5877 Project: HBase Issue Type: Improvement Components: client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5877.v1.patch This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load. Note that a region is not immediately opened after a close. So: - it seems preferable to wait before retrying on the other server. An optimisation would be to have a heuristic depending on when the region was closed. - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help. The implementation in the first patch does: - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds. 
- the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address. - the client analyses the exceptions and updates its cache accordingly... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
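The "list of what was moved, each entry kept 100 seconds" from the implementation outline can be sketched as below. The class name and time handling are illustrative, not taken from the attached 5877.v1.patch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MovedRegionsCache {
  static final long TTL_MS = 100_000L;  // "each entry is kept 100 seconds"

  static final class Entry {
    final String newServer;
    final long movedAtMs;
    Entry(String newServer, long movedAtMs) {
      this.newServer = newServer;
      this.movedAtMs = movedAtMs;
    }
  }

  private final Map<String, Entry> moved = new ConcurrentHashMap<>();

  // Recorded by regionserver#close once it is told where the region is going.
  void recordMove(String encodedRegion, String destServer, long nowMs) {
    moved.put(encodedRegion, new Entry(destServer, nowMs));
  }

  // On a query for a region we no longer host: return the new address if the
  // entry is still fresh, else null (the client falls back to the master).
  String lookup(String encodedRegion, long nowMs) {
    Entry e = moved.get(encodedRegion);
    if (e == null) return null;
    if (nowMs - e.movedAtMs > TTL_MS) {
      moved.remove(encodedRegion);
      return null;
    }
    return e.newServer;
  }

  public static void main(String[] args) {
    MovedRegionsCache cache = new MovedRegionsCache();
    cache.recordMove("4f2ea029", "rs-2.example.com:60020", 0L);
    System.out.println(cache.lookup("4f2ea029", 50_000L));   // rs-2.example.com:60020
    System.out.println(cache.lookup("4f2ea029", 200_000L));  // null (entry expired)
  }
}
```

A lookup hit is what would be wrapped into the "specific exception" carrying the new address; a miss after the TTL reproduces today's behavior of asking the master.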
[jira] [Commented] (HBASE-5856) byte - String is not consistent between HBaseAdmin and HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262724#comment-13262724 ] stack commented on HBASE-5856: -- Are you passing the name copied from the UI using double quotes or single quotes? The shell acts differently with double quotes: if double-quoted, it will pass the unescaped region name to hbase. At least, it used to do this. Thanks Binlijin. byte - String is not consistent between HBaseAdmin and HRegionInfo Key: HBASE-5856 URL: https://issues.apache.org/jira/browse/HBASE-5856 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.6, 0.92.1 Reporter: binlijin Attachments: HBASE-5856-0.92.patch In HBaseAdmin public void split(final String tableNameOrRegionName) throws IOException, InterruptedException { split(Bytes.toBytes(tableNameOrRegionName)); // String -> byte } In HRegionInfo this.regionNameStr = Bytes.toStringBinary(this.regionName); // byte -> String Should we use Bytes.toBytesBinary in HBaseAdmin? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
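The mismatch can be made concrete with simplified re-implementations of the two helpers (the real Bytes.toStringBinary/toBytesBinary keep a slightly different printable set): a region name containing raw bytes survives a toStringBinary/toBytesBinary round trip, but not a plain String-to-bytes conversion, which keeps the \xNN escapes as literal characters:

```java
import java.util.Arrays;

public class BinaryRoundTrip {
  // Simplified version of Bytes.toStringBinary: printable ASCII stays,
  // everything else (and the backslash itself) becomes a \xNN escape.
  static String toStringBinary(byte[] b) {
    StringBuilder sb = new StringBuilder();
    for (byte v : b) {
      int c = v & 0xff;
      if (c >= 32 && c < 127 && c != '\\') sb.append((char) c);
      else sb.append(String.format("\\x%02X", c));
    }
    return sb.toString();
  }

  // Simplified version of Bytes.toBytesBinary: the matching inverse,
  // which is what the issue proposes HBaseAdmin.split should use.
  static byte[] toBytesBinary(String s) {
    byte[] out = new byte[s.length()];
    int n = 0;
    for (int i = 0; i < s.length(); i++) {
      char ch = s.charAt(i);
      if (ch == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x') {
        out[n++] = (byte) Integer.parseInt(s.substring(i + 2, i + 4), 16);
        i += 3;
      } else {
        out[n++] = (byte) ch;
      }
    }
    return Arrays.copyOf(out, n);
  }

  public static void main(String[] args) {
    byte[] regionName = {'t', 0x00, (byte) 0xFF};  // region names contain raw bytes
    String shown = toStringBinary(regionName);     // what HRegionInfo displays
    System.out.println(shown);                     // t\x00\xFF
    // Round-trips with the binary-aware inverse, not with a plain conversion:
    System.out.println(Arrays.equals(regionName, toBytesBinary(shown)));  // true
    System.out.println(Arrays.equals(regionName, shown.getBytes()));      // false
  }
}
```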
[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262731#comment-13262731 ] Zhihong Yu commented on HBASE-5877: --- @N: If the testing result is favorable, I think Lars may want it in 0.94 as well. I think making this feature functional in 0.94 cluster would be a good start. bq. We could have this by adding the info in zk A separate discussion should be started w.r.t. the above. This would shift load imposed by clients from master to zk quorum. When a query fails because the region has moved, let the regionserver return the new address to the client -- Key: HBASE-5877 URL: https://issues.apache.org/jira/browse/HBASE-5877 Project: HBase Issue Type: Improvement Components: client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5877.v1.patch This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load. Note that a region is not immediately opened after a close. So: - it seems preferable to wait before retrying on the other server. An optimisation would be to have an heuristic depending on when the region was closed. - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help. The implementation in the first patch does: - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds. - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address. - the client analyses the exeptions and update its cache accordingly... -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5859) Optimize the rolling restart script
[ https://issues.apache.org/jira/browse/HBASE-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262732#comment-13262732 ] stack commented on HBASE-5859: -- I like the idea of regionservers being allocated a range of ports rather than an explicit one. Ditto for its UI. We'd have to do the tooling to support moving ports first -- the port the client talks to and that of the UI (maybe one day we just remove the UI from regionservers; instead have a separate UI app that is fed via jmx, etc) -- and then once that was done, we could do nice tricks like this. Can you think of any other tricks we could do if the ports regionservers answered on were fuzzy? Optimize the rolling restart script --- Key: HBASE-5859 URL: https://issues.apache.org/jira/browse/HBASE-5859 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Priority: Minor There is a graceful_stop script. Its algorithm is: {noformat} for i = 0 to servers.size { regionsInServer = servers[i].regions move servers[i].regions to random stop servers[i] start servers[i] move regionsInServer to servers[i] //filled back with the same regions } {noformat} It would be possible to optimize it while keeping data locality with {noformat} for i = 0 to servers.size { start servers[i*2+1] on the computer of servers[i] // Two RS on the same box move servers[i].regions to servers[i*2+1] // The one on the same box stop servers[i] } {noformat} There would be an impact with a fixed port configuration. To fix this, we could: - use a range of ports instead of a single port. This could be an issue for the web port. - start on a port then reuse the fixed ones when they become available. This is not very elegant if client code is already using the previous port. Moreover the region server address is written in the meta table. - do a mix of the two solutions: a range for the server itself, while waiting for the web port to be available. 
To be discussed... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
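The "range of ports" option above can be sketched as a first-free-port bind. This is a plain Java illustration, not HBase code; today an RS binds a single fixed hbase.regionserver.port, and the base and range below are hypothetical:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortRange {
  // Try each port in [base, base + range) and bind to the first free one, so
  // a second RS started on the same box (during the optimized rolling
  // restart) can come up while the old one still holds the first port.
  static ServerSocket bindFirstFree(int base, int range) throws IOException {
    for (int p = base; p < base + range; p++) {
      try {
        return new ServerSocket(p);  // success: this RS answers on p
      } catch (IOException busy) {
        // port in use (e.g. the old RS still draining); try the next one
      }
    }
    throw new IOException("no free port in [" + base + "," + (base + range) + ")");
  }

  public static void main(String[] args) throws IOException {
    ServerSocket first = bindFirstFree(45891, 10);
    ServerSocket second = bindFirstFree(45891, 10);  // first port now busy
    System.out.println(second.getLocalPort() > first.getLocalPort());  // true
    first.close();
    second.close();
  }
}
```

Clients would still need a way to discover which port in the range was chosen, which is the tooling concern raised in the comment above.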
[jira] [Updated] (HBASE-5548) Add ability to get a table in the shell
[ https://issues.apache.org/jira/browse/HBASE-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HBASE-5548: --- Attachment: ruby_HBASE-5548-v5.patch Attaching patch with fixes as per stack. Turns out the get issue was just a misspelling I didn't catch; I _think_ everything else is working. Also, this includes a slight refactor so the help text actually lives in the table class, not remotely in table_help.rb. If it was enough that I forgot where it lived, then others will too :) Should be good to go. Add ability to get a table in the shell --- Key: HBASE-5548 URL: https://issues.apache.org/jira/browse/HBASE-5548 Project: HBase Issue Type: Improvement Components: shell Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0, 0.94.1 Attachments: ruby_HBASE-5528-v0.patch, ruby_HBASE-5548-v1.patch, ruby_HBASE-5548-v2.patch, ruby_HBASE-5548-v3.patch, ruby_HBASE-5548-v5.patch Currently, all the commands that operate on a table in the shell first have to take the table name as input. There are two main considerations: * It is annoying to have to write the table name every time, when you should just be able to get a reference to a table * the current implementation is very wasteful - it creates a new HTable for each call (but reuses the connection since it uses the same configuration) We should be able to get a handle to a single HTable and then operate on that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2214) Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly
[ https://issues.apache.org/jira/browse/HBASE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262734#comment-13262734 ] jirapos...@reviews.apache.org commented on HBASE-2214: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4726/#review7268 --- Looks good. It seems you are using an old version protoc, which is ok. Great stuff. - Jimmy On 2012-04-26 08:18:40, ferdy wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4726/ bq. --- bq. bq. (Updated 2012-04-26 08:18:40) bq. bq. bq. Review request for hbase and Ted Yu. bq. bq. bq. Summary bq. --- bq. bq. HBASE-2214 per scan max buffersize. bq. bq. bq. This addresses bug HBASE-2214. bq. https://issues.apache.org/jira/browse/HBASE-2214 bq. bq. bq. Diffs bq. - bq. bq./src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java 1330680 bq./src/main/java/org/apache/hadoop/hbase/client/Scan.java 1330680 bq./src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java 1330680 bq./src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java 1330680 bq. /src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java 1330680 bq. /src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java 1330680 bq. /src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java 1330680 bq./src/main/java/org/apache/hadoop/hbase/protobuf/generated/RPCProtos.java 1330680 bq. /src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java 1330680 bq./src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1330680 bq./src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1330680 bq./src/main/java/org/apache/hadoop/hbase/regionserver/RegionScanner.java 1330680 bq./src/main/java/org/apache/hadoop/hbase/regionserver/RegionServer.java 1330680 bq./src/main/protobuf/Client.proto 1330680 bq. 
/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java 1330680 bq. bq. Diff: https://reviews.apache.org/r/4726/diff bq. bq. bq. Testing bq. --- bq. bq. It works when running this test: bq. bq. bq. new HBaseTestingUtility(conf).startMiniCluster(); bq. bq. HBaseAdmin admin = new HBaseAdmin(conf); bq. if (!admin.tableExists("test")) { bq.HTableDescriptor tableDesc = new HTableDescriptor("test"); bq.tableDesc.addFamily(new HColumnDescriptor("fam")); bq.admin.createTable(tableDesc); bq. } bq. bq. bq. HTable table = new HTable(conf, "test"); bq. Put put; bq. bq. put = new Put(Bytes.toBytes("row1")); bq. put.add(Bytes.toBytes("fam"),Bytes.toBytes("qual1"),Bytes.toBytes("val1")); bq. table.put(put); bq. bq. put = new Put(Bytes.toBytes("row2")); bq. put.add(Bytes.toBytes("fam"),Bytes.toBytes("qual2"),Bytes.toBytes("val2")); bq. table.put(put); bq. bq. put = new Put(Bytes.toBytes("row3")); bq. put.add(Bytes.toBytes("fam"),Bytes.toBytes("qual3"),Bytes.toBytes("val3")); bq. table.put(put); bq. bq. table.flushCommits(); bq. { bq.System.out.println("returns all rows at once because of the caching"); bq.Scan scan = new Scan(); bq.scan.setCaching(100); bq.ResultScanner scanner = table.getScanner(scan); bq.scanner.next(100); bq. } bq. { bq.System.out.println("returns one row at a time because of the maxResultSize"); bq.Scan scan = new Scan(); bq.scan.setCaching(100); bq.scan.setMaxResultSize(1); bq.ResultScanner scanner = table.getScanner(scan); bq.scanner.next(100); bq. } bq. bq. bq. See output: bq. bq. returns all rows at once because of the caching bq. 2012-04-25 22:18:47,494 DEBUG [main] client.ClientScanner(94): Creating scanner over test starting at key '' bq. 2012-04-25 22:18:47,494 DEBUG [main] client.ClientScanner(206): Advancing internal scanner to startKey at '' bq. 2012-04-25 22:18:47,499 DEBUG [main] client.ClientScanner(323): Rows returned 3 bq. 
2012-04-25 22:18:47,502 DEBUG [main] client.ClientScanner(193): Finished with scanning at {NAME = 'test,,1335385126388.ed23a82f3d6ca2eab571918843796259.', STARTKEY = '', ENDKEY = '', ENCODED = ed23a82f3d6ca2eab571918843796259,} bq. returns one row at a time because of the maxResultSize bq. 2012-04-25 22:18:47,504 DEBUG [main] client.ClientScanner(94): Creating scanner over test starting at key '' bq. 2012-04-25 22:18:47,505
[jira] [Reopened] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal reopened HBASE-5844: There is a regression when the cluster is fully distributed: the start command hangs. I'm on it. In the meantime, would it be possible to undo the commit? Sorry about this. Delete the region servers znode after a regions server crash Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch Today, if the region server crashes, its znode is not deleted from ZooKeeper, so the recovery process will start only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and the recovery starts immediately. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB
[ https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262742#comment-13262742 ] Zhihong Yu commented on HBASE-5620: --- Is there a plan to adopt measures to counter this 8% performance dip? Convert the client protocol of HRegionInterface to PB - Key: HBASE-5620 URL: https://issues.apache.org/jira/browse/HBASE-5620 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-5620-sec.patch, hbase-5620_v3.patch, hbase-5620_v4.patch, hbase-5620_v4.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5672) TestLruBlockCache#testBackgroundEvictionThread fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5672: - Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the patch Chunhui. Committed to trunk. TestLruBlockCache#testBackgroundEvictionThread fails occasionally - Key: HBASE-5672 URL: https://issues.apache.org/jira/browse/HBASE-5672 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: HBASE-5672.patch, HBASE-5672v2.patch, HBASE-5672v3.patch We find TestLruBlockCache#testBackgroundEvictionThread fails occasionally. I think it's a problem in the test case, because runEviction() only does evictionThread.evict(): {code} public void evict() { synchronized(this) { this.notify(); // FindBugs NN_NAKED_NOTIFY } } {code} However, when we call evictionThread.evict(), the evictionThread may not yet have entered run() in TestLruBlockCache#testBackgroundEvictionThread. If we run the test many times, we can reproduce the failure easily. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
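The flakiness described above is the classic lost-notify race: evict() can fire its notify() before the eviction thread has reached its wait loop, so the wakeup is silently dropped. A minimal sketch of the standard fix follows, guarding the wait with a condition flag so a notify can never be lost. The class and field names are illustrative only, not the actual LruBlockCache code:

```java
// Sketch: a "pending work" flag makes the naked notify safe. If evict()
// runs before the worker reaches wait(), the flag is already set and the
// worker skips the wait instead of sleeping until the next notify.
public class EvictionRaceSketch {

    static class EvictionThread extends Thread {
        private boolean pending = false;   // guarded by 'this'
        private volatile int evictions = 0;

        @Override
        public void run() {
            while (!isInterrupted()) {
                synchronized (this) {
                    while (!pending) {
                        try {
                            wait();        // safe: loop re-checks 'pending'
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                    pending = false;
                }
                evictions++;               // stand-in for the eviction work
            }
        }

        void evict() {
            synchronized (this) {
                pending = true;            // never lost, unlike a naked notify
                notify();
            }
        }

        int getEvictions() {
            return evictions;
        }
    }

    /** Starts the thread and immediately asks for one eviction. */
    public static int demo() {
        EvictionThread t = new EvictionThread();
        t.start();
        t.evict();                         // may run before t reaches wait()
        try {
            for (int i = 0; i < 2000 && t.getEvictions() == 0; i++) {
                Thread.sleep(1);
            }
            t.interrupt();
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t.getEvictions();
    }

    public static void main(String[] args) {
        System.out.println("evictions = " + demo());
    }
}
```

With the flag in place, the order in which the test thread and the eviction thread start no longer matters, which is exactly the window the flaky test was hitting.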
[jira] [Commented] (HBASE-5862) After Region Close remove the Operation Metrics.
[ https://issues.apache.org/jira/browse/HBASE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262748#comment-13262748 ] Zhihong Yu commented on HBASE-5862: --- @Stack: Do you have suggestions on further improvement for the latest patch ? After Region Close remove the Operation Metrics. Key: HBASE-5862 URL: https://issues.apache.org/jira/browse/HBASE-5862 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Priority: Minor Fix For: 0.96.0 Attachments: HBASE-5862-0.patch, HBASE-5862-1.patch, HBASE-5862-2.patch, HBASE-5862-3.patch, HBASE-5862-94-3.patch If a region is closed then Hadoop metrics shouldn't still be reporting about that region. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5862) After Region Close remove the Operation Metrics.
[ https://issues.apache.org/jira/browse/HBASE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262754#comment-13262754 ] stack commented on HBASE-5862: -- +1 on patch. Please add more comments around why you are hacking into hadoop metrics. Please also paste some pretty pictures so Lars can see why this has to be in 0.94. After Region Close remove the Operation Metrics. Key: HBASE-5862 URL: https://issues.apache.org/jira/browse/HBASE-5862 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Priority: Minor Fix For: 0.96.0 Attachments: HBASE-5862-0.patch, HBASE-5862-1.patch, HBASE-5862-2.patch, HBASE-5862-3.patch, HBASE-5862-94-3.patch If a region is closed then Hadoop metrics shouldn't still be reporting about that region. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5862) After Region Close remove the Operation Metrics.
[ https://issues.apache.org/jira/browse/HBASE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5862: - Priority: Critical (was: Minor) Fix Version/s: 0.94.0 After Region Close remove the Operation Metrics. Key: HBASE-5862 URL: https://issues.apache.org/jira/browse/HBASE-5862 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Priority: Critical Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5862-0.patch, HBASE-5862-1.patch, HBASE-5862-2.patch, HBASE-5862-3.patch, HBASE-5862-94-3.patch If a region is closed then Hadoop metrics shouldn't still be reporting about that region. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5884) MapReduce package info has broken link to bulk-loads
Jesse Yates created HBASE-5884: -- Summary: MapReduce package info has broken link to bulk-loads Key: HBASE-5884 URL: https://issues.apache.org/jira/browse/HBASE-5884 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Trivial Fix For: 0.96.0, 0.94.1 Bulk Loads link goes to an old link, which we have dropped recently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5884) MapReduce package info has broken link to bulk-loads
[ https://issues.apache.org/jira/browse/HBASE-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HBASE-5884: --- Attachment: doc_HBASE-5884.patch Attaching one-liner fix. MapReduce package info has broken link to bulk-loads Key: HBASE-5884 URL: https://issues.apache.org/jira/browse/HBASE-5884 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Trivial Fix For: 0.96.0, 0.94.1 Attachments: doc_HBASE-5884.patch Bulk Loads link goes to an old link, which we have dropped recently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5862) After Region Close remove the Operation Metrics.
[ https://issues.apache.org/jira/browse/HBASE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-5862: - Attachment: TSD.png Stack wanted a screen shot of what all the recent region metrics work has enabled. Here's a shot of TSDB showing the average time of a multi put over all the regions of a table. It illustrates regions being split and new regions showing up. stack had some comments about places that needed some extra info; I'll get those trivial patches up in a sec. After Region Close remove the Operation Metrics. Key: HBASE-5862 URL: https://issues.apache.org/jira/browse/HBASE-5862 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Priority: Critical Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5862-0.patch, HBASE-5862-1.patch, HBASE-5862-2.patch, HBASE-5862-3.patch, HBASE-5862-94-3.patch, TSD.png If a region is closed then Hadoop metrics shouldn't still be reporting about that region. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
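The cleanup this issue drives at can be pictured with a toy registry: on region close, every metric keyed by that region is dropped, so the old hosting server stops reporting it. The registry and key scheme below are illustrative only, not the real Hadoop metrics plumbing the patch has to hack into:

```java
import java.util.concurrent.ConcurrentHashMap;

// Toy model of per-region operation metrics: keys look like
// "<regionName>.<op>_avg_time". Closing a region removes all of its keys,
// so a region that moved elsewhere is no longer reported by its old host.
public class RegionMetricsCleanup {

    private final ConcurrentHashMap<String, Long> registry = new ConcurrentHashMap<>();

    void record(String regionName, String op, long micros) {
        registry.put(regionName + "." + op + "_avg_time", micros);
    }

    // Conceptually what HBASE-5862 adds: purge the region's metrics on close.
    void onRegionClose(String regionName) {
        registry.keySet().removeIf(k -> k.startsWith(regionName + "."));
    }

    int size() {
        return registry.size();
    }
}
```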
[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262770#comment-13262770 ] Bohdan Mushkevych commented on HBASE-3996: -- @stack, @Ted Yu, @Lars Hofhansl Gentlemen, it seems that the last changes [1] were submitted 4 weeks ago. My personal fear is that the ticket will get outdated due to Trunk changes and will miss the 0.94-0.96 target. [1] Diff r7 https://reviews.apache.org/r/4411/diff/7/ Support multiple tables and scanners as input to the mapper in map/reduce jobs -- Key: HBASE-3996 URL: https://issues.apache.org/jira/browse/HBASE-3996 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: Eran Kutner Assignee: Eran Kutner Fix For: 0.96.0 Attachments: 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, 3996-v5.txt, 3996-v6.txt, 3996-v7.txt, HBase-3996.patch It seems that in many cases feeding data from multiple tables or multiple scanners on a single table can save a lot of time when running map/reduce jobs. I propose a new MultiTableInputFormat class that would allow doing this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262773#comment-13262773 ] stack commented on HBASE-5844: -- I reverted the patch from trunk. Delete the region servers znode after a regions server crash Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch Today, if the region server crashes, its znode is not deleted in ZooKeeper, so the recovery process will start only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and the recovery starts immediately. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5862) After Region Close remove the Operation Metrics.
[ https://issues.apache.org/jira/browse/HBASE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262778#comment-13262778 ] Hadoop QA commented on HBASE-5862: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12524468/TSD.png against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1657//console This message is automatically generated. After Region Close remove the Operation Metrics. Key: HBASE-5862 URL: https://issues.apache.org/jira/browse/HBASE-5862 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Priority: Critical Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5862-0.patch, HBASE-5862-1.patch, HBASE-5862-2.patch, HBASE-5862-3.patch, HBASE-5862-94-3.patch, TSD.png If a region is closed then Hadoop metrics shouldn't still be reporting about that region. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5862) After Region Close remove the Operation Metrics.
[ https://issues.apache.org/jira/browse/HBASE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5862: -- Comment: was deleted (was: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12524468/TSD.png against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1657//console This message is automatically generated.) After Region Close remove the Operation Metrics. Key: HBASE-5862 URL: https://issues.apache.org/jira/browse/HBASE-5862 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Priority: Critical Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5862-0.patch, HBASE-5862-1.patch, HBASE-5862-2.patch, HBASE-5862-3.patch, HBASE-5862-94-3.patch, TSD.png If a region is closed then Hadoop metrics shouldn't still be reporting about that region. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5881) BuiltIn Gzip compressor decompressor not getting pooled, leading to native memory leak
[ https://issues.apache.org/jira/browse/HBASE-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5881: -- Fix Version/s: (was: 0.94.0) 0.94.1 BuiltIn Gzip compressor decompressor not getting pooled, leading to native memory leak Key: HBASE-5881 URL: https://issues.apache.org/jira/browse/HBASE-5881 Project: HBase Issue Type: Bug Components: io Affects Versions: 0.92.1 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.92.2, 0.96.0, 0.94.1 This issue occurs only in hadoop 0.23.x and above. In hadoop 0.20.x: {code} public static void returnDecompressor(Decompressor decompressor) { if (decompressor == null) { return; } decompressor.reset(); payback(decompressorPool, decompressor); } {code} In hadoop 0.23.x: {code} public static void returnDecompressor(Decompressor decompressor) { if (decompressor == null) { return; } // if the decompressor can't be reused, don't pool it. if (decompressor.getClass().isAnnotationPresent(DoNotPool.class)) { return; } decompressor.reset(); payback(decompressorPool, decompressor); } {code} Here an annotation check has been added. By default this library is loaded if there is no native library: {code} @DoNotPool public class BuiltInGzipDecompressor {code} Due to this, a new compressor/decompressor is created each time, which leads to a native memory leak. {noformat} 2012-04-25 22:11:48,093 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz] 2012-04-25 22:11:48,093 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz] 2012-04-25 22:11:48,093 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz] {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
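The behavioral difference between the two returnDecompressor versions hinges on a runtime annotation check. A self-contained sketch with stand-in classes (not Hadoop's actual DoNotPool annotation or codec types) shows the condition that makes the pool discard the built-in gzip decompressor on every return:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Mirrors the early-return in the 0.23 CodecPool.returnDecompressor quoted
// above: anything annotated @DoNotPool is discarded instead of pooled, so a
// fresh instance is created on every use ("Got brand-new decompressor").
public class DoNotPoolSketch {

    @Retention(RetentionPolicy.RUNTIME)   // must be RUNTIME for isAnnotationPresent
    @interface DoNotPool {}

    @DoNotPool
    static class BuiltInGzipDecompressor {}   // stand-in for the Java-side gzip codec

    static class NativeZlibDecompressor {}    // stand-in for a poolable codec

    static boolean poolable(Object decompressor) {
        // the exact condition that bypasses the pool in hadoop 0.23.x
        return !decompressor.getClass().isAnnotationPresent(DoNotPool.class);
    }

    public static void main(String[] args) {
        System.out.println(poolable(new BuiltInGzipDecompressor())); // false: never pooled
        System.out.println(poolable(new NativeZlibDecompressor()));  // true
    }
}
```

Since the annotation lives on the class, no amount of reset() makes the instance reusable from the pool's point of view; the leak follows from allocating native state per instance.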
[jira] [Commented] (HBASE-5872) Improve hadoopqa script to include checks for hadoop 0.23 build
[ https://issues.apache.org/jira/browse/HBASE-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262787#comment-13262787 ] stack commented on HBASE-5872: -- +1 on option #1. +1 on the patch. +1 on what Ted says above that it'd be best if compile against 0.23 succeeded before committing this. Improve hadoopqa script to include checks for hadoop 0.23 build --- Key: HBASE-5872 URL: https://issues.apache.org/jira/browse/HBASE-5872 Project: HBase Issue Type: New Feature Affects Versions: 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: hbase-5872.patch There have been a few patches that have made it into hbase trunk that have broken the compile of hbase against hadoop 0.23.x, without being known for a few days. We could have the bot do a few things: 1) verify that patch compiles against hadoop 23 2) verify that unit tests pass against hadoop 23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5881) BuiltIn Gzip compressor decompressor not getting pooled, leading to native memory leak
[ https://issues.apache.org/jira/browse/HBASE-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262790#comment-13262790 ] stack commented on HBASE-5881: -- Is the existence of this JIRA sufficient documentation until the above is fixed? BuiltIn Gzip compressor decompressor not getting pooled, leading to native memory leak Key: HBASE-5881 URL: https://issues.apache.org/jira/browse/HBASE-5881 Project: HBase Issue Type: Bug Components: io Affects Versions: 0.92.1 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.92.2, 0.96.0, 0.94.1 This issue occurs only in hadoop 0.23.x and above. In hadoop 0.20.x: {code} public static void returnDecompressor(Decompressor decompressor) { if (decompressor == null) { return; } decompressor.reset(); payback(decompressorPool, decompressor); } {code} In hadoop 0.23.x: {code} public static void returnDecompressor(Decompressor decompressor) { if (decompressor == null) { return; } // if the decompressor can't be reused, don't pool it. if (decompressor.getClass().isAnnotationPresent(DoNotPool.class)) { return; } decompressor.reset(); payback(decompressorPool, decompressor); } {code} Here an annotation check has been added. By default this library is loaded if there is no native library: {code} @DoNotPool public class BuiltInGzipDecompressor {code} Due to this, a new compressor/decompressor is created each time, which leads to a native memory leak. {noformat} 2012-04-25 22:11:48,093 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz] 2012-04-25 22:11:48,093 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz] 2012-04-25 22:11:48,093 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz] {noformat} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5672) TestLruBlockCache#testBackgroundEvictionThread fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262789#comment-13262789 ] Hudson commented on HBASE-5672: --- Integrated in HBase-TRUNK #2817 (See [https://builds.apache.org/job/HBase-TRUNK/2817/]) HBASE-5672 TestLruBlockCache#testBackgroundEvictionThread fails occasionally (Revision 1330971) Result = SUCCESS stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestLruBlockCache.java TestLruBlockCache#testBackgroundEvictionThread fails occasionally - Key: HBASE-5672 URL: https://issues.apache.org/jira/browse/HBASE-5672 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: HBASE-5672.patch, HBASE-5672v2.patch, HBASE-5672v3.patch We find TestLruBlockCache#testBackgroundEvictionThread fails occasionally. I think it's a problem in the test case, because runEviction() only does evictionThread.evict(): {code} public void evict() { synchronized(this) { this.notify(); // FindBugs NN_NAKED_NOTIFY } } {code} However, when we call evictionThread.evict(), the evictionThread may not yet have entered run() in TestLruBlockCache#testBackgroundEvictionThread. If we run the test many times, we can reproduce the failure easily. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5161) Compaction algorithm should prioritize reference files
[ https://issues.apache.org/jira/browse/HBASE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5161: - Fix Version/s: (was: 0.94.0) 0.94.1 Compaction algorithm should prioritize reference files -- Key: HBASE-5161 URL: https://issues.apache.org/jira/browse/HBASE-5161 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.92.1, 0.94.1 I got myself into a state where my table was un-splittable as long as the insert load was coming in. Emergency flushes because of the low memory barrier don't check the number of store files so it never blocks, to a point where I had in one case 45 store files and the compactions were almost never done on the reference files (had 15 of them, went down by one in 20 minutes). Since you can't split regions with reference files, that region couldn't split and was doomed to just get more store files until the load stopped. Marking this as a minor issue, what we really need is a better pushback mechanism but not prioritizing reference files seems wrong. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
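The prioritization J-D asks for amounts to an ordering tweak in compaction candidate selection: reference files (half-file pointers left behind by a split) block further splits, so they should sort ahead of ordinary store files. A minimal, hypothetical sketch follows; the StoreFile type here is a stand-in for HBase's, and real selection also weighs file size and age:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Reference files sort first so compacting them away (and making the
// region splittable again) is not starved by ordinary store files.
// The sort is stable, so the pre-existing order is preserved otherwise.
public class CompactionPriority {

    static class StoreFile {
        final String name;
        final boolean isReference;

        StoreFile(String name, boolean isReference) {
            this.name = name;
            this.isReference = isReference;
        }
    }

    static List<StoreFile> prioritize(List<StoreFile> candidates) {
        List<StoreFile> sorted = new ArrayList<>(candidates);
        // Boolean order is false < true, so files with isReference == true
        // (for which !isReference is false) come first.
        sorted.sort(Comparator.comparing(f -> !f.isReference));
        return sorted;
    }
}
```

Under the load pattern described in the issue, this keeps the reference-file count trending down even while emergency flushes keep adding new store files.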
[jira] [Commented] (HBASE-5829) Inconsistency between the regions map and the servers map in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262793#comment-13262793 ] stack commented on HBASE-5829: -- @Ted Make a new issue? Inconsistency between the regions map and the servers map in AssignmentManager -- Key: HBASE-5829 URL: https://issues.apache.org/jira/browse/HBASE-5829 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.92.1 Reporter: Maryann Xue Attachments: HBASE-5829-0.90.patch, HBASE-5829-trunk.patch There are occurrences in AM where this.servers is not kept consistent with this.regions. This might cause the balancer to offline a region from the RS that already returned NotServingRegionException at a previous offline attempt. In AssignmentManager.unassign(HRegionInfo, boolean): try { // TODO: We should consider making this look more like it does for the // region open where we catch all throwables and never abort if (serverManager.sendRegionClose(server, state.getRegion(), versionOfClosingNode)) { LOG.debug("Sent CLOSE to " + server + " for region " + region.getRegionNameAsString()); return; } // This never happens. Currently regionserver close always return true. LOG.warn("Server " + server + " region CLOSE RPC returned false for " + region.getRegionNameAsString()); } catch (NotServingRegionException nsre) { LOG.info("Server " + server + " returned " + nsre + " for " + region.getRegionNameAsString()); // Presume that master has stale data. Presume remote side just split. // Presume that the split message when it comes in will fix up the master's // in memory cluster state. 
} catch (Throwable t) { if (t instanceof RemoteException) { t = ((RemoteException)t).unwrapRemoteException(); if (t instanceof NotServingRegionException) { if (checkIfRegionBelongsToDisabling(region)) { // Remove from the regionsinTransition map LOG.info("While trying to recover the table " + region.getTableNameAsString() + " to DISABLED state the region " + region + " was offlined but the table was in DISABLING state"); synchronized (this.regionsInTransition) { this.regionsInTransition.remove(region.getEncodedName()); } // Remove from the regionsMap synchronized (this.regions) { this.regions.remove(region); } deleteClosingOrClosedNode(region); } } // RS is already processing this region, only need to update the timestamp if (t instanceof RegionAlreadyInTransitionException) { LOG.debug("update " + state + " the timestamp."); state.update(state.getState()); } } In AssignmentManager.assign(HRegionInfo, RegionState, boolean, boolean, boolean): synchronized (this.regions) { this.regions.put(plan.getRegionInfo(), plan.getDestination()); } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
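The invariant at stake in this issue can be shown with a toy bookkeeping class that always updates both maps under one lock; the field names below are illustrative, not the real AssignmentManager's:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the AM invariant: 'regions' (region -> server) and 'servers'
// (server -> hosted regions) must change together, under the same lock,
// or a reader such as the balancer can see a region still "on" a server
// that already returned NotServingRegionException for it.
public class AssignmentBookkeeping {

    private final Map<String, String> regions = new HashMap<>();
    private final Map<String, Set<String>> servers = new HashMap<>();

    public synchronized void assign(String region, String server) {
        String previous = regions.put(region, server);
        if (previous != null && servers.containsKey(previous)) {
            servers.get(previous).remove(region);   // drop the stale hosting entry
        }
        servers.computeIfAbsent(server, s -> new HashSet<>()).add(region);
    }

    public synchronized void unassign(String region) {
        String server = regions.remove(region);
        if (server != null && servers.containsKey(server)) {
            // the companion update whose absence HBASE-5829 is about
            servers.get(server).remove(region);
        }
    }

    // Both maps agree: every mapping in one is mirrored in the other.
    public synchronized boolean consistent() {
        for (Map.Entry<String, String> e : regions.entrySet()) {
            Set<String> hosted = servers.get(e.getValue());
            if (hosted == null || !hosted.contains(e.getKey())) {
                return false;
            }
        }
        for (Map.Entry<String, Set<String>> e : servers.entrySet()) {
            for (String region : e.getValue()) {
                if (!e.getKey().equals(regions.get(region))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

The quoted AM code instead updates this.regions and this.servers in separate synchronized blocks on different paths, which is exactly where the two views drift apart.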
[jira] [Updated] (HBASE-5862) After Region Close remove the Operation Metrics.
[ https://issues.apache.org/jira/browse/HBASE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-5862: - Attachment: HBASE-5862-4.patch Patch with more comments. Also added a return if reflection was un-successful as there is no need to try more. After Region Close remove the Operation Metrics. Key: HBASE-5862 URL: https://issues.apache.org/jira/browse/HBASE-5862 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Priority: Critical Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5862-0.patch, HBASE-5862-1.patch, HBASE-5862-2.patch, HBASE-5862-3.patch, HBASE-5862-4.patch, HBASE-5862-94-3.patch, TSD.png If a region is closed then Hadoop metrics shouldn't still be reporting about that region. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5829) Inconsistency between the regions map and the servers map in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262798#comment-13262798 ] Zhihong Yu commented on HBASE-5829: --- The latest patch is good to go. The useless statement can be addressed elsewhere. Inconsistency between the regions map and the servers map in AssignmentManager -- Key: HBASE-5829 URL: https://issues.apache.org/jira/browse/HBASE-5829 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.92.1 Reporter: Maryann Xue Attachments: HBASE-5829-0.90.patch, HBASE-5829-trunk.patch There are occurrences in AM where this.servers is not kept consistent with this.regions. This might cause the balancer to offline a region from the RS that already returned NotServingRegionException at a previous offline attempt. In AssignmentManager.unassign(HRegionInfo, boolean): try { // TODO: We should consider making this look more like it does for the // region open where we catch all throwables and never abort if (serverManager.sendRegionClose(server, state.getRegion(), versionOfClosingNode)) { LOG.debug("Sent CLOSE to " + server + " for region " + region.getRegionNameAsString()); return; } // This never happens. Currently regionserver close always return true. LOG.warn("Server " + server + " region CLOSE RPC returned false for " + region.getRegionNameAsString()); } catch (NotServingRegionException nsre) { LOG.info("Server " + server + " returned " + nsre + " for " + region.getRegionNameAsString()); // Presume that master has stale data. Presume remote side just split. // Presume that the split message when it comes in will fix up the master's // in memory cluster state. 
} catch (Throwable t) { if (t instanceof RemoteException) { t = ((RemoteException)t).unwrapRemoteException(); if (t instanceof NotServingRegionException) { if (checkIfRegionBelongsToDisabling(region)) { // Remove from the regionsinTransition map LOG.info("While trying to recover the table " + region.getTableNameAsString() + " to DISABLED state the region " + region + " was offlined but the table was in DISABLING state"); synchronized (this.regionsInTransition) { this.regionsInTransition.remove(region.getEncodedName()); } // Remove from the regionsMap synchronized (this.regions) { this.regions.remove(region); } deleteClosingOrClosedNode(region); } } // RS is already processing this region, only need to update the timestamp if (t instanceof RegionAlreadyInTransitionException) { LOG.debug("update " + state + " the timestamp."); state.update(state.getState()); } } In AssignmentManager.assign(HRegionInfo, RegionState, boolean, boolean, boolean): synchronized (this.regions) { this.regions.put(plan.getRegionInfo(), plan.getDestination()); } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5829) Inconsistency between the regions map and the servers map in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5829: - Resolution: Fixed Fix Version/s: 0.96.0 Assignee: Maryann Xue Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Applied to trunk. Letting the patch hang out in case someone wants to apply it to other branches. I added you as a contributor Maryann and assigned you this issue (You can assign yourself issues going forward). Thanks for the patch. Inconsistency between the regions map and the servers map in AssignmentManager -- Key: HBASE-5829 URL: https://issues.apache.org/jira/browse/HBASE-5829 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.92.1 Reporter: Maryann Xue Assignee: Maryann Xue Fix For: 0.96.0 Attachments: HBASE-5829-0.90.patch, HBASE-5829-trunk.patch There are occurrences in AM where this.servers is not kept consistent with this.regions. This might cause the balancer to offline a region from the RS that already returned NotServingRegionException at a previous offline attempt. In AssignmentManager.unassign(HRegionInfo, boolean): try { // TODO: We should consider making this look more like it does for the // region open where we catch all throwables and never abort if (serverManager.sendRegionClose(server, state.getRegion(), versionOfClosingNode)) { LOG.debug("Sent CLOSE to " + server + " for region " + region.getRegionNameAsString()); return; } // This never happens. Currently regionserver close always return true. LOG.warn("Server " + server + " region CLOSE RPC returned false for " + region.getRegionNameAsString()); } catch (NotServingRegionException nsre) { LOG.info("Server " + server + " returned " + nsre + " for " + region.getRegionNameAsString()); // Presume that master has stale data. Presume remote side just split. // Presume that the split message when it comes in will fix up the master's // in memory cluster state. 
} catch (Throwable t) { if (t instanceof RemoteException) { t = ((RemoteException)t).unwrapRemoteException(); if (t instanceof NotServingRegionException) { if (checkIfRegionBelongsToDisabling(region)) { // Remove from the regionsinTransition map LOG.info("While trying to recover the table " + region.getTableNameAsString() + " to DISABLED state the region " + region + " was offlined but the table was in DISABLING state"); synchronized (this.regionsInTransition) { this.regionsInTransition.remove(region.getEncodedName()); } // Remove from the regionsMap synchronized (this.regions) { this.regions.remove(region); } deleteClosingOrClosedNode(region); } } // RS is already processing this region, only need to update the timestamp if (t instanceof RegionAlreadyInTransitionException) { LOG.debug("update " + state + " the timestamp."); state.update(state.getState()); } } In AssignmentManager.assign(HRegionInfo, RegionState, boolean, boolean, boolean): synchronized (this.regions) { this.regions.put(plan.getRegionInfo(), plan.getDestination()); } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5862) After Region Close remove the Operation Metrics.
[ https://issues.apache.org/jira/browse/HBASE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262802#comment-13262802 ] Hadoop QA commented on HBASE-5862: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12524470/HBASE-5862-4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1658//console This message is automatically generated. After Region Close remove the Operation Metrics. Key: HBASE-5862 URL: https://issues.apache.org/jira/browse/HBASE-5862 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Priority: Critical Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5862-0.patch, HBASE-5862-1.patch, HBASE-5862-2.patch, HBASE-5862-3.patch, HBASE-5862-4.patch, HBASE-5862-94-3.patch, TSD.png If a region is closed then Hadoop metrics shouldn't still be reporting about that region.
[jira] [Updated] (HBASE-5862) After Region Close remove the Operation Metrics.
[ https://issues.apache.org/jira/browse/HBASE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5862: - Resolution: Fixed Release Note: Fixes the case where, when a region closed, its metrics did not; they stayed up associated w/ the old hosting server even though the region may have moved elsewhere. Status: Resolved (was: Patch Available) Committed trunk and 0.94. Thanks for the patch Elliott
[jira] [Commented] (HBASE-5801) [hbck] Hbck should handle case where some regions have different HTD settings in .regioninfo files (0.90 specific)
[ https://issues.apache.org/jira/browse/HBASE-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262807#comment-13262807 ] Jimmy Xiang commented on HBASE-5801: I looked into it and found out why deleteTable fails: one of the regions is not closed. I will put up another patch soon. [hbck] Hbck should handle case where some regions have different HTD settings in .regioninfo files (0.90 specific) --- Key: HBASE-5801 URL: https://issues.apache.org/jira/browse/HBASE-5801 Project: HBase Issue Type: Improvement Components: hbck Affects Versions: 0.90.7 Reporter: Jonathan Hsieh Assignee: Jimmy Xiang Fix For: 0.90.7 Attachments: hbase_5801_v2.patch Recently, we encountered a case where some regions in a table have different HTableDescriptor settings serialized into their HRegionInfo .regioninfo files in HDFS. hbck expects all HTDs within a table to be the same and currently bails out in this situation. We need to either point out a proper set of actions for the user to execute or automatically convert the region to a common HTD (likely the most common one, or possibly the first one.) Not sure if this requires reformatting data but may require closing and restarting a region. This issue is hbase 0.90.x specific -- 0.92+ keeps all table info in a single .tableinfo file.
[jira] [Commented] (HBASE-5864) Error while reading from hfile in 0.94
[ https://issues.apache.org/jira/browse/HBASE-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262812#comment-13262812 ] Lars Hofhansl commented on HBASE-5864: -- Going to commit in a few unless there're objections. Error while reading from hfile in 0.94 -- Key: HBASE-5864 URL: https://issues.apache.org/jira/browse/HBASE-5864 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0 Attachments: HBASE-5864_1.patch, HBASE-5864_2.patch, HBASE-5864_3.patch, HBASE-5864_test.patch Got the following stacktrace during region split. {noformat} 2012-04-24 16:05:42,168 WARN org.apache.hadoop.hbase.regionserver.Store: Failed getting store size for value java.io.IOException: Requested block is out of range: 2906737606134037404, lastDataBlockOffset: 84764558 at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:278) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:285) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:402) at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1638) at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1943) at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:77) at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:4921) at org.apache.hadoop.hbase.regionserver.HRegionServer.splitRegion(HRegionServer.java:2901) {noformat}
[jira] [Commented] (HBASE-5862) After Region Close remove the Operation Metrics.
[ https://issues.apache.org/jira/browse/HBASE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262810#comment-13262810 ] Lars Hofhansl commented on HBASE-5862: -- I don't get the Hadoop private field accessor stuff. Why do we need to clear out private fields? Is there an API missing for this in Hadoop?
[jira] [Commented] (HBASE-5862) After Region Close remove the Operation Metrics.
[ https://issues.apache.org/jira/browse/HBASE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262808#comment-13262808 ] Zhihong Yu commented on HBASE-5862: --- {code} +//per hfile. Figuring out which cfs, hfiles, ... {code} Should cfs be in expanded form (column families)? {code} +//and on the next tick of the metrics everything that is still relevant will be +//re-added. {code} 're-added' - 'added' or 'added again' The initialization work in clear() should be moved to the RegionServerDynamicMetrics ctor because it is a one-time operation.
[jira] [Commented] (HBASE-5862) After Region Close remove the Operation Metrics.
[ https://issues.apache.org/jira/browse/HBASE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262813#comment-13262813 ] Elliott Clark commented on HBASE-5862: -- @lars Yes, there's a missing API. Hadoop metrics keeps a copy of all metrics created. That copy is used to expose the data to JMX and other consumers. There is no remove function. HADOOP-8313 was filed to correct this. However, until that changes, reflection is the only way.
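The reflection workaround Elliott describes — reaching into a private registry map because the library offers no public remove API — can be sketched generically. This is a minimal illustration of the technique only: `Registry` and its field name `metricsMap` are hypothetical stand-ins, not Hadoop's actual class or field names.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Sketch of removing an entry from a private map when no public remove
// API exists (the situation HADOOP-8313 was filed to fix).
public class ReflectiveRemove {
    // Hypothetical stand-in for a metrics registry with no remove method.
    static class Registry {
        private final Map<String, Long> metricsMap = new HashMap<>();
        void add(String name, long v) { metricsMap.put(name, v); }
        int size() { return metricsMap.size(); }
    }

    @SuppressWarnings("unchecked")
    static void removeMetric(Registry r, String name) throws Exception {
        Field f = Registry.class.getDeclaredField("metricsMap");
        f.setAccessible(true);                        // bypass private access
        ((Map<String, Long>) f.get(r)).remove(name);  // mutate the hidden map
    }

    public static void main(String[] args) throws Exception {
        Registry r = new Registry();
        r.add("region_x_readLatency", 42L);
        removeMetric(r, "region_x_readLatency");
        System.out.println(r.size()); // prints "0"
    }
}
```

This is brittle by nature (a field rename in the library breaks it silently), which is why a public remove API is the right long-term fix.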
[jira] [Commented] (HBASE-5801) [hbck] Hbck should handle case where some regions have different HTD settings in .regioninfo files (0.90 specific)
[ https://issues.apache.org/jira/browse/HBASE-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262816#comment-13262816 ] Jonathan Hsieh commented on HBASE-5801: --- Thanks for following through Jimmy. Adding more flakey tests is just going to cause more trouble down the line and it is better if we figure out and catch them before they get in!
[jira] [Updated] (HBASE-5801) [hbck] Hbck should handle case where some regions have different HTD settings in .regioninfo files (0.90 specific)
[ https://issues.apache.org/jira/browse/HBASE-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5801: --- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-5801) [hbck] Hbck should handle case where some regions have different HTD settings in .regioninfo files (0.90 specific)
[ https://issues.apache.org/jira/browse/HBASE-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5801: --- Status: Patch Available (was: Open) I ran TestHBaseFsck 10x with v3 patch and all passed.
[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262820#comment-13262820 ] stack commented on HBASE-5877: -- @N This patch will benefit any move, not just rolling restart, right? Here are a quick couple of comments on the patch. I like the addition of RegionMovedException. Convention is to capitalize static defines: '+ private static final String hostField = "hostname=";' so it should be HOSTFIELD. It's super ugly that you have to parse the exception message to get exception data member fields... but that's not your fault. Please keep the style of the surrounding code. This kinda thing is unconventional: {code} +private void updateCachedLocations( + Set<String> updateHistory, + HRegionLocation hrl, + Object t) { {code} Ugh on how ugly it is updating the cache. Again, not your fault. Ted suggests updating the interface version. Maybe don't. If you do, you can't get this into a 0.94.1, etc. I don't see changes to make use of this new functionality? I'd expect the balancer in master to make use of it? When a query fails because the region has moved, let the regionserver return the new address to the client -- Key: HBASE-5877 URL: https://issues.apache.org/jira/browse/HBASE-5877 Project: HBase Issue Type: Improvement Components: client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5877.v1.patch This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load. Note that a region is not immediately opened after a close. So: - it seems preferable to wait before retrying on the other server. An optimisation would be to have a heuristic depending on when the region was closed. - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help. 
The implementation in the first patch does: - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds. - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address. - the client analyses the exceptions and updates its cache accordingly...
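The moved-region bookkeeping described above can be sketched as a small, self-contained class: record the destination on close, expire entries after roughly 100 seconds, and raise an exception carrying the new address when a query arrives for a moved region. Class and method names here are illustrative stand-ins for the patch's actual code, not its API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the HBASE-5877 idea: the regionserver remembers where each
// closed region went, for a bounded time, and tells the client directly.
public class MovedRegions {
    static class RegionMovedException extends Exception {
        final String newAddress;
        RegionMovedException(String addr) {
            super("region moved to " + addr);
            this.newAddress = addr;   // the client uses this to update its cache
        }
    }

    private static final long TTL_MS = 100_000L;  // keep each entry ~100 seconds

    private static class Entry {
        final String addr; final long ts;
        Entry(String a, long t) { addr = a; ts = t; }
    }

    private final Map<String, Entry> moved = new ConcurrentHashMap<>();

    // Called when closing a region as part of a move (the added close parameter).
    public void recordMove(String region, String destination) {
        moved.put(region, new Entry(destination, System.currentTimeMillis()));
    }

    // Called on an incoming query for a region this server no longer hosts.
    public void checkMoved(String region) throws RegionMovedException {
        Entry e = moved.get(region);
        if (e == null) return;
        if (System.currentTimeMillis() - e.ts > TTL_MS) {
            moved.remove(region);                 // entry expired, forget it
            return;
        }
        throw new RegionMovedException(e.addr);   // client learns the new address
    }

    public static void main(String[] args) {
        MovedRegions mr = new MovedRegions();
        mr.recordMove("r1", "rs2:60020");
        try {
            mr.checkMoved("r1");
            System.out.println("no move recorded");
        } catch (RegionMovedException ex) {
            System.out.println(ex.newAddress); // prints "rs2:60020"
        }
    }
}
```

The TTL matters: without it the map grows without bound, and with it a late client retry simply falls back to the normal stale-cache path (ask .META. again) once the entry has expired.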
[jira] [Created] (HBASE-5885) Invalid HFile block magic on Local file System
Elliott Clark created HBASE-5885: Summary: Invalid HFile block magic on Local file System Key: HBASE-5885 URL: https://issues.apache.org/jira/browse/HBASE-5885 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.96.0 Reporter: Elliott Clark ERROR: java.lang.RuntimeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=7, exceptions: Thu Apr 26 11:19:18 PDT 2012, org.apache.hadoop.hbase.client.ScannerCallable@190a621a, java.io.IOException: java.io.IOException: Could not iterate StoreFileScanner[HFileScanner for reader reader=file:/tmp/hbase-eclark/hbase/TestTable/e2d1c846363c75262cbfd85ea278b342/info/bae2681d63734066957b58fe791a0268, compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], firstKey=01/info:data/1335463981520/Put, lastKey=0002588100/info:data/1335463902296/Put, avgKeyLen=30, avgValueLen=1000, entries=1215085, length=1264354417, cur=000248/info:data/1335463994457/Put/vlen=1000/ts=0] at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:135) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:95) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:368) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3323) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3279) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3296) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2393) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1376) Caused by: java.io.IOException: Invalid HFile block magic: \xEC\xD5\x9D\xB4\xC2bfo at org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:153) at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:164) at org.apache.hadoop.hbase.io.hfile.HFileBlock.init(HFileBlock.java:254) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1779) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1637) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:327) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.readNextDataBlock(HFileReaderV2.java:555) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:651) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:130) ... 
12 more Thu Apr 26 11:19:19 PDT 2012, org.apache.hadoop.hbase.client.ScannerCallable@190a621a, java.io.IOException: java.io.IOException: java.lang.IllegalArgumentException at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1132) at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1121) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2420) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1376) Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:216) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:630) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:130) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:95) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:406) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3323) at
[jira] [Commented] (HBASE-5885) Invalid HFile block magic on Local file System
[ https://issues.apache.org/jira/browse/HBASE-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262828#comment-13262828 ] Elliott Clark commented on HBASE-5885: -- Forgot to add that I also tried this with and without HBASE-5864
[jira] [Commented] (HBASE-5885) Invalid HFile block magic on Local file System
[ https://issues.apache.org/jira/browse/HBASE-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262837#comment-13262837 ] stack commented on HBASE-5885: -- What if you disable this setting? Does it still fail? {code} + /** + * If this parameter is set to true, then hbase will read + * data and then verify checksums. Checksum verification + * inside hdfs will be switched off. However, if the hbase-checksum + * verification fails, then it will switch back to using + * hdfs checksums for verifying data that is being read from storage. + * + * If this parameter is set to false, then hbase will not + * verify any checksums, instead it will depend on checksum verification + * being done in the hdfs client. + */ + public static final String HBASE_CHECKSUM_VERIFICATION = + "hbase.regionserver.checksum.verify"; {code}
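The verify-then-fall-back policy described in the quoted javadoc for hbase.regionserver.checksum.verify can be sketched abstractly. This is a simplified illustration of the control flow only — `BlockReader`, `readBlock`, and the retry shape are hypothetical stand-ins, not HBase's actual read path:

```java
import java.io.IOException;

// Sketch of the checksum policy: try hbase-level checksum verification
// first; on a checksum failure, retry the read relying on hdfs checksums.
public class ChecksumFallback {
    // Hypothetical abstraction of a block read with or without
    // hbase-level checksum verification.
    interface BlockReader {
        byte[] read(boolean hbaseChecksum) throws IOException;
    }

    static byte[] readBlock(BlockReader reader, boolean verifyInHBase) throws IOException {
        if (!verifyInHBase) {
            return reader.read(false);            // rely on hdfs client checksums
        }
        try {
            return reader.read(true);             // hbase verifies its own checksums
        } catch (IOException checksumFailure) {
            return reader.read(false);            // switch back to hdfs checksums
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulated reader whose hbase-level verification always fails.
        BlockReader flaky = hbaseChecksum -> {
            if (hbaseChecksum) throw new IOException("checksum mismatch");
            return new byte[]{1, 2, 3};
        };
        System.out.println(readBlock(flaky, true).length); // prints "3"
    }
}
```

stack's question amounts to flipping `verifyInHBase` to false: if the failure disappears, the bug is in the hbase-level checksum path rather than in the data itself.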
[jira] [Updated] (HBASE-5879) Enable JMX metrics collection for the Thrift proxy
[ https://issues.apache.org/jira/browse/HBASE-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5879: --- Attachment: D2955.1.patch mbautin requested code review of [jira] [HBASE-5879] [89-fb] Enable JMX metrics collection for the Thrift proxy. Reviewers: Kannan, Liyin, sc, tedyu, JIRA We need to enable JMX on the Thrift proxy on a separate port different from the JMX port used by regionserver. This is necessary for metrics collection. TEST PLAN - Deploy to dev cluster. - Verify that it is possible to collect metrics through JMX from the Thrift proxy. REVISION DETAIL https://reviews.facebook.net/D2955 AFFECTED FILES bin/hbase-config.sh Enable JMX metrics collection for the Thrift proxy -- Key: HBASE-5879 URL: https://issues.apache.org/jira/browse/HBASE-5879 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Priority: Minor Attachments: D2955.1.patch We need to enable JMX on the Thrift proxy on a separate port different from the JMX port used by regionserver. This is necessary for metrics collection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5879) Enable JMX metrics collection for the Thrift proxy
[ https://issues.apache.org/jira/browse/HBASE-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262841#comment-13262841 ] Phabricator commented on HBASE-5879: Kannan has accepted the revision [jira] [HBASE-5879] [89-fb] Enable JMX metrics collection for the Thrift proxy. REVISION DETAIL https://reviews.facebook.net/D2955 BRANCH enable_jmx_metrics_collection_for_the_thrift_HBASE-5879 Enable JMX metrics collection for the Thrift proxy -- Key: HBASE-5879 URL: https://issues.apache.org/jira/browse/HBASE-5879 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Priority: Minor Attachments: D2955.1.patch We need to enable JMX on the Thrift proxy on a separate port different from the JMX port used by regionserver. This is necessary for metrics collection.
[jira] [Commented] (HBASE-5885) Invalid HFile block magic on Local file System
[ https://issues.apache.org/jira/browse/HBASE-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262880#comment-13262880 ] Elliott Clark commented on HBASE-5885: -- I don't get an exception when that is set to false. Invalid HFile block magic on Local file System -- Key: HBASE-5885 URL: https://issues.apache.org/jira/browse/HBASE-5885 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.96.0 Reporter: Elliott Clark ERROR: java.lang.RuntimeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=7, exceptions: Thu Apr 26 11:19:18 PDT 2012, org.apache.hadoop.hbase.client.ScannerCallable@190a621a, java.io.IOException: java.io.IOException: Could not iterate StoreFileScanner[HFileScanner for reader reader=file:/tmp/hbase-eclark/hbase/TestTable/e2d1c846363c75262cbfd85ea278b342/info/bae2681d63734066957b58fe791a0268, compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], firstKey=01/info:data/1335463981520/Put, lastKey=0002588100/info:data/1335463902296/Put, avgKeyLen=30, avgValueLen=1000, entries=1215085, length=1264354417, cur=000248/info:data/1335463994457/Put/vlen=1000/ts=0] at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:135) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:95) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:368) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3323) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3279) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3296) at 
org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2393) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1376) Caused by: java.io.IOException: Invalid HFile block magic: \xEC\xD5\x9D\xB4\xC2bfo at org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:153) at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:164) at org.apache.hadoop.hbase.io.hfile.HFileBlock.init(HFileBlock.java:254) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1779) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1637) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:327) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.readNextDataBlock(HFileReaderV2.java:555) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:651) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:130) ... 
12 more Thu Apr 26 11:19:19 PDT 2012, org.apache.hadoop.hbase.client.ScannerCallable@190a621a, java.io.IOException: java.io.IOException: java.lang.IllegalArgumentException at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1132) at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1121) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2420) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1376) Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:216) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:630) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:130) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:95) at
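The second retry's "Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.position" is the generic error a ByteBuffer throws when asked to seek past its limit. A minimal sketch (illustrative only, not HBase code) reproduces the same bare exception, which is consistent with a corrupt block length pushing the scanner's computed offset beyond the block buffer:

```java
import java.nio.ByteBuffer;

public class BufferPositionDemo {
    /** Returns true iff positioning past the limit throws IllegalArgumentException. */
    static boolean positionPastLimitThrows(int capacity, int newPosition) {
        ByteBuffer buf = ByteBuffer.allocate(capacity); // limit == capacity
        try {
            buf.position(newPosition);
            return false;
        } catch (IllegalArgumentException e) {
            // Thrown with no message, matching the bare IllegalArgumentException
            // in the stack trace above.
            return true;
        }
    }

    public static void main(String[] args) {
        // HFileReaderV2$ScannerV2.next repositions its block ByteBuffer; if the
        // block bytes are corrupt, the computed offset can exceed the limit.
        System.out.println(positionPastLimitThrows(16, 32)); // true
    }
}
```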
[jira] [Updated] (HBASE-5851) TestProcessBasedCluster sometimes fails; currently disabled -- needs to be fixed and reenabled
[ https://issues.apache.org/jira/browse/HBASE-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5851: --- Status: Open (was: Patch Available) Agree with Stack, still flaky, will look into it more later. TestProcessBasedCluster sometimes fails; currently disabled -- needs to be fixed and reenabled -- Key: HBASE-5851 URL: https://issues.apache.org/jira/browse/HBASE-5851 Project: HBase Issue Type: Test Reporter: Zhihong Yu Assignee: Jimmy Xiang Attachments: disable.txt, hbase-5851.patch, hbase-5851_v2.patch, metahang.txt, zkfail.txt TestProcessBasedCluster failed in https://builds.apache.org/job/HBase-TRUNK-security/178 Looks like cluster failed to start: {code} 2012-04-21 14:22:32,666 INFO [Thread-1] util.ProcessBasedLocalHBaseCluster(176): Waiting for HBase to startup. Retries left: 2 java.io.IOException: Giving up trying to location region in meta: thread is interrupted. at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1173) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:956) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:917) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:252) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:192) at org.apache.hadoop.hbase.util.ProcessBasedLocalHBaseCluster.startHBase(ProcessBasedLocalHBaseCluster.java:174) at org.apache.hadoop.hbase.util.TestProcessBasedCluster.testProcessBasedCluster(TestProcessBasedCluster.java:56) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62) java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:134) at org.apache.hadoop.hbase.util.ProcessBasedLocalHBaseCluster.startHBase(ProcessBasedLocalHBaseCluster.java:178) at org.apache.hadoop.hbase.util.TestProcessBasedCluster.testProcessBasedCluster(TestProcessBasedCluster.java:56) {code}
[jira] [Created] (HBASE-5886) Add new metric for possible data loss due to puts without WAL
Matteo Bertozzi created HBASE-5886: -- Summary: Add new metric for possible data loss due to puts without WAL Key: HBASE-5886 URL: https://issues.apache.org/jira/browse/HBASE-5886 Project: HBase Issue Type: New Feature Components: metrics, regionserver Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Add a metric to keep track of puts without WAL and the possible data loss size.
[jira] [Updated] (HBASE-5886) Add new metric for possible data loss due to puts without WAL
[ https://issues.apache.org/jira/browse/HBASE-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-5886: --- Attachment: HBASE-5886-v0.patch Add new metric for possible data loss due to puts without WAL -- Key: HBASE-5886 URL: https://issues.apache.org/jira/browse/HBASE-5886 Project: HBase Issue Type: New Feature Components: metrics, regionserver Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Labels: metrics Attachments: HBASE-5886-v0.patch Add a metric to keep track of puts without WAL and the possible data loss size.
[jira] [Updated] (HBASE-5886) Add new metric for possible data loss due to puts without WAL
[ https://issues.apache.org/jira/browse/HBASE-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-5886: --- Status: Patch Available (was: Open) Add new metric for possible data loss due to puts without WAL -- Key: HBASE-5886 URL: https://issues.apache.org/jira/browse/HBASE-5886 Project: HBase Issue Type: New Feature Components: metrics, regionserver Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Labels: metrics Attachments: HBASE-5886-v0.patch Add a metric to keep track of puts without WAL and the possible data loss size.
[jira] [Commented] (HBASE-5886) Add new metric for possible data loss due to puts without WAL
[ https://issues.apache.org/jira/browse/HBASE-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262994#comment-13262994 ] Todd Lipcon commented on HBASE-5886: I think the metric names could be improved here -- putWithoutWAL could be numPutsWithoutWAL and possibleDataLossSize perhaps mbInMemoryWithoutWAL? - in recordPossibleDataLossNoWal, use {{getAndIncrement}} instead of {{incrementAndGet}}, and compare the result to 0 -- otherwise you're doing an extra atomic op and have a potential race - the warning message should include the client IP address as well as the region ID. Perhaps something like Client 123.123.123.123 writing data to region abcdef12345 with WAL disabled. Data may be lost in the event of a crash. Will not log further warnings for this region. - the debug message when setting possibleDataLossSize back to 0 seems unnecessary. - DEFAULT_WARN_NO_WAL_INTERVAL is unused Add new metric for possible data loss due to puts without WAL -- Key: HBASE-5886 URL: https://issues.apache.org/jira/browse/HBASE-5886 Project: HBase Issue Type: New Feature Components: metrics, regionserver Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Labels: metrics Attachments: HBASE-5886-v0.patch Add a metric to keep track of puts without WAL and the possible data loss size.
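The {{getAndIncrement}} suggestion in the review above can be sketched as follows. This is an illustrative class, not the HBASE-5886 patch; only the name numPutsWithoutWAL comes from the review, the rest is a hypothetical shape. The point is that getAndIncrement returns the previous value, so incrementing the counter and testing "was this the first no-WAL put?" happen in a single atomic operation, whereas incrementAndGet followed by a second read costs an extra atomic op and lets two racing threads each miss the 0 -> 1 transition:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a warn-once-per-region counter, in the spirit of the review comment.
public class NoWalWarning {
    private final AtomicLong numPutsWithoutWAL = new AtomicLong();

    /** Records one no-WAL put; returns true only for the very first one. */
    public boolean shouldWarn() {
        // getAndIncrement returns the OLD value, so exactly one caller ever
        // observes 0, even under contention.
        return numPutsWithoutWAL.getAndIncrement() == 0;
    }

    public long count() {
        return numPutsWithoutWAL.get();
    }
}
```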
[jira] [Updated] (HBASE-5864) Error while reading from hfile in 0.94
[ https://issues.apache.org/jira/browse/HBASE-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5864: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.94 and 0.96. Thanks for the patch, Ram. Thanks for the reviews, Dhruba and Ted. Error while reading from hfile in 0.94 -- Key: HBASE-5864 URL: https://issues.apache.org/jira/browse/HBASE-5864 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0 Attachments: HBASE-5864_1.patch, HBASE-5864_2.patch, HBASE-5864_3.patch, HBASE-5864_test.patch Got the following stacktrace during region split. {noformat} 2012-04-24 16:05:42,168 WARN org.apache.hadoop.hbase.regionserver.Store: Failed getting store size for value java.io.IOException: Requested block is out of range: 2906737606134037404, lastDataBlockOffset: 84764558 at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:278) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:285) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:402) at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1638) at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1943) at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:77) at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:4921) at org.apache.hadoop.hbase.regionserver.HRegionServer.splitRegion(HRegionServer.java:2901) {noformat}
[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5737: - Issue Type: Improvement (was: Bug) Minor Improvements related to balancer. --- Key: HBASE-5737 URL: https://issues.apache.org/jira/browse/HBASE-5737 Project: HBase Issue Type: Improvement Components: master Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch, HBASE-5737_3.patch Currently in Am.getAssignmentByTable() we use a result map which is currently a HashMap. It would be better to use a TreeMap. Even in MetaReader.fullScan we use a TreeMap, so that the naming order is maintained. This change could be very useful in cases where we are extending the DefaultLoadBalancer.
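The TreeMap-over-HashMap point above is about deterministic iteration order. A small illustration (the table names and region counts are made up; only the map choice mirrors the issue):

```java
import java.util.TreeMap;

public class AssignmentOrderDemo {
    /** Builds a per-table assignment-count map with deterministic key order. */
    static TreeMap<String, Integer> assignmentsByTable() {
        // A HashMap here would give no iteration-order guarantee at all;
        // a TreeMap always iterates in sorted key order.
        TreeMap<String, Integer> byTable = new TreeMap<>();
        byTable.put("usertable", 12);
        byTable.put("-ROOT-", 1);
        byTable.put(".META.", 1);
        return byTable;
    }

    public static void main(String[] args) {
        // A custom balancer extending DefaultLoadBalancer thus sees tables in a
        // stable, name-sorted sequence, matching what MetaReader.fullScan returns.
        System.out.println(assignmentsByTable().keySet()); // [-ROOT-, .META., usertable]
    }
}
```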
[jira] [Updated] (HBASE-5866) Canary in tool package but says its in tools.
[ https://issues.apache.org/jira/browse/HBASE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5866: - Issue Type: New Feature (was: Bug) Canary in tool package but says its in tools. - Key: HBASE-5866 URL: https://issues.apache.org/jira/browse/HBASE-5866 Project: HBase Issue Type: New Feature Reporter: stack Assignee: stack Fix For: 0.94.0, 0.96.0 Attachments: 5866.txt
[jira] [Updated] (HBASE-5866) Canary in tool package but says its in tools.
[ https://issues.apache.org/jira/browse/HBASE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5866: - Issue Type: Bug (was: New Feature) Canary in tool package but says its in tools. - Key: HBASE-5866 URL: https://issues.apache.org/jira/browse/HBASE-5866 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.94.0, 0.96.0 Attachments: 5866.txt
[jira] [Commented] (HBASE-5862) After Region Close remove the Operation Metrics.
[ https://issues.apache.org/jira/browse/HBASE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263094#comment-13263094 ] Lars Hofhansl commented on HBASE-5862: -- Thanks, Elliott. After Region Close remove the Operation Metrics. Key: HBASE-5862 URL: https://issues.apache.org/jira/browse/HBASE-5862 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Priority: Critical Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5862-0.patch, HBASE-5862-1.patch, HBASE-5862-2.patch, HBASE-5862-3.patch, HBASE-5862-4.patch, HBASE-5862-94-3.patch, TSD.png If a region is closed then Hadoop metrics shouldn't still be reporting about that region.
[jira] [Commented] (HBASE-5864) Error while reading from hfile in 0.94
[ https://issues.apache.org/jira/browse/HBASE-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263100#comment-13263100 ] Hudson commented on HBASE-5864: --- Integrated in HBase-0.94 #151 (See [https://builds.apache.org/job/HBase-0.94/151/]) HBASE-5864 Error while reading from hfile in 0.94 (Ram) (Revision 1331057) Result = ABORTED larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java Error while reading from hfile in 0.94 -- Key: HBASE-5864 URL: https://issues.apache.org/jira/browse/HBASE-5864 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0 Attachments: HBASE-5864_1.patch, HBASE-5864_2.patch, HBASE-5864_3.patch, HBASE-5864_test.patch Got the following stacktrace during region split. 
{noformat} 2012-04-24 16:05:42,168 WARN org.apache.hadoop.hbase.regionserver.Store: Failed getting store size for value java.io.IOException: Requested block is out of range: 2906737606134037404, lastDataBlockOffset: 84764558 at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:278) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:285) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:402) at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1638) at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1943) at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:77) at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:4921) at org.apache.hadoop.hbase.regionserver.HRegionServer.splitRegion(HRegionServer.java:2901) {noformat}
[jira] [Commented] (HBASE-5862) After Region Close remove the Operation Metrics.
[ https://issues.apache.org/jira/browse/HBASE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263099#comment-13263099 ] Hudson commented on HBASE-5862: --- Integrated in HBase-0.94 #151 (See [https://builds.apache.org/job/HBase-0.94/151/]) HBASE-5862 After Region Close remove the Operation Metrics; ADDENDUM -- missing import (Revision 1331040) Result = ABORTED stack : Files : * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java After Region Close remove the Operation Metrics. Key: HBASE-5862 URL: https://issues.apache.org/jira/browse/HBASE-5862 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Priority: Critical Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5862-0.patch, HBASE-5862-1.patch, HBASE-5862-2.patch, HBASE-5862-3.patch, HBASE-5862-4.patch, HBASE-5862-94-3.patch, TSD.png If a region is closed then Hadoop metrics shouldn't still be reporting about that region.
[jira] [Commented] (HBASE-5886) Add new metric for possible data loss due to puts without WAL
[ https://issues.apache.org/jira/browse/HBASE-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13263111#comment-13263111 ] Hadoop QA commented on HBASE-5886: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12524562/HBASE-5886-v0.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.io.TestHeapSize Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1659//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1659//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1659//console This message is automatically generated. Add new metric for possible data loss due to puts without WAL -- Key: HBASE-5886 URL: https://issues.apache.org/jira/browse/HBASE-5886 Project: HBase Issue Type: New Feature Components: metrics, regionserver Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Labels: metrics Attachments: HBASE-5886-v0.patch Add a metric to keep track of puts without WAL and the possible data loss size.