[jira] [Commented] (HBASE-5604) M/R tool to replay WAL files
[ https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252537#comment-13252537 ] Ted Yu commented on HBASE-5604: --- If you run the patch through 'arc lint', you would see (just excerpt): {code} Warning (TXT3) Line Too Long This line is 106 characters long, but the convention is 100 characters. 274 System.err.println( -D + BULK_OUTPUT_CONF_KEY + =/path/for/output); 275 System.err.println( (Exactly one table must be specified for this option, and not mapping can be specified!)); 276 System.err.println(Other options:); 277 System.err.println( -D + HLogInputFormat.START_TIME_KEY + =ms (only apply edit after this time)); 278 System.err.println( -D + HLogInputFormat.END_TIME_KEY + =ms (only apply edit before this time)); 279 System.err.println(For performance also consider the following options:\n 280 + -Dmapred.map.tasks.speculative.execution=false\n Warning (TXT3) Line Too Long This line is 105 characters long, but the convention is 100 characters. 275 System.err.println( (Exactly one table must be specified for this option, and not mapping can be specified!)); 276 System.err.println(Other options:); 277 System.err.println( -D + HLogInputFormat.START_TIME_KEY + =ms (only apply edit after this time)); 278 System.err.println( -D + HLogInputFormat.END_TIME_KEY + =ms (only apply edit before this time)); 279 System.err.println(For performance also consider the following options:\n 280 + -Dmapred.map.tasks.speculative.execution=false\n 281 + -Dmapred.reduce.tasks.speculative.execution=false); {code} Below is a long line: {code} +throw new IOException(option + must be specified either in the form 2001-02-20T16:35:06.99 or as a number of milliseconds); {code} Minor comment: the 'a' in 'as a number' should be omitted. M/R tool to replay WAL files Key: HBASE-5604 URL: https://issues.apache.org/jira/browse/HBASE-5604 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 5604-v4.txt, 5604-v6.txt, 5604-v7.txt, 5604-v8.txt, 5604-v9.txt, HLog-5604-v3.txt Just an idea I had. Might be useful for restore of a backup using the HLogs. This could an M/R (with a mapper per HLog file). The tool would get a timerange and a (set of) table(s). We'd pick the right HLogs based on time before the M/R job is started and then have a mapper per HLog file. The mapper would then go through the HLog, filter all WALEdits that didn't fit into the time range or are not any of the tables and then uses HFileOutputFormat to generate HFiles. Would need to indicate the splits we want, probably from a live table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5604) M/R tool to replay WAL files
[ https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252916#comment-13252916 ] Ted Yu commented on HBASE-5604: --- Patch looks good. Minor comment: {code} + HLog.Entry temp; + long i=-1; {code} Insert spaces around equal sign above. M/R tool to replay WAL files Key: HBASE-5604 URL: https://issues.apache.org/jira/browse/HBASE-5604 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0, 0.96.0 Attachments: 5604-v10.txt, 5604-v11.txt, 5604-v4.txt, 5604-v6.txt, 5604-v7.txt, 5604-v8.txt, 5604-v9.txt, HLog-5604-v3.txt Just an idea I had. Might be useful for restore of a backup using the HLogs. This could an M/R (with a mapper per HLog file). The tool would get a timerange and a (set of) table(s). We'd pick the right HLogs based on time before the M/R job is started and then have a mapper per HLog file. The mapper would then go through the HLog, filter all WALEdits that didn't fit into the time range or are not any of the tables and then uses HFileOutputFormat to generate HFiles. Would need to indicate the splits we want, probably from a live table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5780) Fix race in HBase regionserver startup vs ZK SASL authentication
[ https://issues.apache.org/jira/browse/HBASE-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253074#comment-13253074 ] Ted Yu commented on HBASE-5780: --- {code} +} catch (InterruptedException e) { + LOG.error(Interrupted while waiting for the ZookeeperWatcher to authenticate, e); {code} Is it safe to proceed with start() in the above case ? Fix race in HBase regionserver startup vs ZK SASL authentication Key: HBASE-5780 URL: https://issues.apache.org/jira/browse/HBASE-5780 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.92.1, 0.94.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Attachments: HBASE-5780.patch Secure RegionServers sometimes fail to start with the following backtrace: 2012-03-22 17:20:16,737 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server centos60-20.ent.cloudera.com,60020,1332462015929: Unexpected exception during initialization, aborting org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /hbase/shutdown at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:295) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:518) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:494) at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:569) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:532) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:634) at java.lang.Thread.run(Thread.java:662) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5604) M/R tool to replay WAL files
[ https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253085#comment-13253085 ] Ted Yu commented on HBASE-5604: --- TestWALPlayer.java doesn't have test category. M/R tool to replay WAL files Key: HBASE-5604 URL: https://issues.apache.org/jira/browse/HBASE-5604 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0, 0.96.0 Attachments: 5604-v10.txt, 5604-v11.txt, 5604-v4.txt, 5604-v6.txt, 5604-v7.txt, 5604-v8.txt, 5604-v9.txt, HLog-5604-v3.txt Just an idea I had. Might be useful for restore of a backup using the HLogs. This could an M/R (with a mapper per HLog file). The tool would get a timerange and a (set of) table(s). We'd pick the right HLogs based on time before the M/R job is started and then have a mapper per HLog file. The mapper would then go through the HLog, filter all WALEdits that didn't fit into the time range or are not any of the tables and then uses HFileOutputFormat to generate HFiles. Would need to indicate the splits we want, probably from a live table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5780) Fix race in HBase regionserver startup vs ZK SASL authentication
[ https://issues.apache.org/jira/browse/HBASE-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253102#comment-13253102 ] Ted Yu commented on HBASE-5780: --- I think throwing exception immediately is better. Please also run through test suite using '-Psecurity' since Hadoop QA doesn't test security profile. Let us know the test result. Thanks Fix race in HBase regionserver startup vs ZK SASL authentication Key: HBASE-5780 URL: https://issues.apache.org/jira/browse/HBASE-5780 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.92.1, 0.94.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Attachments: HBASE-5780.patch Secure RegionServers sometimes fail to start with the following backtrace: 2012-03-22 17:20:16,737 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server centos60-20.ent.cloudera.com,60020,1332462015929: Unexpected exception during initialization, aborting org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /hbase/shutdown at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:295) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:518) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:494) at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:569) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:532) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:634) at java.lang.Thread.run(Thread.java:662) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5778) Turn on WAL compression by default
[ https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253107#comment-13253107 ] Ted Yu commented on HBASE-5778: --- {code} + } catch (IndexOutOfBoundsException iobe) { +// this can happen with a corrupted file, fall through + } {code} I think we should note down the cause of failure to retrieve dictionary entry and provide clearer message in the IOE below: {code} if (entry == null) { throw new IOException(Missing dictionary entry for index + dictIdx); {code} Turn on WAL compression by default -- Key: HBASE-5778 URL: https://issues.apache.org/jira/browse/HBASE-5778 Project: HBase Issue Type: Improvement Reporter: Jean-Daniel Cryans Assignee: Lars Hofhansl Priority: Blocker Fix For: 0.94.0, 0.96.0 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch I ran some tests to verify if WAL compression should be turned on by default. For a use case where it's not very useful (values two order of magnitude bigger than the keys), the insert time wasn't different and the CPU usage 15% higher (150% CPU usage VS 130% when not compressing the WAL). When values are smaller than the keys, I saw a 38% improvement for the insert run time and CPU usage was 33% higher (600% CPU usage VS 450%). I'm not sure WAL compression accounts for all the additional CPU usage, it might just be that we're able to insert faster and we spend more time in the MemStore per second (because our MemStores are bad when they contain tens of thousands of values). Those are two extremes, but it shows that for the price of some CPU we can save a lot. My machines have 2 quads with HT, so I still had a lot of idle CPUs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5778) Turn on WAL compression by default
[ https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253127#comment-13253127 ] Ted Yu commented on HBASE-5778: --- The remaining issue is about how the replication sink correctly decompresses WAL. From test output, I saw: {code} java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:180) at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2243) at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2249) at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFields(WALEdit.java:129) at org.apache.hadoop.hbase.regionserver.wal.HLog$Entry.readFields(HLog.java:1700) {code} For replication sink, there is no CompressionContext in HLog$Entry which can be used to perform decompression. I agree the change should be reverted. Turn on WAL compression by default -- Key: HBASE-5778 URL: https://issues.apache.org/jira/browse/HBASE-5778 Project: HBase Issue Type: Improvement Reporter: Jean-Daniel Cryans Assignee: Lars Hofhansl Priority: Blocker Fix For: 0.94.0, 0.96.0 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch I ran some tests to verify if WAL compression should be turned on by default. For a use case where it's not very useful (values two order of magnitude bigger than the keys), the insert time wasn't different and the CPU usage 15% higher (150% CPU usage VS 130% when not compressing the WAL). When values are smaller than the keys, I saw a 38% improvement for the insert run time and CPU usage was 33% higher (600% CPU usage VS 450%). I'm not sure WAL compression accounts for all the additional CPU usage, it might just be that we're able to insert faster and we spend more time in the MemStore per second (because our MemStores are bad when they contain tens of thousands of values). Those are two extremes, but it shows that for the price of some CPU we can save a lot. My machines have 2 quads with HT, so I still had a lot of idle CPUs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5656) LoadIncrementalHFiles createTable should detect and set compression algorithm
[ https://issues.apache.org/jira/browse/HBASE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249555#comment-13249555 ] Ted Yu commented on HBASE-5656: --- Patch looks good. Minor comment: {code} +hcd.setCompressionType(reader.getCompressionAlgorithm()); {code} I think we should log the compression type used in the if block. LoadIncrementalHFiles createTable should detect and set compression algorithm - Key: HBASE-5656 URL: https://issues.apache.org/jira/browse/HBASE-5656 Project: HBase Issue Type: Bug Components: util Affects Versions: 0.92.1 Reporter: Cosmin Lehene Assignee: Cosmin Lehene Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: 5656-simple.txt, HBASE-5656-0.92.patch, HBASE-5656-0.92.patch, HBASE-5656-0.92.patch Original Estimate: 1h Remaining Estimate: 1h LoadIncrementalHFiles doesn't set compression when creating the the table. This can be detected from the files within each family dir. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5656) LoadIncrementalHFiles createTable should detect and set compression algorithm
[ https://issues.apache.org/jira/browse/HBASE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249602#comment-13249602 ] Ted Yu commented on HBASE-5656: --- Latest patch looks good to me. LoadIncrementalHFiles createTable should detect and set compression algorithm - Key: HBASE-5656 URL: https://issues.apache.org/jira/browse/HBASE-5656 Project: HBase Issue Type: Bug Components: util Affects Versions: 0.92.1 Reporter: Cosmin Lehene Assignee: Cosmin Lehene Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: 5656-simple.txt, HBASE-5656-0.92.patch, HBASE-5656-0.92.patch, HBASE-5656-0.92.patch, HBASE-5656-0.92.patch Original Estimate: 1h Remaining Estimate: 1h LoadIncrementalHFiles doesn't set compression when creating the the table. This can be detected from the files within each family dir. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5727) secure hbase build broke because of 'HBASE-5451 Switch RPC call envelope/headers to PBs'
[ https://issues.apache.org/jira/browse/HBASE-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249612#comment-13249612 ] Ted Yu commented on HBASE-5727: --- TestRowProcessorEndpoint in security profile passed smoothly with latest patch. +1 on integrating the patch and launching another secure build. Thanks for working over the weekend, Devaraj. secure hbase build broke because of 'HBASE-5451 Switch RPC call envelope/headers to PBs' Key: HBASE-5727 URL: https://issues.apache.org/jira/browse/HBASE-5727 Project: HBase Issue Type: Bug Reporter: stack Assignee: Devaraj Das Priority: Blocker Attachments: 5727.1.patch, 5727.patch If you build with the security profile -- i.e. add '-P security' on the command line -- you'll see that the secure build is broke since we messed in rpc. Assigning Deveraj to take a look. If you can't work on this now DD, just give it back to me and I'll have a go at it. Thanks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243743#comment-13243743 ] Ted Yu commented on HBASE-5689: --- Using a TreeMap is common practice. Please attach test suite result - Hadoop QA is not working. Skipping RecoveredEdits may cause data loss --- Key: HBASE-5689 URL: https://issues.apache.org/jira/browse/HBASE-5689 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.94.0 Attachments: 5689-simplified.txt, 5689-testcase.patch, HBASE-5689.patch Let's see the following scenario: 1.Region is on the server A 2.put KV(r1-v1) to the region 3.move region from server A to server B 4.put KV(r2-v2) to the region 5.move region from server B to server A 6.put KV(r3-v3) to the region 7.kill -9 server B and start it 8.kill -9 server A and start it 9.scan the region, we could only get two KV(r1-v1,r2-v2), the third KV(r3-v3) is lost. Let's analyse the upper scenario from the code: 1.the edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same hlog file on server A. 2.when we split server B's hlog file in the process of ServerShutdownHandler, we create one RecoveredEdits file f1 for the region. 2.when we split server A's hlog file in the process of ServerShutdownHandler, we create another RecoveredEdits file f2 for the region. 3.however, RecoveredEdits file f2 will be skiped when initializing region HRegion#replayRecoveredEditsIfAny {code} for (Path edits: files) { if (edits == null || !this.fs.exists(edits)) { LOG.warn(Null or non-existent edits file: + edits); continue; } if (isZeroLengthThenDelete(this.fs, edits)) continue; if (checkSafeToSkip) { Path higher = files.higher(edits); long maxSeqId = Long.MAX_VALUE; if (higher != null) { // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: -?[0-9]+ String fileName = higher.getName(); maxSeqId = Math.abs(Long.parseLong(fileName)); } if (maxSeqId = minSeqId) { String msg = Maximum possible sequenceid for this log is + maxSeqId + , skipped the whole file, path= + edits; LOG.debug(msg); continue; } else { checkSafeToSkip = false; } } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5693) When creating a region, the master initializes it and creates a memstore within the master server
[ https://issues.apache.org/jira/browse/HBASE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243746#comment-13243746 ] Ted Yu commented on HBASE-5693: --- CreateTableHandler isn't initializing the regions. Who will initialize them ? When creating a region, the master initializes it and creates a memstore within the master server - Key: HBASE-5693 URL: https://issues.apache.org/jira/browse/HBASE-5693 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5693.v1.patch I didn't do a complete analysis, but the attached patch saves more than 0.25s for each region creation and locally all the unit tests work. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243747#comment-13243747 ] Ted Yu commented on HBASE-5677: --- Interesting. Chunhui proposed safe mode for Master in HBASE-5270. See https://issues.apache.org/jira/browse/HBASE-5270?focusedCommentId=13214394page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13214394 Can you verify that this issue has been fixed in 0.92.2 ? Thanks The master never does balance because duplicate openhandled the one region -- Key: HBASE-5677 URL: https://issues.apache.org/jira/browse/HBASE-5677 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Environment: 0.90 Reporter: xufeng Assignee: xufeng If region be assigned When the master is doing initialization(before do processFailover),the region will be duplicate openhandled. because the unassigned node in zookeeper will be handled again in AssignmentManager#processFailover() it cause the region in RIT,thus the master never does balance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5693) When creating a region, the master initializes it and creates a memstore within the master server
[ https://issues.apache.org/jira/browse/HBASE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243762#comment-13243762 ] Ted Yu commented on HBASE-5693: --- It is called from OpenRegionHandler.openRegion() I once made some threads daemon which passed unit tests but resulted in master and region server failing to start. Testing on a real cluster is desirable. When creating a region, the master initializes it and creates a memstore within the master server - Key: HBASE-5693 URL: https://issues.apache.org/jira/browse/HBASE-5693 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5693.v1.patch I didn't do a complete analysis, but the attached patch saves more than 0.25s for each region creation and locally all the unit tests work. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5693) When creating a region, the master initializes it and creates a memstore within the master server
[ https://issues.apache.org/jira/browse/HBASE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243765#comment-13243765 ] Ted Yu commented on HBASE-5693: --- @N: Can you rebased the patch for trunk ? {code} Hunk #3 FAILED at 3613. 1 out of 3 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java.rej {code} When creating a region, the master initializes it and creates a memstore within the master server - Key: HBASE-5693 URL: https://issues.apache.org/jira/browse/HBASE-5693 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5693.v1.patch I didn't do a complete analysis, but the attached patch saves more than 0.25s for each region creation and locally all the unit tests work. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5665) Repeated split causes HRegionServer failures and breaks table
[ https://issues.apache.org/jira/browse/HBASE-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243842#comment-13243842 ] Ted Yu commented on HBASE-5665: --- HBASE-5665-trunk.patch looks good. Repeated split causes HRegionServer failures and breaks table -- Key: HBASE-5665 URL: https://issues.apache.org/jira/browse/HBASE-5665 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0, 0.92.1 Reporter: Cosmin Lehene Assignee: Cosmin Lehene Priority: Blocker Attachments: HBASE-5665-0.92.patch, HBASE-5665-trunk.patch Repeated splits on large tables (2 consecutive would suffice) will essentially break the table (and the cluster), unrecoverable. The regionserver doing the split dies and the master will get into an infinite loop trying to assign regions that seem to have the files missing from HDFS. The table can be disabled once. upon trying to re-enable it, it will remain in an intermediary state forever. I was able to reproduce this on a smaller table consistently. {code} hbase(main):030:0 (0..1).each{|x| put 't1', #{x}, 'f1:t', 'dd'} hbase(main):030:0 (0..1000).each{|x| split 't1', #{x*10}} {code} Running overlapping splits in parallel (e.g. #{x*10+1}, #{x*10+2}... ) will reproduce the issue almost instantly and consistently. {code} 2012-03-28 10:57:16,320 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Offlined parent region t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1. in META 2012-03-28 10:57:16,321 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for t1,5,1332957435767.648d30de55a5cec6fc2f56dcb3c7eee1.. compaction_queue=(0:1), split_queue=10 2012-03-28 10:57:16,343 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1.; Failed ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124 java.io.IOException: Failed ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124 at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:363) at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:451) at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.FileNotFoundException: File does not exist: /hbase/t1/589c44cabba419c6ad8c9b427e5894e3.2fb0473f4e71339e88dab0ee0d4dffa1/f1/d62a852c25ad44e09518e102ca557237 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1822) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.init(DFSClient.java:1813) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:544) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456) at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:341) at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.init(StoreFile.java:1008) at org.apache.hadoop.hbase.io.HalfStoreFileReader.init(HalfStoreFileReader.java:65) at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:467) at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:548) at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:284) at org.apache.hadoop.hbase.regionserver.Store.init(Store.java:221) at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2511) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:450) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3229) at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:504) at org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:484) ... 1 more 2012-03-28 10:57:16,345 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server ld2,60020,1332957343833: Abort; we got an error after point-of-no-return {code} http://hastebin.com/diqinibajo.avrasm later edit: (I'm using the last 4 characters from each string) Region 94e3 has storefile 7237 Region 94e3 gets splited in daughters a: ffa1 and b: eee1 Daughter region ffa1
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243845#comment-13243845 ] Ted Yu commented on HBASE-5666: --- {code} +if (keeperEx != null) + throw keeperEx; {code} Please either lift the throw to the same line as if or add curly braces. {code} +checkExists(zk, parentZNode, maxTimeMs); +LOG.info(Parent znode exists: + parentZNode); {code} If checkExists() returns -1, would the log statement still be true ? RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: HBASE-5666-v1.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243863#comment-13243863 ] Ted Yu commented on HBASE-5666: --- Patch v2 looks good. RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during the RS start the the znode is still not available, and HRegionServer.initializeZooKeeper() check just once if the base not is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5436) Right-size the map when reading attributes.
[ https://issues.apache.org/jira/browse/HBASE-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243958#comment-13243958 ] Ted Yu commented on HBASE-5436: --- Integrated to 0.92 branch. Right-size the map when reading attributes. --- Key: HBASE-5436 URL: https://issues.apache.org/jira/browse/HBASE-5436 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Benoit Sigoure Assignee: Benoit Sigoure Priority: Trivial Labels: performance Fix For: 0.94.0 Attachments: 0001-Right-size-the-map-when-reading-attributes.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238675#comment-13238675 ] Ted Yu commented on HBASE-5623: --- Last patch should be good to go. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238891#comment-13238891 ] Ted Yu commented on HBASE-2600: --- @Alex: Can you attach hbase-2600-root.dir.tgz to this JIRA ? Please briefly describe how you generated the tar ball. Thanks Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid Key: HBASE-2600 URL: https://issues.apache.org/jira/browse/HBASE-2600 Project: HBase Issue Type: Bug Reporter: stack Assignee: Alex Newman Attachments: 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch, HBASE-2600+5217-Sun-Mar-25-2012-v4.patch, jenkins.pdf This is an idea that Ryan and I have been kicking around on and off for a while now. If regionnames were made of tablename+endrow instead of tablename+startrow, then in the metatables, doing a search for the region that contains the wanted row, we'd just have to open a scanner using passed row and the first row found by the scan would be that of the region we need (If offlined parent, we'd have to scan to the next row). If we redid the meta tables in this format, we'd be using an access that is natural to hbase, a scan as opposed to the perverse, expensive getClosestRowBefore we currently have that has to walk backward in meta finding a containing region. This issue is about changing the way we name regions. If we were using scans, prewarming client cache would be near costless (as opposed to what we'll currently have to do which is first a getClosestRowBefore and then a scan from the closestrowbefore forward). Converting to the new method, we'd have to run a migration on startup changing the content in meta. Up to this, the randomid component of a region name has been the timestamp of region creation. HBASE-2531 32-bit encoding of regionnames waaay too susceptible to hash clashes proposes changing the randomid so that it contains actual name of the directory in the filesystem that hosts the region. If we had this in place, I think it would help with the migration to this new way of doing the meta because as is, the region name in fs is a hash of regionname... changing the format of the regionname would mean we generate a different hash... so we'd need hbase-2531 to be in place before we could do this change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5641) decayingSampleTick1 prevents HBase from shutting down.
[ https://issues.apache.org/jira/browse/HBASE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239139#comment-13239139 ] Ted Yu commented on HBASE-5641: --- +1 on patch. decayingSampleTick1 prevents HBase from shutting down. -- Key: HBASE-5641 URL: https://issues.apache.org/jira/browse/HBASE-5641 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Blocker Fix For: 0.94.0, 0.96.0 Attachments: 5641.txt I think this is the problem. It creates a non-daemon thread. {code} private static final ScheduledExecutorService TICK_SERVICE = Executors.newScheduledThreadPool(1, Threads.getNamedThreadFactory(decayingSampleTick)); {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5615) the master never does balance because of balancing the parent region
[ https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238009#comment-13238009 ] Ted Yu commented on HBASE-5615: --- Integrated to 0.92, 0.94 and TRUNK as well. the master never does balance because of balancing the parent region Key: HBASE-5615 URL: https://issues.apache.org/jira/browse/HBASE-5615 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: xufeng Assignee: xufeng Priority: Critical Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html the master never do balance becauseof when master do rebuildUserRegions(),it will add the parent region into AssignmentManager#servers, if balancer let the parent region to move,the parent will in RIT forever.thus balance will never be executed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238010#comment-13238010 ] Ted Yu commented on HBASE-2600: --- @Alex: Try this: {code} git diff --no-prefix --binary {code} Thanks Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid Key: HBASE-2600 URL: https://issues.apache.org/jira/browse/HBASE-2600 Project: HBase Issue Type: Bug Reporter: stack Assignee: Alex Newman Attachments: 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch, jenkins.pdf This is an idea that Ryan and I have been kicking around on and off for a while now. If regionnames were made of tablename+endrow instead of tablename+startrow, then in the metatables, doing a search for the region that contains the wanted row, we'd just have to open a scanner using passed row and the first row found by the scan would be that of the region we need (If offlined parent, we'd have to scan to the next row). If we redid the meta tables in this format, we'd be using an access that is natural to hbase, a scan as opposed to the perverse, expensive getClosestRowBefore we currently have that has to walk backward in meta finding a containing region. This issue is about changing the way we name regions. If we were using scans, prewarming client cache would be near costless (as opposed to what we'll currently have to do which is first a getClosestRowBefore and then a scan from the closestrowbefore forward). Converting to the new method, we'd have to run a migration on startup changing the content in meta. Up to this, the randomid component of a region name has been the timestamp of region creation. HBASE-2531 32-bit encoding of regionnames waaay too susceptible to hash clashes proposes changing the randomid so that it contains actual name of the directory in the filesystem that hosts the region. If we had this in place, I think it would help with the migration to this new way of doing the meta because as is, the region name in fs is a hash of regionname... changing the format of the regionname would mean we generate a different hash... so we'd need hbase-2531 to be in place before we could do this change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5615) the master never does balance because of balancing the parent region
[ https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238030#comment-13238030 ] Ted Yu commented on HBASE-5615: --- @Xufeng: You're welcome. In the future, please grant license to Apache when you attach patches. the master never does balance because of balancing the parent region Key: HBASE-5615 URL: https://issues.apache.org/jira/browse/HBASE-5615 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: xufeng Assignee: xufeng Priority: Critical Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html the master never do balance becauseof when master do rebuildUserRegions(),it will add the parent region into AssignmentManager#servers, if balancer let the parent region to move,the parent will in RIT forever.thus balance will never be executed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
[ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238074#comment-13238074 ] Ted Yu commented on HBASE-2600: --- dev-support/test-patch.sh doesn't use '--binary' option when applying patches. I tried the following command: {code} patch -p0 --binary -i HBASE-2600+5217-Sun-Mar-25-2012-v4.patch {code} But src/test/data/hbase-2600-root.dir.tgz wasn't unpacked from patch. Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid Key: HBASE-2600 URL: https://issues.apache.org/jira/browse/HBASE-2600 Project: HBase Issue Type: Bug Reporter: stack Assignee: Alex Newman Attachments: 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch, HBASE-2600+5217-Sun-Mar-25-2012-v4.patch, jenkins.pdf This is an idea that Ryan and I have been kicking around on and off for a while now. If regionnames were made of tablename+endrow instead of tablename+startrow, then in the metatables, doing a search for the region that contains the wanted row, we'd just have to open a scanner using passed row and the first row found by the scan would be that of the region we need (If offlined parent, we'd have to scan to the next row). If we redid the meta tables in this format, we'd be using an access that is natural to hbase, a scan as opposed to the perverse, expensive getClosestRowBefore we currently have that has to walk backward in meta finding a containing region. This issue is about changing the way we name regions. If we were using scans, prewarming client cache would be near costless (as opposed to what we'll currently have to do which is first a getClosestRowBefore and then a scan from the closestrowbefore forward). Converting to the new method, we'd have to run a migration on startup changing the content in meta. Up to this, the randomid component of a region name has been the timestamp of region creation. HBASE-2531 32-bit encoding of regionnames waaay too susceptible to hash clashes proposes changing the randomid so that it contains actual name of the directory in the filesystem that hosts the region. If we had this in place, I think it would help with the migration to this new way of doing the meta because as is, the region name in fs is a hash of regionname... changing the format of the regionname would mean we generate a different hash... so we'd need hbase-2531 to be in place before we could do this change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5615) the master never does balance because of balancing the parent region
[ https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237556#comment-13237556 ] Ted Yu commented on HBASE-5615: --- Integrated to 0.90 branch. Thanks for the patch Xufeng. Thanks for the review Ramkrishna and Jinchao. Patch for TRUNK to follow the master never does balance because of balancing the parent region Key: HBASE-5615 URL: https://issues.apache.org/jira/browse/HBASE-5615 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: xufeng Assignee: xufeng Priority: Critical Fix For: 0.90.7, 0.96.0 Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html the master never do balance becauseof when master do rebuildUserRegions(),it will add the parent region into AssignmentManager#servers, if balancer let the parent region to move,the parent will in RIT forever.thus balance will never be executed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237714#comment-13237714 ] Ted Yu commented on HBASE-5623: --- Adding LOG.debug should be fine. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets it's out field as null, we get the NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233120#comment-13233120 ] Ted Yu commented on HBASE-3996: --- Eran might be busy. I created https://reviews.apache.org/r/4411/ for people to review. Support multiple tables and scanners as input to the mapper in map/reduce jobs -- Key: HBASE-3996 URL: https://issues.apache.org/jira/browse/HBASE-3996 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: Eran Kutner Assignee: Eran Kutner Fix For: 0.94.0, 0.96.0 Attachments: 3996-v2.txt, 3996-v3.txt, HBase-3996.patch It seems that in many cases feeding data from multiple tables or multiple scanners on a single table can save a lot of time when running map/reduce jobs. I propose a new MultiTableInputFormat class that would allow doing this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233179#comment-13233179 ] Ted Yu commented on HBASE-5564: --- {code} +public int getTimestapKeyColumnIndex() { {code} Please fix typo in the above method name. {code} +-D + TIMESTAMP_CONF_KEY + =currentTimeAsLong - use the specified timestamp for the import. This option is ignored if HBASE_TS_KEY is specfied in 'importtsv.columns'\n + {code} Please wrap the long line above. {code} +// Should never get 0. +ts = conf.getLong(ImportTsv.TIMESTAMP_CONF_KEY, 0); {code} Please explain why 0 wouldn't be returned. {code} + if (parser.getTimestapKeyColumnIndex() != -1) +ts = parsed.getTimestamp(); {code} Please use curly braces around the assignment. Bulkload is discarding duplicate records Key: HBASE-5564 URL: https://issues.apache.org/jira/browse/HBASE-5564 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Environment: HBase 0.92 Reporter: Laxman Assignee: Laxman Labels: bulkloader Attachments: HBASE-5564_trunk.patch Duplicate records are getting discarded when duplicate records exists in same input file and more specifically if they exists in same split. Duplicate records are considered if the records are from diffrent different splits. Version under test: HBase 0.92 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5596) Few minor bugs from HBASE-5209
[ https://issues.apache.org/jira/browse/HBASE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233216#comment-13233216 ] Ted Yu commented on HBASE-5596: --- Is this patch ready for integration ? From https://builds.apache.org/job/PreCommit-HBASE-Build/1212//testReport/org.apache.hadoop.hbase.client/TestInstantSchemaChangeFailover/testInstantSchemaChangeWhileRSCrash/: {code} Caused by: java.lang.RuntimeException: Master not initialized after 200 seconds at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:200) {code} Few minor bugs from HBASE-5209 -- Key: HBASE-5596 URL: https://issues.apache.org/jira/browse/HBASE-5596 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1, 0.94.0 Reporter: David S. Wang Assignee: David S. Wang Priority: Minor Attachments: HBASE_5596.patch A few leftover bugs from HBASE-5209. Comments are documented here: https://reviews.apache.org/r/3892/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5542) Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()
[ https://issues.apache.org/jira/browse/HBASE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227569#comment-13227569 ] Ted Yu commented on HBASE-5542: --- I think we should keep time bound whose default value can be large. Unify HRegion.mutateRowsWithLocks() and HRegion.processRow() Key: HBASE-5542 URL: https://issues.apache.org/jira/browse/HBASE-5542 Project: HBase Issue Type: Improvement Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.96.0 Attachments: HBASE-5542.D2217.1.patch, HBASE-5542.D2217.2.patch, HBASE-5542.D2217.3.patch, HBASE-5542.D2217.4.patch, HBASE-5542.D2217.5.patch, HBASE-5542.D2217.6.patch, HBASE-5542.D2217.7.patch mutateRowsWithLocks() does atomic mutations on multiple rows. processRow() does atomic read-modify-writes on a single row. It will be useful to generalize both and have a processRowsWithLocks() that does atomic read-modify-writes on multiple rows. This also helps reduce some redundancy in the codes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92 and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227661#comment-13227661 ] Ted Yu commented on HBASE-5206: --- Patch v2 looks good. Minor comments: {code} // Call to undisableTable does this. TODO: Make a more formal purge table. -am.getZKTable().setEnabledTable(Bytes.toString(tableName)); +am.getZKTable().setDeletedTable(Bytes.toString(tableName)); {code} I don't see undisableTable. Can we remove the comment above ? {code} + } else if (!this.zkTable + .isEnabledTable(region.getTableNameAsString())) { +setEnabledTable(region); {code} setEnabledTable(HRegionInfo hri) already calls zkTable.isEnabledTable(). It seems we can call setEnabledTable(region) directly above. Port HBASE-5155 to 0.92 and TRUNK - Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.96.0 Reporter: Zhihong Yu Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92 and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227669#comment-13227669 ] Ted Yu commented on HBASE-5206: --- {code} + String errorMsg = Unable to ensure that the table + tableName + + will be + enabled because of a ZooKeeper issue; {code} A space should be added between and will. Port HBASE-5155 to 0.92 and TRUNK - Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.96.0 Reporter: Zhihong Yu Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92 and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227688#comment-13227688 ] Ted Yu commented on HBASE-5206: --- The following error is reproducible on MacBook (patch for 0.92): {code} Tests in error: org.apache.hadoop.hbase.TestDrainingServer: org.apache.hadoop.hbase.TableNotEnabledException: t {code} Port HBASE-5155 to 0.92 and TRUNK - Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.96.0 Reporter: Zhihong Yu Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227892#comment-13227892 ] Ted Yu commented on HBASE-4608: --- Introducing WAL_VERSION would imply that we may change HLog aspect other than compression in the future. Is there plan for the above ? Having another compression type is nice but requires making HLogKey persistence pluggable. I think it would be better to introduce one meta entry instead of two. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227901#comment-13227901 ] Ted Yu commented on HBASE-4608: --- bq. try a paragraph of text going in and out LRUDictionary deals with byte array: {code} public short findEntry(byte[] data, int offset, int length) { {code} In this regard, piping text into the dictionary is functionally same as piping byte[] form of integer. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227946#comment-13227946 ] Ted Yu commented on HBASE-4608: --- I think WAL_VERSION metadata is orthogonal to compression type metadata and I would expect both to be present in new HLog files written with this feature. Say we define WAL_VERSION as v2 which has WAL compression capability. We still need to check compression type metadata before applying dictionary compression. In this regard adding WAL_VERSION seems to be redundant. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227954#comment-13227954 ] Ted Yu commented on HBASE-4608: --- bq. Should the Compression class in wal package ... I only see KeyValueCompression.java under wal package. Please elaborate which class should carry more comments. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227961#comment-13227961 ] Ted Yu commented on HBASE-4608: --- Uploaded v23 onto review board. After WAL version metadata design is finalized, will add that. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227984#comment-13227984 ] Ted Yu commented on HBASE-4608: --- For code specific review, please use https://reviews.apache.org/r/4185/ where there would be context. I can add WAL_VERSION as v2 in the metadata. My question is: would HLog v2 be allowed not to compress Log entries ? If desirable, we can discuss in more detail, face to face, on the 27th. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227997#comment-13227997 ] Ted Yu commented on HBASE-5399: --- TestAtomicOperation failed in latest TRUNK build: https://builds.apache.org/job/HBase-TRUNK/2676/testReport/org.apache.hadoop.hbase.regionserver/TestAtomicOperation/testMultiRowMutationMultiThreads/ Similar failure shows up in the latest Hadoop QA run of HBASE-5542 Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 5399.v40.patch, 5399.v41.patch, 5399.v42.patch, 5399.v42.patch, 5399.v42.patch, 5399.v42.patch, 5399_inprogress.patch, 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 5399_inprogress.v23.patch, 5399_inprogress.v3.patch, 5399_inprogress.v32.patch, 5399_inprogress.v9.patch, nochange.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. - Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228009#comment-13228009 ] Ted Yu commented on HBASE-5399: --- From test output: {code} Exception in thread Thread-211 junit.framework.AssertionFailedError at junit.framework.Assert.fail(Assert.java:48) at junit.framework.Assert.fail(Assert.java:56) at org.apache.hadoop.hbase.regionserver.TestAtomicOperation$2.run(TestAtomicOperation.java:392) {code} Here is related code in test: {code} if (r.size() != 1) { LOG.debug(r); failures.incrementAndGet(); fail(); } {code} Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 5399.v40.patch, 5399.v41.patch, 5399.v42.patch, 5399.v42.patch, 5399.v42.patch, 5399.v42.patch, 5399_inprogress.patch, 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 5399_inprogress.v23.patch, 5399_inprogress.v3.patch, 5399_inprogress.v32.patch, 5399_inprogress.v9.patch, nochange.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. - Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5201) Add TThreadedSelectorServer and remove repeat codes in ThriftServer and HRegionThriftServer
[ https://issues.apache.org/jira/browse/HBASE-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186520#comment-13186520 ] Ted Yu commented on HBASE-5201: --- @Scott: I think you need to mark your work complete before seeing the 'Submit Patch' button. After you click 'Submit Patch', future attachments would be picked up by Hadoop QA. Add TThreadedSelectorServer and remove repeat codes in ThriftServer and HRegionThriftServer --- Key: HBASE-5201 URL: https://issues.apache.org/jira/browse/HBASE-5201 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.94.0 Attachments: HBASE-5201-v2.txt, HBASE-5201-v3.txt, HBASE-5201.txt TThreadedSelectorServer is good for RPC-heavy situation because IO are not limited to one CPU. See https://issues.apache.org/jira/browse/Thrift-1167 I am porting the related classes form thrift trunk (it is not there in thrift-0.7.0). There are lots of repeat codes in ThriftServer and HRegionThriftServer. These codes are now moved to a Runnable called ThriftServerRunner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5174) Coalesce aborted tasks in the TaskMonitor
[ https://issues.apache.org/jira/browse/HBASE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186527#comment-13186527 ] Ted Yu commented on HBASE-5174: --- A slight variation of my previous proposal: MonitoredTaskImpl can maintain MapString, MonitoredTask where String key is the description passed to TaskMonitor.createStatus(), prepended with MonitoredTask.State and separator string (such as '||'). A task may have two entries in the map, one starting with 'ABORTED', the other starting with 'COMPLETE'. This corresponds to task retries. Special handling would be added to MonitoredTaskImpl.setState(). Coalesce aborted tasks in the TaskMonitor - Key: HBASE-5174 URL: https://issues.apache.org/jira/browse/HBASE-5174 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Fix For: 0.94.0, 0.92.1 Some tasks can get repeatedly canceled like flushing when splitting is going on, in the logs it looks like this: {noformat} 2012-01-10 19:28:29,164 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap pressure 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for region test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, writesEnabled=false 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=1.6g 2012-01-10 19:28:29,164 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap pressure 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for region test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, writesEnabled=false 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=1.6g 2012-01-10 19:28:29,164 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap pressure 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for region test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, writesEnabled=false {noformat} But in the TaskMonitor UI you'll get MAX_TASKS (1000) displayed on top of the regions. Basically 1000x: {noformat} Tue Jan 10 19:28:29 UTC 2012 Flushing test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. ABORTED (since 31sec ago) Not flushing since writes not enabled (since 31sec ago) {noformat} It's ugly and I'm sure some users will freak out seeing this, plus you have to scroll down all the way to see your regions. Coalescing consecutive aborted tasks seems like a good solution. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4951) master process can not be stopped when it is initializing
[ https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13167581#comment-13167581 ] Ted Yu commented on HBASE-4951: --- +1 on patch. master process can not be stopped when it is initializing - Key: HBASE-4951 URL: https://issues.apache.org/jira/browse/HBASE-4951 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: xufeng Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.90.6 Attachments: HBASE-4951.patch It is easy to reproduce by following step: step1:start master process.(do not start regionserver process in the cluster). the master will wait the regionserver to check in: org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to checkin step2:stop the master by sh command bin/hbase master stop result:the master process will never die because catalogTracker.waitForRoot() method will block unitl the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5001) Improve the performance of block cache keys
[ https://issues.apache.org/jira/browse/HBASE-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13167947#comment-13167947 ] Ted Yu commented on HBASE-5001: --- In LruBlockCache, please change the javadoc for cacheKey parameter. Improve the performance of block cache keys --- Key: HBASE-5001 URL: https://issues.apache.org/jira/browse/HBASE-5001 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.0 Attachments: 5001-v1.txt Doing a pure random read test on data that's 100% block cache, I see that we are spending quite some time in getBlockCacheKey: {quote} IPC Server handler 19 on 62023 daemon prio=10 tid=0x7fe0501ff800 nid=0x6c87 runnable [0x7fe0577f6000] java.lang.Thread.State: RUNNABLE at java.util.Arrays.copyOf(Arrays.java:2882) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390) at java.lang.StringBuilder.append(StringBuilder.java:119) at org.apache.hadoop.hbase.io.hfile.HFile.getBlockCacheKey(HFile.java:457) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:249) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.seekToDataBlock(HFileBlockIndex.java:209) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:521) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:536) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:178) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:111) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekExactly(StoreFileScanner.java:219) at org.apache.hadoop.hbase.regionserver.StoreScanner.init(StoreScanner.java:80) at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1689) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.init(HRegion.java:2857) {quote} Since the HFile name size is known and the offset is a long, it should be possible to allocate exactly what we need. Maybe use byte[] as the key and drop the separator too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4994) TestHeapSize broke in trunk
[ https://issues.apache.org/jira/browse/HBASE-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13166405#comment-13166405 ] Ted Yu commented on HBASE-4994: --- Thanks for fixing this. Hadoop Qa didn't report this TestHeapSize broke in trunk --- Key: HBASE-4994 URL: https://issues.apache.org/jira/browse/HBASE-4994 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.0 Attachments: heapsize.txt This commit added Map to HRegion {code} commit 888d73a9f5fe907f7c616211322fff339eeaa446 Author: Zhihong Yu te...@apache.org Date: Fri Dec 9 06:01:58 2011 + HBASE-4946 HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (Andrei Dragomir) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4880) Region is on service before openRegionHandler completes, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165837#comment-13165837 ] Ted Yu commented on HBASE-4880: --- +1 on patch v4. Region is on service before openRegionHandler completes, may cause data loss Key: HBASE-4880 URL: https://issues.apache.org/jira/browse/HBASE-4880 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: chunhui shen Assignee: chunhui shen Attachments: 4880.txt, hbase-4880.patch, hbase-4880v2.patch, hbase-4880v3.patch, hbase-4880v4.patch OpenRegionHandler in regionserver is processed as the following steps: {code} 1.openregion()(Through it, closed = false, closing = false) 2.addToOnlineRegions(region) 3.update .meta. table 4.update ZK's node state to RS_ZK_REGION_OPEND {code} We can find that region is on service before Step 4. It means client could put data to this region after step 3. What will happen if step 4 is failed processing? It will execute OpenRegionHandler#cleanupFailedOpen which will do closing region, and master assign this region to another regionserver. If closing region is failed, the data which is put between step 3 and step 4 may loss, because the region has been opend on another regionserver and be put new data. Therefore, it may not be recoverd through replayRecoveredEdit() because the edit's LogSeqId is smaller than current region SeqId. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4946) HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to d
[ https://issues.apache.org/jira/browse/HBASE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165839#comment-13165839 ] Ted Yu commented on HBASE-4946: --- Will commit patch v4 tomorrow. HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to deserialize an unknown class. - Key: HBASE-4946 URL: https://issues.apache.org/jira/browse/HBASE-4946 Project: HBase Issue Type: Bug Components: coprocessors Affects Versions: 0.92.0 Reporter: Andrei Dragomir Assignee: Andrei Dragomir Attachments: 4946-v4.txt, HBASE-4946-v2.patch, HBASE-4946-v3.patch, HBASE-4946.patch Loading coprocessors jars from hdfs works fine. I load it from the shell, after setting the attribute, and it gets loaded: {noformat} INFO org.apache.hadoop.hbase.regionserver.HRegion: Setting up tabledescriptor config now ... INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Class com.MyCoprocessorClass needs to be loaded from a file - hdfs://localhost:9000/coproc/rt- 0.0.1-SNAPSHOT.jar. INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: loadInstance: com.MyCoprocessorClass INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: RegionEnvironment createEnvironment DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Registered protocol handler: region=t1,,1322572939753.6409aee1726d31f5e5671a59fe6e384f. protocol=com.MyCoprocessorClassProtocol INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Load coprocessor com.MyCoprocessorClass from HTD of t1 successfully. {noformat} The problem is that this coprocessors simply extends BaseEndpointCoprocessor, with a dynamic method. When calling this method from the client with HTable.coprocessorExec, I get errors on the HRegionServer, because the call cannot be deserialized from writables. The problem is that Exec tries to do an early resolve of the coprocessor class. The coprocessor class is loaded, but it is in the context of the HRegionServer / HRegion. So, the call fails: {noformat} 2011-12-02 00:34:17,348 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: Error in readFields java.io.IOException: Protocol class com.MyCoprocessorClassProtocol not found at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:125) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:575) at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:105) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1237) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1167) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:703) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:495) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:470) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) Caused by: java.lang.ClassNotFoundException: com.MyCoprocessorClassProtocol at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943) at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:122) ... 10 more {noformat} Probably the correct way to fix this is to make Exec really smart, so that it knows all the class definitions loaded in CoprocessorHost(s). I created a small patch that simply doesn't resolve the class definition in the Exec, instead passing it as string down to the HRegion layer. This layer knows all the definitions, and simply loads it by name. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see:
[jira] [Commented] (HBASE-4120) isolation and allocation
[ https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165857#comment-13165857 ] Ted Yu commented on HBASE-4120: --- Please add license to QosRegionObserver.java I think we should enhance ScannerListener.leaseExpired() with pre/postScannerClose() so that features such as table priority can receive consistent notification. The trick here is that RegionScanner only exposes HRegionInfo. We should be able to utilize onlineRegions in looking up HRegion by region name. Then we should be able to call the following: {code} if (region != null region.getCoprocessorHost() != null) { region.getCoprocessorHost().postScannerClose(s); } {code} isolation and allocation Key: HBASE-4120 URL: https://issues.apache.org/jira/browse/HBASE-4120 Project: HBase Issue Type: New Feature Components: master, regionserver Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0 Reporter: Liu Jia Assignee: Liu Jia Fix For: 0.94.0 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, Design_document_for_HBase_isolation_and_allocation_Revised.pdf, HBase_isolation_and_allocation_user_guide.pdf, Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, TablePriority_v12.patch, TablePriority_v12.patch, TablePriority_v8.patch, TablePriority_v8.patch, TablePriority_v8_for_trunk.patch, TablePrioriy_v9.patch The HBase isolation and allocation tool is designed to help users manage cluster resource among different application and tables. When we have a large scale of HBase cluster with many applications running on it, there will be lots of problems. In Taobao there is a cluster for many departments to test their applications performance, these applications are based on HBase. With one cluster which has 12 servers, there will be only one application running exclusively on this server, and many other applications must wait until the previous test finished. After we add allocation manage function to the cluster, applications can share the cluster and run concurrently. Also if the Test Engineer wants to make sure there is no interference, he/she can move out other tables from this group. In groups we use table priority to allocate resource, when system is busy; we can make sure high-priority tables are not affected lower-priority tables Different groups can have different region server configurations, some groups optimized for reading can have large block cache size, and others optimized for writing can have large memstore size. Tables and region servers can be moved easily between groups; after changing the configuration, a group can be restarted alone instead of restarting the whole cluster. git entry : https://github.com/ICT-Ope/HBase_allocation . We hope our work is helpful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4946) HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to d
[ https://issues.apache.org/jira/browse/HBASE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165870#comment-13165870 ] Ted Yu commented on HBASE-4946: --- Integrated to 0.92 and TRUNK. Thanks for the patch Andrei. Thanks for the review Lars and Stack. HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to deserialize an unknown class. - Key: HBASE-4946 URL: https://issues.apache.org/jira/browse/HBASE-4946 Project: HBase Issue Type: Bug Components: coprocessors Affects Versions: 0.92.0 Reporter: Andrei Dragomir Assignee: Andrei Dragomir Fix For: 0.92.0, 0.94.0 Attachments: 4946-v4.txt, 4946-v5.txt, HBASE-4946-v2.patch, HBASE-4946-v3.patch, HBASE-4946.patch Loading coprocessors jars from hdfs works fine. I load it from the shell, after setting the attribute, and it gets loaded: {noformat} INFO org.apache.hadoop.hbase.regionserver.HRegion: Setting up tabledescriptor config now ... INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Class com.MyCoprocessorClass needs to be loaded from a file - hdfs://localhost:9000/coproc/rt- 0.0.1-SNAPSHOT.jar. INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: loadInstance: com.MyCoprocessorClass INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: RegionEnvironment createEnvironment DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Registered protocol handler: region=t1,,1322572939753.6409aee1726d31f5e5671a59fe6e384f. protocol=com.MyCoprocessorClassProtocol INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Load coprocessor com.MyCoprocessorClass from HTD of t1 successfully. {noformat} The problem is that this coprocessors simply extends BaseEndpointCoprocessor, with a dynamic method. When calling this method from the client with HTable.coprocessorExec, I get errors on the HRegionServer, because the call cannot be deserialized from writables. The problem is that Exec tries to do an early resolve of the coprocessor class. The coprocessor class is loaded, but it is in the context of the HRegionServer / HRegion. So, the call fails: {noformat} 2011-12-02 00:34:17,348 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: Error in readFields java.io.IOException: Protocol class com.MyCoprocessorClassProtocol not found at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:125) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:575) at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:105) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1237) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1167) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:703) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:495) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:470) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) Caused by: java.lang.ClassNotFoundException: com.MyCoprocessorClassProtocol at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943) at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:122) ... 10 more {noformat} Probably the correct way to fix this is to make Exec really smart, so that it knows all the class definitions loaded in CoprocessorHost(s). I created a small patch that simply doesn't resolve the class definition in the Exec, instead passing it as string down to the HRegion layer. This layer knows all the definitions, and simply loads it by name. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HBASE-4120) isolation and allocation
[ https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165876#comment-13165876 ] Ted Yu commented on HBASE-4120: --- In PriorityFunction.java, several lines are much wider than 80 characters. Please use the formatter from HBASE-3678 on the new files. isolation and allocation Key: HBASE-4120 URL: https://issues.apache.org/jira/browse/HBASE-4120 Project: HBase Issue Type: New Feature Components: master, regionserver Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0 Reporter: Liu Jia Assignee: Liu Jia Fix For: 0.94.0 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, Design_document_for_HBase_isolation_and_allocation_Revised.pdf, HBase_isolation_and_allocation_user_guide.pdf, Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, TablePriority_v12.patch, TablePriority_v12.patch, TablePriority_v8.patch, TablePriority_v8.patch, TablePriority_v8_for_trunk.patch, TablePrioriy_v9.patch The HBase isolation and allocation tool is designed to help users manage cluster resource among different application and tables. When we have a large scale of HBase cluster with many applications running on it, there will be lots of problems. In Taobao there is a cluster for many departments to test their applications performance, these applications are based on HBase. With one cluster which has 12 servers, there will be only one application running exclusively on this server, and many other applications must wait until the previous test finished. After we add allocation manage function to the cluster, applications can share the cluster and run concurrently. Also if the Test Engineer wants to make sure there is no interference, he/she can move out other tables from this group. In groups we use table priority to allocate resource, when system is busy; we can make sure high-priority tables are not affected lower-priority tables Different groups can have different region server configurations, some groups optimized for reading can have large block cache size, and others optimized for writing can have large memstore size. Tables and region servers can be moved easily between groups; after changing the configuration, a group can be restarted alone instead of restarting the whole cluster. git entry : https://github.com/ICT-Ope/HBase_allocation . We hope our work is helpful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4970) Add a parameter to change keepAliveTime of Htable thread pool.
[ https://issues.apache.org/jira/browse/HBASE-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164274#comment-13164274 ] Ted Yu commented on HBASE-4970: --- I think there shouldn't be upper case letters in name of new config. Add a parameter to change keepAliveTime of Htable thread pool. --- Key: HBASE-4970 URL: https://issues.apache.org/jira/browse/HBASE-4970 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.90.4 Reporter: gaojinchao Assignee: gaojinchao Priority: Trivial Fix For: 0.90.5 Attachments: HBASE-4970_Branch90.patch In my cluster, I changed keepAliveTime from 60 s to 3600 s. Increasing RES is slowed down. Why increasing keepAliveTime of HBase thread pool is slowing down our problem occurance [RES value increase]? You can go through the source of sun.nio.ch.Util. Every thread hold 3 softreference of direct buffer(mustangsrc) for reusage. The code names the 3 softreferences buffercache. If the buffer was all occupied or none was suitable in size, and new request comes, new direct buffer is allocated. After the service, the bigger one replaces the smaller one in buffercache. The replaced buffer is released. So I think we can add a parameter to change keepAliveTime of Htable thread pool. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4224) Need a flush by regionserver rather than by table option
[ https://issues.apache.org/jira/browse/HBASE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164599#comment-13164599 ] Ted Yu commented on HBASE-4224: --- @Akash: Do you have a newer patch ? If so, please upload to this JIRA. Need a flush by regionserver rather than by table option Key: HBASE-4224 URL: https://issues.apache.org/jira/browse/HBASE-4224 Project: HBase Issue Type: Bug Components: shell Reporter: stack Assignee: Akash Ashok Attachments: HBase-4224.patch This evening needed to clean out logs on the cluster. logs are by regionserver. to let go of logs, we need to have all edits emptied from memory. only flush is by table or region. We need to be able to flush the regionserver. Need to add this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4729) Clash between region unassign and splitting kills the master
[ https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164658#comment-13164658 ] Ted Yu commented on HBASE-4729: --- The HadoopQA report @ https://builds.apache.org/job/PreCommit-HBASE-Build/405//testReport/ showed basically no tests were run. A manual test suite execution should have been performed. Clash between region unassign and splitting kills the master Key: HBASE-4729 URL: https://issues.apache.org/jira/browse/HBASE-4729 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: stack Priority: Critical Fix For: 0.92.0, 0.94.0 Attachments: 4729-v2.txt, 4729-v3.txt, 4729-v4.txt, 4729-v5.txt, 4729-v6-092.txt, 4729-v6-trunk.txt, 4729.txt I was running an online alter while regions were splitting, and suddenly the master died and left my table half-altered (haven't restarted the master yet). What killed the master: {quote} 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception creating node CLOSING org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101 at org.apache.zookeeper.KeeperException.create(KeeperException.java:110) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441) at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769) at org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661) at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {quote} A znode was created because the region server was splitting the region 4 seconds before: {quote} 2011-11-02 17:06:40,704 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101. 2011-11-02 17:06:40,704 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:62023-0x132f043bbde0710 Creating ephemeral node for f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Attempting to transition node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING ... 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Successfully transitioned node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT 2011-11-02 17:06:44,061 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for f7e1783e65ea8d621a4bc96ad310f101 {quote} Now that the master is dead the region server is spewing those last two lines like mad. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4927) CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
[ https://issues.apache.org/jira/browse/HBASE-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164695#comment-13164695 ] Ted Yu commented on HBASE-4927: --- Ran through the previously failing tests: {code} 1010 mt -Dtest=TestMasterRestartAfterDisablingTable 1012 mt -Dtest=TestOfflineMetaRebuildBase#testMetaRebuild 1013 mt -Dtest=TestOfflineMetaRebuildHole {code} They pass now. Going to commit to 0.92 and TRUNK. CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty --- Key: HBASE-4927 URL: https://issues.apache.org/jira/browse/HBASE-4927 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.92.0 Attachments: 0001-Fixed-TestOffline-failure-caused-by-HBASE-4927.patch, 0001-Fixed-TestOffline-failure-caused-by-HBASE-4927.patch, 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator-.patch, 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator_v2.patch, hbase-4927-fix-ws.txt When reviewing HBASE-4238 backporting, Jon found this issue. What happens if the split points are (empty end key is the last key, empty start key is the first key) Parent [A,) L daughter [A,B), R daughter [B,) When sorted, we gets to end key comparision which results in this incorrector order: [A,B), [A,), [B,) we wanted: [A,), [A,B), [B,) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4927) CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
[ https://issues.apache.org/jira/browse/HBASE-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164699#comment-13164699 ] Ted Yu commented on HBASE-4927: --- Also ran through the two tests in original patch: {code} 1302 mt -Dtest=TestHRegionInfo 1303 mt -Dtest=TestCatalogJanitor {code} They passed as well. Integrated to 0.92 and TRUNK. Thanks for the addendum, Jimmy. Thanks for the help, Jonathan. CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty --- Key: HBASE-4927 URL: https://issues.apache.org/jira/browse/HBASE-4927 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.92.0 Attachments: 0001-Fixed-TestOffline-failure-caused-by-HBASE-4927.patch, 0001-Fixed-TestOffline-failure-caused-by-HBASE-4927.patch, 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator-.patch, 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator_v2.patch, hbase-4927-fix-ws.txt When reviewing HBASE-4238 backporting, Jon found this issue. What happens if the split points are (empty end key is the last key, empty start key is the first key) Parent [A,) L daughter [A,B), R daughter [B,) When sorted, we gets to end key comparision which results in this incorrector order: [A,B), [A,), [B,) we wanted: [A,), [A,B), [B,) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4120) isolation and allocation
[ https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164960#comment-13164960 ] Ted Yu commented on HBASE-4120: --- But preScannerClose() is only passed one scanner. isolation and allocation Key: HBASE-4120 URL: https://issues.apache.org/jira/browse/HBASE-4120 Project: HBase Issue Type: New Feature Components: master, regionserver Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0 Reporter: Liu Jia Assignee: Liu Jia Fix For: 0.94.0 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, Design_document_for_HBase_isolation_and_allocation_Revised.pdf, HBase_isolation_and_allocation_user_guide.pdf, Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, TablePriority_v12.patch, TablePriority_v12.patch, TablePriority_v8.patch, TablePriority_v8.patch, TablePriority_v8_for_trunk.patch, TablePrioriy_v9.patch The HBase isolation and allocation tool is designed to help users manage cluster resource among different application and tables. When we have a large scale of HBase cluster with many applications running on it, there will be lots of problems. In Taobao there is a cluster for many departments to test their applications performance, these applications are based on HBase. With one cluster which has 12 servers, there will be only one application running exclusively on this server, and many other applications must wait until the previous test finished. After we add allocation manage function to the cluster, applications can share the cluster and run concurrently. Also if the Test Engineer wants to make sure there is no interference, he/she can move out other tables from this group. In groups we use table priority to allocate resource, when system is busy; we can make sure high-priority tables are not affected lower-priority tables Different groups can have different region server configurations, some groups optimized for reading can have large block cache size, and others optimized for writing can have large memstore size. Tables and region servers can be moved easily between groups; after changing the configuration, a group can be restarted alone instead of restarting the whole cluster. git entry : https://github.com/ICT-Ope/HBase_allocation . We hope our work is helpful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4120) isolation and allocation
[ https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165017#comment-13165017 ] Ted Yu commented on HBASE-4120: --- Looks like we should expose scanner lease expiration through new coprocessor API. isolation and allocation Key: HBASE-4120 URL: https://issues.apache.org/jira/browse/HBASE-4120 Project: HBase Issue Type: New Feature Components: master, regionserver Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0 Reporter: Liu Jia Assignee: Liu Jia Fix For: 0.94.0 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, Design_document_for_HBase_isolation_and_allocation_Revised.pdf, HBase_isolation_and_allocation_user_guide.pdf, Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, TablePriority_v12.patch, TablePriority_v12.patch, TablePriority_v8.patch, TablePriority_v8.patch, TablePriority_v8_for_trunk.patch, TablePrioriy_v9.patch The HBase isolation and allocation tool is designed to help users manage cluster resource among different application and tables. When we have a large scale of HBase cluster with many applications running on it, there will be lots of problems. In Taobao there is a cluster for many departments to test their applications performance, these applications are based on HBase. With one cluster which has 12 servers, there will be only one application running exclusively on this server, and many other applications must wait until the previous test finished. After we add allocation manage function to the cluster, applications can share the cluster and run concurrently. Also if the Test Engineer wants to make sure there is no interference, he/she can move out other tables from this group. In groups we use table priority to allocate resource, when system is busy; we can make sure high-priority tables are not affected lower-priority tables Different groups can have different region server configurations, some groups optimized for reading can have large block cache size, and others optimized for writing can have large memstore size. Tables and region servers can be moved easily between groups; after changing the configuration, a group can be restarted alone instead of restarting the whole cluster. git entry : https://github.com/ICT-Ope/HBase_allocation . We hope our work is helpful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4980) Null pointer exception in HBaseClient receiveResponse
[ https://issues.apache.org/jira/browse/HBASE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165021#comment-13165021 ] Ted Yu commented on HBASE-4980: --- The patch changes the semantics of original error handling. I think we shouldn't leave data unread in the input stream. Null pointer exception in HBaseClient receiveResponse - Key: HBASE-4980 URL: https://issues.apache.org/jira/browse/HBASE-4980 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Shrijeet Paliwal Labels: newbie Attachments: 0001-HBASE-4980-Fix-NPE-in-HBaseClient-receiveResponse.patch Relevant Stack trace: 2011-11-30 13:10:26,557 [IPC Client (47) connection to xx.xx.xx/172.22.4.68:60020 from an unknown user] WARN org.apache.hadoop.ipc.HBaseClient - Unexpected exception receiving call responses java.lang.NullPointerException -at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:583) -at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:511) {code} if (LOG.isDebugEnabled()) LOG.debug(getName() + got value # + id); Call call = calls.remove(id); // Read the flag byte byte flag = in.readByte(); boolean isError = ResponseFlag.isError(flag); if (ResponseFlag.isLength(flag)) { // Currently length if present is unused. in.readInt(); } int state = in.readInt(); // Read the state. Currently unused. if (isError) { //noinspection ThrowableInstanceNeverThrown call.setException(new RemoteException( WritableUtils.readString(in), WritableUtils.readString(in))); } else { {code} This line {code}Call call = calls.remove(id);{code} may return a null 'call'. It is so because if you have rpc timeout enable, we proactively clean up other calls which have expired their lifetime along with the call for which socket timeout exception happend. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4980) Null pointer exception in HBaseClient receiveResponse
[ https://issues.apache.org/jira/browse/HBASE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165025#comment-13165025 ] Ted Yu commented on HBASE-4980: --- Please attach patch for trunk last so that HadoopQA can test it. Null pointer exception in HBaseClient receiveResponse - Key: HBASE-4980 URL: https://issues.apache.org/jira/browse/HBASE-4980 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Shrijeet Paliwal Labels: newbie Attachments: 0001-HBASE-4980-Fix-NPE-in-HBaseClient-receiveResponse.patch, 0002-HBASE-4980-Fix-NPE-in-HBaseClient-receiveResponse.patch Relevant Stack trace: 2011-11-30 13:10:26,557 [IPC Client (47) connection to xx.xx.xx/172.22.4.68:60020 from an unknown user] WARN org.apache.hadoop.ipc.HBaseClient - Unexpected exception receiving call responses java.lang.NullPointerException -at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:583) -at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:511) {code} if (LOG.isDebugEnabled()) LOG.debug(getName() + got value # + id); Call call = calls.remove(id); // Read the flag byte byte flag = in.readByte(); boolean isError = ResponseFlag.isError(flag); if (ResponseFlag.isLength(flag)) { // Currently length if present is unused. in.readInt(); } int state = in.readInt(); // Read the state. Currently unused. if (isError) { //noinspection ThrowableInstanceNeverThrown call.setException(new RemoteException( WritableUtils.readString(in), WritableUtils.readString(in))); } else { {code} This line {code}Call call = calls.remove(id);{code} may return a null 'call'. It is so because if you have rpc timeout enable, we proactively clean up other calls which have expired their lifetime along with the call for which socket timeout exception happend. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4980) Null pointer exception in HBaseClient receiveResponse
[ https://issues.apache.org/jira/browse/HBASE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165031#comment-13165031 ] Ted Yu commented on HBASE-4980: --- Please use --no-prefix to generate patch. Null pointer exception in HBaseClient receiveResponse - Key: HBASE-4980 URL: https://issues.apache.org/jira/browse/HBASE-4980 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Shrijeet Paliwal Labels: newbie Attachments: 0001-HBASE-4980-Fix-NPE-in-HBaseClient-receiveResponse.patch, 0002-HBASE-4980-Fix-NPE-in-HBaseClient-receiveResponse.patch Relevant Stack trace: 2011-11-30 13:10:26,557 [IPC Client (47) connection to xx.xx.xx/172.22.4.68:60020 from an unknown user] WARN org.apache.hadoop.ipc.HBaseClient - Unexpected exception receiving call responses java.lang.NullPointerException -at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:583) -at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:511) {code} if (LOG.isDebugEnabled()) LOG.debug(getName() + got value # + id); Call call = calls.remove(id); // Read the flag byte byte flag = in.readByte(); boolean isError = ResponseFlag.isError(flag); if (ResponseFlag.isLength(flag)) { // Currently length if present is unused. in.readInt(); } int state = in.readInt(); // Read the state. Currently unused. if (isError) { //noinspection ThrowableInstanceNeverThrown call.setException(new RemoteException( WritableUtils.readString(in), WritableUtils.readString(in))); } else { {code} This line {code}Call call = calls.remove(id);{code} may return a null 'call'. It is so because if you have rpc timeout enable, we proactively clean up other calls which have expired their lifetime along with the call for which socket timeout exception happend. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4682) Support deleted rows using Import/Export
[ https://issues.apache.org/jira/browse/HBASE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165048#comment-13165048 ] Ted Yu commented on HBASE-4682: --- In the new Delete ctor, row key should be included in the IOE. Support deleted rows using Import/Export Key: HBASE-4682 URL: https://issues.apache.org/jira/browse/HBASE-4682 Project: HBase Issue Type: Sub-task Components: mapreduce Affects Versions: 0.94.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 4682-v1.txt Parent allows keeping deleted rows around. Would be nice if those could be exported and imported as well. All the building blocks are there. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4946) HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to d
[ https://issues.apache.org/jira/browse/HBASE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163819#comment-13163819 ] Ted Yu commented on HBASE-4946: --- The failed tests were due to 'unable to create new native thread' HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to deserialize an unknown class. - Key: HBASE-4946 URL: https://issues.apache.org/jira/browse/HBASE-4946 Project: HBase Issue Type: Bug Components: coprocessors Affects Versions: 0.92.0 Reporter: Andrei Dragomir Assignee: Andrei Dragomir Attachments: HBASE-4946-v2.patch, HBASE-4946-v3.patch, HBASE-4946.patch Loading coprocessors jars from hdfs works fine. I load it from the shell, after setting the attribute, and it gets loaded: {noformat} INFO org.apache.hadoop.hbase.regionserver.HRegion: Setting up tabledescriptor config now ... INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Class com.MyCoprocessorClass needs to be loaded from a file - hdfs://localhost:9000/coproc/rt- 0.0.1-SNAPSHOT.jar. INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: loadInstance: com.MyCoprocessorClass INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: RegionEnvironment createEnvironment DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Registered protocol handler: region=t1,,1322572939753.6409aee1726d31f5e5671a59fe6e384f. protocol=com.MyCoprocessorClassProtocol INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Load coprocessor com.MyCoprocessorClass from HTD of t1 successfully. {noformat} The problem is that this coprocessors simply extends BaseEndpointCoprocessor, with a dynamic method. When calling this method from the client with HTable.coprocessorExec, I get errors on the HRegionServer, because the call cannot be deserialized from writables. The problem is that Exec tries to do an early resolve of the coprocessor class. The coprocessor class is loaded, but it is in the context of the HRegionServer / HRegion. So, the call fails: {noformat} 2011-12-02 00:34:17,348 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: Error in readFields java.io.IOException: Protocol class com.MyCoprocessorClassProtocol not found at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:125) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:575) at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:105) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1237) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1167) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:703) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:495) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:470) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) Caused by: java.lang.ClassNotFoundException: com.MyCoprocessorClassProtocol at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943) at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:122) ... 10 more {noformat} Probably the correct way to fix this is to make Exec really smart, so that it knows all the class definitions loaded in CoprocessorHost(s). I created a small patch that simply doesn't resolve the class definition in the Exec, instead passing it as string down to the HRegion layer. This layer knows all the definitions, and simply loads it by name. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see:
[jira] [Commented] (HBASE-4946) HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to d
[ https://issues.apache.org/jira/browse/HBASE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163840#comment-13163840 ] Ted Yu commented on HBASE-4946: --- For coprocessor/Exec.java, javadoc doesn't match code: {code} try { protocol = (ClassCoprocessorProtocol)conf.getClassByName(protocolName); } catch (ClassNotFoundException cnfe) { - throw new IOException(Protocol class +protocolName+ not found, cnfe); + // can't do eager instantiation. pass it as a string and try to deserialize later. + //throw new IOException(Protocol class +protocolName+ not found, cnfe); {code} I think the above try block should be commented out. TestCoprocessorEndpoint passes without the above assignment to protocol. HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to deserialize an unknown class. - Key: HBASE-4946 URL: https://issues.apache.org/jira/browse/HBASE-4946 Project: HBase Issue Type: Bug Components: coprocessors Affects Versions: 0.92.0 Reporter: Andrei Dragomir Assignee: Andrei Dragomir Attachments: HBASE-4946-v2.patch, HBASE-4946-v3.patch, HBASE-4946.patch Loading coprocessors jars from hdfs works fine. I load it from the shell, after setting the attribute, and it gets loaded: {noformat} INFO org.apache.hadoop.hbase.regionserver.HRegion: Setting up tabledescriptor config now ... INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Class com.MyCoprocessorClass needs to be loaded from a file - hdfs://localhost:9000/coproc/rt- 0.0.1-SNAPSHOT.jar. INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: loadInstance: com.MyCoprocessorClass INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: RegionEnvironment createEnvironment DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Registered protocol handler: region=t1,,1322572939753.6409aee1726d31f5e5671a59fe6e384f. protocol=com.MyCoprocessorClassProtocol INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Load coprocessor com.MyCoprocessorClass from HTD of t1 successfully. {noformat} The problem is that this coprocessors simply extends BaseEndpointCoprocessor, with a dynamic method. When calling this method from the client with HTable.coprocessorExec, I get errors on the HRegionServer, because the call cannot be deserialized from writables. The problem is that Exec tries to do an early resolve of the coprocessor class. The coprocessor class is loaded, but it is in the context of the HRegionServer / HRegion. So, the call fails: {noformat} 2011-12-02 00:34:17,348 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: Error in readFields java.io.IOException: Protocol class com.MyCoprocessorClassProtocol not found at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:125) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:575) at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:105) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1237) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1167) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:703) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:495) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:470) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) Caused by: java.lang.ClassNotFoundException: com.MyCoprocessorClassProtocol at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943) at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:122) ... 10 more {noformat} Probably the correct way to fix this is to make Exec really smart, so that it
[jira] [Commented] (HBASE-4120) isolation and allocation
[ https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163853#comment-13163853 ] Ted Yu commented on HBASE-4120: --- From Andrew Purtell (see 'trip report for Hadoop In China' up on dev@): After 0.92 is out I intend to champion / mentor / co-develop 4120 and the follow on table allocation work and target 0.94 for it. I think the RPC QoS aspect is not too controversial. The allocation/reservation aspects I'd like to aim for a coprocessor or at least master plugin based integration so they won't impact stability for users who don't enable it. Unlike RPC QoS I suspect the changes needed to core can be minimized to coprocessor framework additions. Follow up in new JIRAs soon. isolation and allocation Key: HBASE-4120 URL: https://issues.apache.org/jira/browse/HBASE-4120 Project: HBase Issue Type: New Feature Components: master, regionserver Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0 Reporter: Liu Jia Assignee: Liu Jia Fix For: 0.94.0 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, Design_document_for_HBase_isolation_and_allocation_Revised.pdf, HBase_isolation_and_allocation_user_guide.pdf, Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, TablePriority_v12.patch, TablePriority_v12.patch, TablePriority_v8.patch, TablePriority_v8.patch, TablePriority_v8_for_trunk.patch, TablePrioriy_v9.patch The HBase isolation and allocation tool is designed to help users manage cluster resource among different application and tables. When we have a large scale of HBase cluster with many applications running on it, there will be lots of problems. In Taobao there is a cluster for many departments to test their applications performance, these applications are based on HBase. With one cluster which has 12 servers, there will be only one application running exclusively on this server, and many other applications must wait until the previous test finished. After we add allocation manage function to the cluster, applications can share the cluster and run concurrently. Also if the Test Engineer wants to make sure there is no interference, he/she can move out other tables from this group. In groups we use table priority to allocate resource, when system is busy; we can make sure high-priority tables are not affected lower-priority tables Different groups can have different region server configurations, some groups optimized for reading can have large block cache size, and others optimized for writing can have large memstore size. Tables and region servers can be moved easily between groups; after changing the configuration, a group can be restarted alone instead of restarting the whole cluster. git entry : https://github.com/ICT-Ope/HBase_allocation . We hope our work is helpful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4120) isolation and allocation
[ https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163902#comment-13163902 ] Ted Yu commented on HBASE-4120: --- Andy suggested placing the PriorityFunction.initRegionPriority(region) call in RegionObserver.postOpen() isolation and allocation Key: HBASE-4120 URL: https://issues.apache.org/jira/browse/HBASE-4120 Project: HBase Issue Type: New Feature Components: master, regionserver Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0 Reporter: Liu Jia Assignee: Liu Jia Fix For: 0.94.0 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, Design_document_for_HBase_isolation_and_allocation_Revised.pdf, HBase_isolation_and_allocation_user_guide.pdf, Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, TablePriority_v12.patch, TablePriority_v12.patch, TablePriority_v8.patch, TablePriority_v8.patch, TablePriority_v8_for_trunk.patch, TablePrioriy_v9.patch The HBase isolation and allocation tool is designed to help users manage cluster resource among different application and tables. When we have a large scale of HBase cluster with many applications running on it, there will be lots of problems. In Taobao there is a cluster for many departments to test their applications performance, these applications are based on HBase. With one cluster which has 12 servers, there will be only one application running exclusively on this server, and many other applications must wait until the previous test finished. After we add allocation manage function to the cluster, applications can share the cluster and run concurrently. Also if the Test Engineer wants to make sure there is no interference, he/she can move out other tables from this group. In groups we use table priority to allocate resource, when system is busy; we can make sure high-priority tables are not affected lower-priority tables Different groups can have different region server configurations, some groups optimized for reading can have large block cache size, and others optimized for writing can have large memstore size. Tables and region servers can be moved easily between groups; after changing the configuration, a group can be restarted alone instead of restarting the whole cluster. git entry : https://github.com/ICT-Ope/HBase_allocation . We hope our work is helpful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4605) Constraints
[ https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163939#comment-13163939 ] Ted Yu commented on HBASE-4605: --- Renaming IntegrationTestConstraint as described above makes sense. Thanks Jesse. Constraints --- Key: HBASE-4605 URL: https://issues.apache.org/jira/browse/HBASE-4605 Project: HBase Issue Type: Improvement Components: client, coprocessors Affects Versions: 0.94.0 Reporter: Jesse Yates Assignee: Jesse Yates Attachments: 4605.v7, constraint_as_cp.txt, java_Constraint_v2.patch, java_HBASE-4605_v1.patch, java_HBASE-4605_v2.patch, java_HBASE-4605_v3.patch From Jesse's comment on dev: {quote} What I would like to propose is a simple interface that people can use to implement a 'constraint' (matching the classic database definition). This would help ease of adoption by helping HBase more easily check that box, help minimize code duplication across organizations, and lead to easier adoption. Essentially, people would implement a 'Constraint' interface for checking keys before they are put into a table. Puts that are valid get written to the table, but if not people can will throw an exception that gets propagated back to the client explaining why the put was invalid. Constraints would be set on a per-table basis and the user would be expected to ensure the jars containing the constraint are present on the machines serving that table. Yes, people could roll their own mechanism for doing this via coprocessors each time, but this would make it easier to do so, so you only have to implement a very minimal interface and not worry about the specifics. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164104#comment-13164104 ] Ted Yu commented on HBASE-4880: --- Did RestAdmin pass for patch v2? Region is on service before completing openRegionHanlder, may cause data loss - Key: HBASE-4880 URL: https://issues.apache.org/jira/browse/HBASE-4880 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Attachments: hbase-4880.patch, hbase-4880v2.patch OpenRegionHandler in regionserver is processed as the following steps: {code} 1.openregion()(Through it, closed = false, closing = false) 2.addToOnlineRegions(region) 3.update .meta. table 4.update ZK's node state to RS_ZK_REGION_OPEND {code} We can find that region is on service before Step 4. It means client could put data to this region after step 3. What will happen if step 4 is failed processing? It will execute OpenRegionHandler#cleanupFailedOpen which will do closing region, and master assign this region to another regionserver. If closing region is failed, the data which is put between step 3 and step 4 may loss, because the region has been opend on another regionserver and be put new data. Therefore, it may not be recoverd through replayRecoveredEdit() because the edit's LogSeqId is smaller than current region SeqId. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164168#comment-13164168 ] Ted Yu commented on HBASE-4880: --- Please check failed tests. It seems trunk is broken now. Region is on service before completing openRegionHanlder, may cause data loss - Key: HBASE-4880 URL: https://issues.apache.org/jira/browse/HBASE-4880 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Attachments: hbase-4880.patch, hbase-4880v2.patch, hbase-4880v3.patch OpenRegionHandler in regionserver is processed as the following steps: {code} 1.openregion()(Through it, closed = false, closing = false) 2.addToOnlineRegions(region) 3.update .meta. table 4.update ZK's node state to RS_ZK_REGION_OPEND {code} We can find that region is on service before Step 4. It means client could put data to this region after step 3. What will happen if step 4 is failed processing? It will execute OpenRegionHandler#cleanupFailedOpen which will do closing region, and master assign this region to another regionserver. If closing region is failed, the data which is put between step 3 and step 4 may loss, because the region has been opend on another regionserver and be put new data. Therefore, it may not be recoverd through replayRecoveredEdit() because the edit's LogSeqId is smaller than current region SeqId. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4946) HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to d
[ https://issues.apache.org/jira/browse/HBASE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162740#comment-13162740 ] Ted Yu commented on HBASE-4946: --- The change in BloomFilterFactory is unrelated, please remove it from patch. HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to deserialize an unknown class. - Key: HBASE-4946 URL: https://issues.apache.org/jira/browse/HBASE-4946 Project: HBase Issue Type: Bug Components: coprocessors Affects Versions: 0.92.0 Reporter: Andrei Dragomir Assignee: Andrei Dragomir Attachments: HBASE-4946-v2.patch, HBASE-4946.patch Loading coprocessors jars from hdfs works fine. I load it from the shell, after setting the attribute, and it gets loaded: {noformat} INFO org.apache.hadoop.hbase.regionserver.HRegion: Setting up tabledescriptor config now ... INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Class com.MyCoprocessorClass needs to be loaded from a file - hdfs://localhost:9000/coproc/rt- 0.0.1-SNAPSHOT.jar. INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: loadInstance: com.MyCoprocessorClass INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: RegionEnvironment createEnvironment DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Registered protocol handler: region=t1,,1322572939753.6409aee1726d31f5e5671a59fe6e384f. protocol=com.MyCoprocessorClassProtocol INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Load coprocessor com.MyCoprocessorClass from HTD of t1 successfully. {noformat} The problem is that this coprocessors simply extends BaseEndpointCoprocessor, with a dynamic method. When calling this method from the client with HTable.coprocessorExec, I get errors on the HRegionServer, because the call cannot be deserialized from writables. The problem is that Exec tries to do an early resolve of the coprocessor class. The coprocessor class is loaded, but it is in the context of the HRegionServer / HRegion. So, the call fails: {noformat} 2011-12-02 00:34:17,348 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: Error in readFields java.io.IOException: Protocol class com.MyCoprocessorClassProtocol not found at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:125) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:575) at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:105) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1237) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1167) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:703) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:495) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:470) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) Caused by: java.lang.ClassNotFoundException: com.MyCoprocessorClassProtocol at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943) at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:122) ... 10 more {noformat} Probably the correct way to fix this is to make Exec really smart, so that it knows all the class definitions loaded in CoprocessorHost(s). I created a small patch that simply doesn't resolve the class definition in the Exec, instead passing it as string down to the HRegion layer. This layer knows all the definitions, and simply loads it by name. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see:
[jira] [Commented] (HBASE-4954) IllegalArgumentException in hfile2 blockseek
[ https://issues.apache.org/jira/browse/HBASE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163010#comment-13163010 ] Ted Yu commented on HBASE-4954: --- In our internal Jenkins build which uses hadoop 0.22, there was no such exception. I last merged from Apache 0.92 on Nov. 14th So this test failure might have been introduced by JIRAs integrated after Nov. 14th - possibly the backport of HBASE-2856. IllegalArgumentException in hfile2 blockseek Key: HBASE-4954 URL: https://issues.apache.org/jira/browse/HBASE-4954 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Fix For: 0.92.0 On Tue, Nov 29, 2011 at 10:20 PM, Stack st...@duboce.net wrote: The first hbase 0.92.0 release candidate is available for download: http://people.apache.org/~stack/hbase-0.92.0-candidate-0/ Here's another persistent issues that I'd appreciate somebody taking a quick look at: http://bigtop01.cloudera.org:8080/view/Hadoop%200.22/job/Bigtop-hadoop22-smoketest/28/testReport/org.apache.bigtop.itest.hbase.smoke/TestHFileOutputFormat/testMRIncrementalLoadWithSplit/ Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:218) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.blockSeek(HFileReaderV2.java:632) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:545) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:511) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:475) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:157) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.copyHFileHalf(LoadIncrementalHFiles.java:544) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.splitStoreFile(LoadIncrementalHFiles.java:516) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.splitStoreFile(LoadIncrementalHFiles.java:377) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:441) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:325) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:323) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4954) IllegalArgumentException in hfile2 blockseek
[ https://issues.apache.org/jira/browse/HBASE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163208#comment-13163208 ] Ted Yu commented on HBASE-4954: --- It turns out that we should be using the following command: {code} 1180 mvn clean -Dhadoop.profile=22 compile 1181 mvn -Dhadoop.profile=22 -P localTests test -Dtest=TestHFileOutputFormat#testMRIncrementalLoadWithSplit {code} where I saw (under TRUNK): {code} testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat) Time elapsed: 18.591 sec FAILURE! java.lang.AssertionError at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.runIncrementalPELoad(TestHFileOutputFormat.java:479) at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.doIncrementalLoadTest(TestHFileOutputFormat.java:391) at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.testMRIncrementalLoadWithSplit(TestHFileOutputFormat.java:370) {code} IllegalArgumentException in hfile2 blockseek Key: HBASE-4954 URL: https://issues.apache.org/jira/browse/HBASE-4954 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Fix For: 0.92.0 On Tue, Nov 29, 2011 at 10:20 PM, Stack st...@duboce.net wrote: The first hbase 0.92.0 release candidate is available for download: http://people.apache.org/~stack/hbase-0.92.0-candidate-0/ Here's another persistent issues that I'd appreciate somebody taking a quick look at: http://bigtop01.cloudera.org:8080/view/Hadoop%200.22/job/Bigtop-hadoop22-smoketest/28/testReport/org.apache.bigtop.itest.hbase.smoke/TestHFileOutputFormat/testMRIncrementalLoadWithSplit/ Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:218) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.blockSeek(HFileReaderV2.java:632) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:545) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:511) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:475) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:157) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.copyHFileHalf(LoadIncrementalHFiles.java:544) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.splitStoreFile(LoadIncrementalHFiles.java:516) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.splitStoreFile(LoadIncrementalHFiles.java:377) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:441) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:325) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:323) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4954) IllegalArgumentException in hfile2 blockseek
[ https://issues.apache.org/jira/browse/HBASE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163212#comment-13163212 ] Ted Yu commented on HBASE-4954: --- Stack trace for 0.92 is essentially the same. But I don't see test output: {code} -rw-r--r-- 1 zhihyu 110088321 0 Dec 5 16:21 target/surefire-reports/org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat-output.txt {code} IllegalArgumentException in hfile2 blockseek Key: HBASE-4954 URL: https://issues.apache.org/jira/browse/HBASE-4954 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Fix For: 0.92.0 On Tue, Nov 29, 2011 at 10:20 PM, Stack st...@duboce.net wrote: The first hbase 0.92.0 release candidate is available for download: http://people.apache.org/~stack/hbase-0.92.0-candidate-0/ Here's another persistent issues that I'd appreciate somebody taking a quick look at: http://bigtop01.cloudera.org:8080/view/Hadoop%200.22/job/Bigtop-hadoop22-smoketest/28/testReport/org.apache.bigtop.itest.hbase.smoke/TestHFileOutputFormat/testMRIncrementalLoadWithSplit/ Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:218) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.blockSeek(HFileReaderV2.java:632) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:545) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:511) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:475) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:157) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.copyHFileHalf(LoadIncrementalHFiles.java:544) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.splitStoreFile(LoadIncrementalHFiles.java:516) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.splitStoreFile(LoadIncrementalHFiles.java:377) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:441) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:325) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:323) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4954) IllegalArgumentException in hfile2 blockseek
[ https://issues.apache.org/jira/browse/HBASE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163380#comment-13163380 ] Ted Yu commented on HBASE-4954: --- Can I mark this issue as Won't fix ? IllegalArgumentException in hfile2 blockseek Key: HBASE-4954 URL: https://issues.apache.org/jira/browse/HBASE-4954 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Fix For: 0.92.0 On Tue, Nov 29, 2011 at 10:20 PM, Stack st...@duboce.net wrote: The first hbase 0.92.0 release candidate is available for download: http://people.apache.org/~stack/hbase-0.92.0-candidate-0/ Here's another persistent issues that I'd appreciate somebody taking a quick look at: http://bigtop01.cloudera.org:8080/view/Hadoop%200.22/job/Bigtop-hadoop22-smoketest/28/testReport/org.apache.bigtop.itest.hbase.smoke/TestHFileOutputFormat/testMRIncrementalLoadWithSplit/ Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:218) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.blockSeek(HFileReaderV2.java:632) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:545) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:511) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:475) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:157) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.copyHFileHalf(LoadIncrementalHFiles.java:544) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.splitStoreFile(LoadIncrementalHFiles.java:516) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.splitStoreFile(LoadIncrementalHFiles.java:377) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:441) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:325) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:323) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4605) Constraints
[ https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162565#comment-13162565 ] Ted Yu commented on HBASE-4605: --- Thanks Gary for the detailed review. For the check() API: {code} public void check(Put p) throws ConstraintException; {code} We can abstract the input parameter as Mutation. I think we need to reach agreement on the following major issues: 1. introduction of dependency of Guava in client library - Jonathan Gray would strongly disagree with this practice. 2. whether IntegerConstraint should be included in the constraint package 3. constraint configuration serialization method If the current implementation for some of the above isn't critical to this feature (#1 and #2), we can defer to future JIRAs. Actually a fourth issue (raised by Suraj under the discussion of 'Question on Coprocessors and Atomicity') is more important: if we don't provide atomicity by holding rowlock during the check, the use cases for Constraints would decrease. Issue #4 definitely doesn't have to be covered by this JIRA. The fifth issue is how IntegrationTestConstraint.java should be tagged. Constraints --- Key: HBASE-4605 URL: https://issues.apache.org/jira/browse/HBASE-4605 Project: HBase Issue Type: Improvement Components: client, coprocessors Affects Versions: 0.94.0 Reporter: Jesse Yates Assignee: Jesse Yates Attachments: 4605.v7, constraint_as_cp.txt, java_Constraint_v2.patch, java_HBASE-4605_v1.patch, java_HBASE-4605_v2.patch, java_HBASE-4605_v3.patch From Jesse's comment on dev: {quote} What I would like to propose is a simple interface that people can use to implement a 'constraint' (matching the classic database definition). This would help ease of adoption by helping HBase more easily check that box, help minimize code duplication across organizations, and lead to easier adoption. Essentially, people would implement a 'Constraint' interface for checking keys before they are put into a table. Puts that are valid get written to the table, but if not people can will throw an exception that gets propagated back to the client explaining why the put was invalid. Constraints would be set on a per-table basis and the user would be expected to ensure the jars containing the constraint are present on the machines serving that table. Yes, people could roll their own mechanism for doing this via coprocessors each time, but this would make it easier to do so, so you only have to implement a very minimal interface and not worry about the specifics. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4921) HTable initialization looks for EMPTY_START_ROW
[ https://issues.apache.org/jira/browse/HBASE-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162215#comment-13162215 ] Ted Yu commented on HBASE-4921: --- @Pritam: Can you provide a patch and let us know the result of running through test suite ? Thanks HTable initialization looks for EMPTY_START_ROW --- Key: HBASE-4921 URL: https://issues.apache.org/jira/browse/HBASE-4921 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Pritam Damania The HTable initialization does something like this : {code}this.connection.locateRegion(tableName, HConstants.EMPTY_START_ROW);{code} What is the rationale behind this ? What would happen if this region is in flight ? I ran into a problem where I disabled the first region of the table and now I can't create an HTable instance to this table. Disabling the first region is like disabling the entire table from a client perspective. I feel this is not the correct behavior. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4379) [hbck] Does not complain about tables with no end region [Z,]
[ https://issues.apache.org/jira/browse/HBASE-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162224#comment-13162224 ] Ted Yu commented on HBASE-4379: --- @Jonathan: You need to re-attach patch because HadoopQA checks the attachment Id. If the attachment Id corresponds to a patch which was verified earlier, there would be no test suite execution. Thanks [hbck] Does not complain about tables with no end region [Z,] - Key: HBASE-4379 URL: https://issues.apache.org/jira/browse/HBASE-4379 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.92.0, 0.90.5 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 0001-HBASE-4379-hbck-does-not-complain-about-tables-with-.patch, hbase-4379.v2.patch hbck does not detect or have an error condition when the last region of a table is missing (end key != ''). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4942) HMaster is unable to start of HFile V1 is used
[ https://issues.apache.org/jira/browse/HBASE-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162270#comment-13162270 ] Ted Yu commented on HBASE-4942: --- You're right Lars. HMaster is unable to start of HFile V1 is used -- Key: HBASE-4942 URL: https://issues.apache.org/jira/browse/HBASE-4942 Project: HBase Issue Type: Bug Components: io Affects Versions: 0.92.0 Reporter: Ted Yu Assignee: honghua zhu Fix For: 0.92.0, 0.94.0 This was reported by HH Zhu (zhh200...@gmail.com) If the following is specified in hbase-site.xml: {code} property namehfile.format.version/name value1/value /property {code} Clear the hdfs directory hbase.rootdir so that MasterFileSystem.bootstrap() is executed. You would see: {code} java.lang.NullPointerException at org.apache.hadoop.hbase.io.hfile.HFileReaderV1.close(HFileReaderV1.java:358) at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.close(StoreFile.java:1083) at org.apache.hadoop.hbase.regionserver.StoreFile.closeReader(StoreFile.java:570) at org.apache.hadoop.hbase.regionserver.Store.close(Store.java:441) at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:782) at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:717) at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:688) at org.apache.hadoop.hbase.master.MasterFileSystem.bootstrap(MasterFileSystem.java:390) at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:356) at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:128) at org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:113) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:435) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:314) at java.lang.Thread.run(Thread.java:619) {code} The above exception would lead to: {code} java.lang.RuntimeException: HMaster Aborted at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:152) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:103) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1512) {code} In org.apache.hadoop.hbase.master.HMaster.HMaster(Configuration conf), we have: {code} this.conf.setFloat(CacheConfig.HFILE_BLOCK_CACHE_SIZE_KEY, 0.0f); {code} When CacheConfig is instantiated, the following is called: {code} org.apache.hadoop.hbase.io.hfile.CacheConfig.instantiateBlockCache(Configuration conf) {code} Since hfile.block.cache.size is 0.0, instantiateBlockCache() would return null, resulting in blockCache field of CacheConfig to be null. When master closes Root region, org.apache.hadoop.hbase.io.hfile.HFileReaderV1.close(boolean evictOnClose) would be called. cacheConf.getBlockCache() returns null, leading to master abort. The following should be called in HFileReaderV1.close(), similar to the code in HFileReaderV2.close(): {code} if (evictOnClose cacheConf.isBlockCacheEnabled()) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4936) Cached HRegionInterface connections crash when getting UnknownHost exceptions
[ https://issues.apache.org/jira/browse/HBASE-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162272#comment-13162272 ] Ted Yu commented on HBASE-4936: --- +1 on patch. @Andrei: Please re-attach patch using --no-prefix option. HadoopQA uses -p0 to apply patches. Cached HRegionInterface connections crash when getting UnknownHost exceptions - Key: HBASE-4936 URL: https://issues.apache.org/jira/browse/HBASE-4936 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0 Reporter: Andrei Dragomir Attachments: HBASE-4936.patch This isssue is unlikely to come up in a cluster test case. However, for development, the following thing happens: 1. Start the HBase cluster locally, on network A (DNS A, etc) 2. The region locations are cached using the hostname (mycomputer.company.com, 211.x.y.z - real ip) 3. Change network location (go home) 4. Start the HBase cluster locally. My hostname / ips are not different (mycomputer, 192.168.0.130 - new ip) If the region locations have been cached using the hostname, there is an UnknownHostException in CatalogTracker.getCachedConnection(ServerName sn), uncaught in the catch statements. The server will crash constantly. The error should be caught and not rethrown, so that the cached connection expires normally. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4945) NPE in HRegion.bulkLoadHFiles(...)
[ https://issues.apache.org/jira/browse/HBASE-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162275#comment-13162275 ] Ted Yu commented on HBASE-4945: --- Good catch, Lars. I think the fix should go to 0.92 NPE in HRegion.bulkLoadHFiles(...) -- Key: HBASE-4945 URL: https://issues.apache.org/jira/browse/HBASE-4945 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.94.0 Reporter: Lars Hofhansl Priority: Minor Was playing with completebulkload, and ran into an NPE. The problem is here (HRegion.bulkLoadHFiles(...)). {code} Store store = getStore(familyName); if (store == null) { IOException ioe = new DoNotRetryIOException( No such column family + Bytes.toStringBinary(familyName)); ioes.add(ioe); failures.add(p); } try { store.assertBulkLoadHFileOk(new Path(path)); } catch (WrongRegionException wre) { // recoverable (file doesn't fit in region) failures.add(p); } catch (IOException ioe) { // unrecoverable (hdfs problem) ioes.add(ioe); } {code} This should be {code} Store store = getStore(familyName); if (store == null) { ... } else { try { store.assertBulkLoadHFileOk(new Path(path)); ... } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162278#comment-13162278 ] Ted Yu commented on HBASE-4880: --- I think the patch makes sense. I ran TestHCM with patch applied. It passed. Region is on service before completing openRegionHanlder, may cause data loss - Key: HBASE-4880 URL: https://issues.apache.org/jira/browse/HBASE-4880 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Attachments: hbase-4880.patch OpenRegionHandler in regionserver is processed as the following steps: {code} 1.openregion()(Through it, closed = false, closing = false) 2.addToOnlineRegions(region) 3.update .meta. table 4.update ZK's node state to RS_ZK_REGION_OPEND {code} We can find that region is on service before Step 4. It means client could put data to this region after step 3. What will happen if step 4 is failed processing? It will execute OpenRegionHandler#cleanupFailedOpen which will do closing region, and master assign this region to another regionserver. If closing region is failed, the data which is put between step 3 and step 4 may loss, because the region has been opend on another regionserver and be put new data. Therefore, it may not be recoverd through replayRecoveredEdit() because the edit's LogSeqId is smaller than current region SeqId. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4944) Optionally verify bulk loaded HFiles
[ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162280#comment-13162280 ] Ted Yu commented on HBASE-4944: --- Minor comments: {code} +KeyValue pkv = null; {code} The variable can be named prevKV which is clearer. {code} + throw new InvalidHFileException(Previous row is greater then {code} Typo above, should be 'greater than'. Optionally verify bulk loaded HFiles Key: HBASE-4944 URL: https://issues.apache.org/jira/browse/HBASE-4944 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0, 0.94.0, 0.90.5 Reporter: Andrew Purtell Priority: Minor Attachments: 4944.txt We rely on users to produce properly formatted HFiles for bulk import. Attached patch adds an optional code path, toggled by a configuration property, that verifies the HFile under consideration for import is properly sorted. The default maintains the current behavior, which does not scan the file for correctness. Patch is against trunk but can apply against all active branches. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4944) Optionally verify bulk loaded HFiles
[ https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162281#comment-13162281 ] Ted Yu commented on HBASE-4944: --- Looks like the patch should be rebased: {code} 4 out of 5 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/Store.java.rej {code} Optionally verify bulk loaded HFiles Key: HBASE-4944 URL: https://issues.apache.org/jira/browse/HBASE-4944 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0, 0.94.0, 0.90.5 Reporter: Andrew Purtell Priority: Minor Attachments: 4944.txt We rely on users to produce properly formatted HFiles for bulk import. Attached patch adds an optional code path, toggled by a configuration property, that verifies the HFile under consideration for import is properly sorted. The default maintains the current behavior, which does not scan the file for correctness. Patch is against trunk but can apply against all active branches. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3373) Allow regions of specific table to be load-balanced
[ https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160233#comment-13160233 ] Ted Yu commented on HBASE-3373: --- @Ben: Thanks for trying out 0.94 The code snippet above deals with region server which recently joined the cluster. Its goal is to avoid hot region server which receives above average load. This is part of the changes from HBASE-3609. The randomization is done on this line: {code} Collections.shuffle(sns, RANDOM); {code} where we schedule regions to region servers which are shuffled randomly. Your observation about unbalanced table(s) in the cluster is valid. This is due to master not passing per-table region distribution to balanceCluster(). I have a patch which is in internal repository where master calls balanceCluster() for each table. Once we test it in production cluster, I should be able to contribute back. Allow regions of specific table to be load-balanced --- Key: HBASE-3373 URL: https://issues.apache.org/jira/browse/HBASE-3373 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.20.6 Reporter: Ted Yu Attachments: HbaseBalancerTest2.java From our experience, cluster can be well balanced and yet, one table's regions may be badly concentrated on few region servers. For example, one table has 839 regions (380 regions at time of table creation) out of which 202 are on one server. It would be desirable for load balancer to distribute regions for specified tables evenly across the cluster. Each of such tables has number of regions many times the cluster size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4729) Race between online altering and splitting kills the master
[ https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159180#comment-13159180 ] Ted Yu commented on HBASE-4729: --- For the unexpected zk exception, we should abort the master instead of logging error. Race between online altering and splitting kills the master --- Key: HBASE-4729 URL: https://issues.apache.org/jira/browse/HBASE-4729 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.92.0, 0.94.0 Attachments: 4729-v2.txt, 4729-v3.txt, 4729-v4.txt, 4729.txt I was running an online alter while regions were splitting, and suddenly the master died and left my table half-altered (haven't restarted the master yet). What killed the master: {quote} 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception creating node CLOSING org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101 at org.apache.zookeeper.KeeperException.create(KeeperException.java:110) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441) at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769) at org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661) at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {quote} A znode was created because the region server was splitting the region 4 seconds before: {quote} 2011-11-02 17:06:40,704 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101. 2011-11-02 17:06:40,704 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:62023-0x132f043bbde0710 Creating ephemeral node for f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Attempting to transition node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING ... 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Successfully transitioned node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT 2011-11-02 17:06:44,061 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for f7e1783e65ea8d621a4bc96ad310f101 {quote} Now that the master is dead the region server is spewing those last two lines like mad. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4729) Race between online altering and splitting kills the master
[ https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159410#comment-13159410 ] Ted Yu commented on HBASE-4729: --- Since TestAssignmentManager would be run in a shared JVM under TRUNK in the near future, I think it should be a medium test. From N: - Small tests are executed in a shared JVM. We put in this category all the tests that can be executed quickly (the maximum execution time for a test is 15 seconds, and they do not use a cluster) in a shared jvm. Race between online altering and splitting kills the master --- Key: HBASE-4729 URL: https://issues.apache.org/jira/browse/HBASE-4729 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.92.0, 0.94.0 Attachments: 4729-v2.txt, 4729-v3.txt, 4729-v4.txt, 4729-v5.txt, 4729.txt I was running an online alter while regions were splitting, and suddenly the master died and left my table half-altered (haven't restarted the master yet). What killed the master: {quote} 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception creating node CLOSING org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101 at org.apache.zookeeper.KeeperException.create(KeeperException.java:110) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441) at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769) at org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661) at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {quote} A znode was created because the region server was splitting the region 4 seconds before: {quote} 2011-11-02 17:06:40,704 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101. 2011-11-02 17:06:40,704 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:62023-0x132f043bbde0710 Creating ephemeral node for f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Attempting to transition node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING ... 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Successfully transitioned node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT 2011-11-02 17:06:44,061 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for f7e1783e65ea8d621a4bc96ad310f101 {quote} Now that the master is dead the region server is spewing those last two lines like mad. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4729) Clash between region unassign and splitting kills the master
[ https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159426#comment-13159426 ] Ted Yu commented on HBASE-4729: --- +1 on patch v5. Clash between region unassign and splitting kills the master Key: HBASE-4729 URL: https://issues.apache.org/jira/browse/HBASE-4729 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.92.0, 0.94.0 Attachments: 4729-v2.txt, 4729-v3.txt, 4729-v4.txt, 4729-v5.txt, 4729.txt I was running an online alter while regions were splitting, and suddenly the master died and left my table half-altered (haven't restarted the master yet). What killed the master: {quote} 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception creating node CLOSING org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101 at org.apache.zookeeper.KeeperException.create(KeeperException.java:110) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441) at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769) at org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661) at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {quote} A znode was created because the region server was splitting the region 4 seconds before: {quote} 2011-11-02 17:06:40,704 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101. 2011-11-02 17:06:40,704 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:62023-0x132f043bbde0710 Creating ephemeral node for f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Attempting to transition node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING ... 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Successfully transitioned node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT 2011-11-02 17:06:44,061 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for f7e1783e65ea8d621a4bc96ad310f101 {quote} Now that the master is dead the region server is spewing those last two lines like mad. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4893) HConnectionImplementation closed-but-not-deleted, need a way to find the state of connection
[ https://issues.apache.org/jira/browse/HBASE-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159613#comment-13159613 ] Ted Yu commented on HBASE-4893: --- HConnectionManager.deleteStaleConnection() can be utilized in the above proposal. HConnectionImplementation closed-but-not-deleted, need a way to find the state of connection Key: HBASE-4893 URL: https://issues.apache.org/jira/browse/HBASE-4893 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.1, 0.90.2, 0.90.3, 0.90.4 Environment: Linux 2.6, HBase-0.90.1 Reporter: Mubarak Seyed Labels: hbase-client Fix For: 0.90.5 In abort() of HConnectionManager$HConnectionImplementation, instance of HConnectionImplementation is marked as this.closed=true. There is no way for client application to check the hbase client connection whether it is still opened/good (this.closed=false) or not. We need a method to validate the state of a connection like isClosed(). {code} public boolean isClosed(){ return this.closed; } {code} Once the connection is closed and it should get deleted. Client application still gets a connection from HConnectionManager.getConnection(Configuration) and tries to make a RPC call to RS, since connection is already closed, HConnectionImplementation.getRegionServerWithRetries throws RetriesExhaustedException with error message {code} Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '----xxx', but failed after 10 attempts. Exceptions: java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1008) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4616) Update hregion encoded name to reduce logic and prevent region collisions in META
[ https://issues.apache.org/jira/browse/HBASE-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159676#comment-13159676 ] Ted Yu commented on HBASE-4616: --- Where is MetaSearchRow defined ? What's the difference between MetaSearchRow.getStartSearchRow(tableName, null) and MetaSearchRow.getStartSearchRow(tableName, HConstants.EMPTY_BYTE_ARRAY) ? Update hregion encoded name to reduce logic and prevent region collisions in META - Key: HBASE-4616 URL: https://issues.apache.org/jira/browse/HBASE-4616 Project: HBase Issue Type: Umbrella Reporter: Alex Newman Assignee: Alex Newman Attachments: HBASE-4616.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4616) Update hregion encoded name to reduce logic and prevent region collisions in META
[ https://issues.apache.org/jira/browse/HBASE-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159694#comment-13159694 ] Ted Yu commented on HBASE-4616: --- In HRegionInfo.java: {code} public static final int END_OF_TABLE_TABLE_NAME = END_OF_TABLE_NAME + 1; {code} I think END_OF_TABLE_NAME_FOR_EMPTY_ENDKEY would be a better name. In createRegionName(): {code} if (id != null || id.length 0 ) { {code} should be used above. Update hregion encoded name to reduce logic and prevent region collisions in META - Key: HBASE-4616 URL: https://issues.apache.org/jira/browse/HBASE-4616 Project: HBase Issue Type: Umbrella Reporter: Alex Newman Assignee: Alex Newman Attachments: HBASE-4616.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4616) Update hregion encoded name to reduce logic and prevent region collisions in META
[ https://issues.apache.org/jira/browse/HBASE-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159761#comment-13159761 ] Ted Yu commented on HBASE-4616: --- In SplitTransaction.java, getDaughterRegionIdTimestamp() is removed. I think we should take care of clock skew. Also, using hri.getRegionId() - 1 as Id for daughter region Ids is not intuitive. Update hregion encoded name to reduce logic and prevent region collisions in META - Key: HBASE-4616 URL: https://issues.apache.org/jira/browse/HBASE-4616 Project: HBase Issue Type: Umbrella Reporter: Alex Newman Assignee: Alex Newman Attachments: HBASE-4616.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4899) Region would be assigned twice easily with continually killing server and moving region in testing environment
[ https://issues.apache.org/jira/browse/HBASE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159858#comment-13159858 ] Ted Yu commented on HBASE-4899: --- {code} + + because it has been opened in + + addressFromAM.getServerName()); {code} We should use the value of rit (RegionState) in the above log instead of hard coding 'opened'. Region would be assigned twice easily with continually killing server and moving region in testing environment --- Key: HBASE-4899 URL: https://issues.apache.org/jira/browse/HBASE-4899 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: chunhui shen Attachments: hbase-4899.patch Before assigning region in ServerShutdownHandler#process, it will check whether region is in RIT, however, this checking doesn't work as the excepted in the following case: 1.move region A from server B to server C 2.kill server B 3.start server B immediately Let's see what happen in the code for the above case {code} for step1: 1.1 server B close the region A, 1.2 master setOffline for region A,(AssignmentManager#setOffline:this.regions.remove(regionInfo)) 1.3 server C start to open region A.(Not completed) for step3: master ServerShutdownHandler#process() for server B { .. splitlog() ... ListRegionState regionsInTransition = this.services.getAssignmentManager() .processServerShutdown(this.serverName); ... Skip regions that were in transition unless CLOSING or PENDING_CLOSE ... assign region } {code} In fact, when running ServerShutdownHandler#process()#this.services.getAssignmentManager().processServerShutdown(this.serverName), region A is in RIT (step1.3 not completed), but the return ListRegionState regionsInTransition doesn't contain it, because region A has removed from AssignmentManager.regions by AssignmentManager#setOffline in step 1.2 Therefore, region A will be assigned twice. Actually, one server killed and started twice will also easily cause region assigned twice. Exclude the above reason, another probability : when execute ServerShutdownHandler#process()#MetaReader.getServerUserRegions ,region is included which is in RIT now. But after completing MetaReader.getServerUserRegions, the region has been opened in other server and is not in RIT now. In our testing environment where balancing,moving and killing are executed periodly, assigning region twice often happens, and it is hateful because it will affect other test cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4899) Region would be assigned twice easily with continually killing server and moving region in testing environment
[ https://issues.apache.org/jira/browse/HBASE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159859#comment-13159859 ] Ted Yu commented on HBASE-4899: --- @Chunhui: Please let us know testing result on your QA environment. Region would be assigned twice easily with continually killing server and moving region in testing environment --- Key: HBASE-4899 URL: https://issues.apache.org/jira/browse/HBASE-4899 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: chunhui shen Assignee: chunhui shen Attachments: hbase-4899.patch Before assigning region in ServerShutdownHandler#process, it will check whether region is in RIT, however, this checking doesn't work as the excepted in the following case: 1.move region A from server B to server C 2.kill server B 3.start server B immediately Let's see what happen in the code for the above case {code} for step1: 1.1 server B close the region A, 1.2 master setOffline for region A,(AssignmentManager#setOffline:this.regions.remove(regionInfo)) 1.3 server C start to open region A.(Not completed) for step3: master ServerShutdownHandler#process() for server B { .. splitlog() ... ListRegionState regionsInTransition = this.services.getAssignmentManager() .processServerShutdown(this.serverName); ... Skip regions that were in transition unless CLOSING or PENDING_CLOSE ... assign region } {code} In fact, when running ServerShutdownHandler#process()#this.services.getAssignmentManager().processServerShutdown(this.serverName), region A is in RIT (step1.3 not completed), but the return ListRegionState regionsInTransition doesn't contain it, because region A has removed from AssignmentManager.regions by AssignmentManager#setOffline in step 1.2 Therefore, region A will be assigned twice. Actually, one server killed and started twice will also easily cause region assigned twice. Exclude the above reason, another probability : when execute ServerShutdownHandler#process()#MetaReader.getServerUserRegions ,region is included which is in RIT now. But after completing MetaReader.getServerUserRegions, the region has been opened in other server and is not in RIT now. In our testing environment where balancing,moving and killing are executed periodly, assigning region twice often happens, and it is hateful because it will affect other test cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4773) HBaseAdmin may leak ZooKeeper connections
[ https://issues.apache.org/jira/browse/HBASE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13158569#comment-13158569 ] Ted Yu commented on HBASE-4773: --- Integrated to 0.90, 0.92 and TRUNK. Thanks for the patch Xufeng. Thanks for the review Jinchao and Ramkrishna. HBaseAdmin may leak ZooKeeper connections - Key: HBASE-4773 URL: https://issues.apache.org/jira/browse/HBASE-4773 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: gaojinchao Priority: Critical Fix For: 0.90.5 Attachments: 4773.patch, branches_4773.patch, trunk_4773_patch.patch When master crashs, HBaseAdmin will leaks ZooKeeper connections I think we should close the zk connetion when throw MasterNotRunningException public HBaseAdmin(Configuration c) throws MasterNotRunningException, ZooKeeperConnectionException { this.conf = HBaseConfiguration.create(c); this.connection = HConnectionManager.getConnection(this.conf); this.pause = this.conf.getLong(hbase.client.pause, 1000); this.numRetries = this.conf.getInt(hbase.client.retries.number, 10); this.retryLongerMultiplier = this.conf.getInt(hbase.client.retries.longer.multiplier, 10); //we should add this code and close the zk connection try{ this.connection.getMaster(); }catch(MasterNotRunningException e){ HConnectionManager.deleteConnection(conf, false); throw e; } } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4120) isolation and allocation
[ https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13158589#comment-13158589 ] Ted Yu commented on HBASE-4120: --- The new tests should be placed under src/test/java/org/apache/hadoop/hbase/allocation/, instead of src/test/java/org/apache/hadoop/hbase/allocation/test isolation and allocation Key: HBASE-4120 URL: https://issues.apache.org/jira/browse/HBASE-4120 Project: HBase Issue Type: New Feature Components: master, regionserver Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0 Reporter: Liu Jia Assignee: Liu Jia Fix For: 0.94.0 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, Design_document_for_HBase_isolation_and_allocation_Revised.pdf, HBase_isolation_and_allocation_user_guide.pdf, Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, TablePriority_v8.patch, TablePriority_v8.patch, TablePriority_v8_for_trunk.patch, TablePrioriy_v9.patch The HBase isolation and allocation tool is designed to help users manage cluster resource among different application and tables. When we have a large scale of HBase cluster with many applications running on it, there will be lots of problems. In Taobao there is a cluster for many departments to test their applications performance, these applications are based on HBase. With one cluster which has 12 servers, there will be only one application running exclusively on this server, and many other applications must wait until the previous test finished. After we add allocation manage function to the cluster, applications can share the cluster and run concurrently. Also if the Test Engineer wants to make sure there is no interference, he/she can move out other tables from this group. In groups we use table priority to allocate resource, when system is busy; we can make sure high-priority tables are not affected lower-priority tables Different groups can have different region server configurations, some groups optimized for reading can have large block cache size, and others optimized for writing can have large memstore size. Tables and region servers can be moved easily between groups; after changing the configuration, a group can be restarted alone instead of restarting the whole cluster. git entry : https://github.com/ICT-Ope/HBase_allocation . We hope our work is helpful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4120) isolation and allocation
[ https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13158598#comment-13158598 ] Ted Yu commented on HBASE-4120: --- Please also categorize the new tests. See N's email on d...@hbase.apache.org, entitled 'unit tests - pom.xml surefire changed - categories are available', for guideline. isolation and allocation Key: HBASE-4120 URL: https://issues.apache.org/jira/browse/HBASE-4120 Project: HBase Issue Type: New Feature Components: master, regionserver Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0 Reporter: Liu Jia Assignee: Liu Jia Fix For: 0.94.0 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, Design_document_for_HBase_isolation_and_allocation_Revised.pdf, HBase_isolation_and_allocation_user_guide.pdf, Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, TablePriority_v8.patch, TablePriority_v8.patch, TablePriority_v8_for_trunk.patch, TablePrioriy_v9.patch The HBase isolation and allocation tool is designed to help users manage cluster resource among different application and tables. When we have a large scale of HBase cluster with many applications running on it, there will be lots of problems. In Taobao there is a cluster for many departments to test their applications performance, these applications are based on HBase. With one cluster which has 12 servers, there will be only one application running exclusively on this server, and many other applications must wait until the previous test finished. After we add allocation manage function to the cluster, applications can share the cluster and run concurrently. Also if the Test Engineer wants to make sure there is no interference, he/she can move out other tables from this group. In groups we use table priority to allocate resource, when system is busy; we can make sure high-priority tables are not affected lower-priority tables Different groups can have different region server configurations, some groups optimized for reading can have large block cache size, and others optimized for writing can have large memstore size. Tables and region servers can be moved easily between groups; after changing the configuration, a group can be restarted alone instead of restarting the whole cluster. git entry : https://github.com/ICT-Ope/HBase_allocation . We hope our work is helpful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4120) isolation and allocation
[ https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13158626#comment-13158626 ] Ted Yu commented on HBASE-4120: --- Indentation in certain classes was off. e.g. {code} public static int initRegionPriority(HRegion region) { {code} Please use Eclipse formatter Nicloas attached to HBASE-3678 to format every file. {code} private static int initRegionPriority(String region, boolean force) { ... if (ret != null) return ret; {code} Please enclose the return statement in curly braces. isolation and allocation Key: HBASE-4120 URL: https://issues.apache.org/jira/browse/HBASE-4120 Project: HBase Issue Type: New Feature Components: master, regionserver Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0 Reporter: Liu Jia Assignee: Liu Jia Fix For: 0.94.0 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, Design_document_for_HBase_isolation_and_allocation_Revised.pdf, HBase_isolation_and_allocation_user_guide.pdf, Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, TablePriority_v8.patch, TablePriority_v8.patch, TablePriority_v8_for_trunk.patch, TablePrioriy_v9.patch The HBase isolation and allocation tool is designed to help users manage cluster resource among different application and tables. When we have a large scale of HBase cluster with many applications running on it, there will be lots of problems. In Taobao there is a cluster for many departments to test their applications performance, these applications are based on HBase. With one cluster which has 12 servers, there will be only one application running exclusively on this server, and many other applications must wait until the previous test finished. After we add allocation manage function to the cluster, applications can share the cluster and run concurrently. Also if the Test Engineer wants to make sure there is no interference, he/she can move out other tables from this group. In groups we use table priority to allocate resource, when system is busy; we can make sure high-priority tables are not affected lower-priority tables Different groups can have different region server configurations, some groups optimized for reading can have large block cache size, and others optimized for writing can have large memstore size. Tables and region servers can be moved easily between groups; after changing the configuration, a group can be restarted alone instead of restarting the whole cluster. git entry : https://github.com/ICT-Ope/HBase_allocation . We hope our work is helpful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13158648#comment-13158648 ] Ted Yu commented on HBASE-4820: --- @Jimmy: Please upload latest patch to review board. https://builds.apache.org/job/PreCommit-HBASE-Build/367//testReport/ was incomplete. Please run log splitting related tests manually. Let's see what Kannan has to say. Distributed log splitting coding enhancement to make it easier to understand, no semantics change - Key: HBASE-4820 URL: https://issues.apache.org/jira/browse/HBASE-4820 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Labels: newbie Fix For: 0.94.0 Attachments: 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch, 0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch, 0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme_new.patch In reviewing distributed log splitting feature, we found some cosmetic issues. They make the code hard to understand. It will be great to fix them. For this issue, there should be no semantic change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira