[jira] [Commented] (HBASE-5656) LoadIncrementalHFiles createTable should detect and set compression algorithm
[ https://issues.apache.org/jira/browse/HBASE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245018#comment-13245018 ]

Cosmin Lehene commented on HBASE-5656:
--------------------------------------

Lars, if we change the hcd default compression from NONE to LZO but write the HFile explicitly without compression, this will create a table that actually has compression, which is not what we want. I guess if we want to be defensive we could have a reader.getCompression() != hcd.getCompression() condition.

LoadIncrementalHFiles createTable should detect and set compression algorithm
-----------------------------------------------------------------------------

Key: HBASE-5656
URL: https://issues.apache.org/jira/browse/HBASE-5656
Project: HBase
Issue Type: Bug
Components: util
Affects Versions: 0.92.1
Reporter: Cosmin Lehene
Assignee: Cosmin Lehene
Fix For: 0.92.2, 0.94.0, 0.96.0
Attachments: 5656-simple.txt, HBASE-5656-0.92.patch, HBASE-5656-0.92.patch
Original Estimate: 1h
Remaining Estimate: 1h

LoadIncrementalHFiles doesn't set compression when creating the table. This can be detected from the files within each family dir.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
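To make the defensive condition from the comment concrete, here is a minimal sketch. It assumes the intent is "copy whatever compression the HFile reports onto the descriptor whenever the two disagree". The `Algo` enum, `FamilyDescriptor` class and `inheritCompression` method are hypothetical stand-ins for illustration; they are not the HBase classes and not the attached patch.

```java
// Hypothetical model of the defensive check; none of these types are real HBase classes.
class CompressionDetectSketch {
    enum Algo { NONE, LZO, GZ }

    // Stand-in for a column family descriptor with a configurable default compression.
    static final class FamilyDescriptor {
        private Algo compression;
        FamilyDescriptor(Algo defaultCompression) { this.compression = defaultCompression; }
        Algo getCompression() { return compression; }
        void setCompression(Algo algo) { this.compression = algo; }
    }

    // Overwrites the descriptor's compression whenever the file disagrees with it,
    // regardless of which side happens to be NONE.
    static void inheritCompression(FamilyDescriptor hcd, Algo fileCompression) {
        if (fileCompression != hcd.getCompression()) {
            hcd.setCompression(fileCompression);
        }
    }

    public static void main(String[] args) {
        // The descriptor default was changed to LZO, but the HFile was written uncompressed.
        FamilyDescriptor hcd = new FamilyDescriptor(Algo.LZO);
        inheritCompression(hcd, Algo.NONE);
        System.out.println(hcd.getCompression()); // prints NONE
    }
}
```

Comparing against the descriptor's current value, rather than against NONE, is what covers the case described in the comment: a check of the form "file compression is not NONE" would leave the LZO default in place for an uncompressed file.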
[jira] [Commented] (HBASE-5625) Avoid byte buffer allocations when reading a value from a Result object
[ https://issues.apache.org/jira/browse/HBASE-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241150#comment-13241150 ]

Cosmin Lehene commented on HBASE-5625:
--------------------------------------

Microbenchmarking is a bit trickier than this. Perhaps we should mention some guidelines in the HBase developer documentation. There's a lot going on in HotSpot (e.g. dynamic compilation) that needs to be taken into account.

Here are some resources:
http://www.slideshare.net/drorbr/so-you-want-to-write-your-own-benchmark-presentation
https://wikis.oracle.com/display/HotSpotInternals/MicroBenchmarks

We might be able to use something like junit-benchmarks:
http://labs.carrotsearch.com/junit-benchmarks.html

Cosmin

Avoid byte buffer allocations when reading a value from a Result object
-----------------------------------------------------------------------

Key: HBASE-5625
URL: https://issues.apache.org/jira/browse/HBASE-5625
Project: HBase
Issue Type: Improvement
Components: client
Affects Versions: 0.92.1
Reporter: Tudor Scurtu
Assignee: Tudor Scurtu
Labels: patch
Attachments: 5625.txt, 5625v2.txt, 5625v3.txt, 5625v4.txt

When calling Result.getValue(), an extra dummy KeyValue and its associated underlying byte array are allocated, as well as a persistent buffer that will contain the returned value. These can be avoided by reusing a static array for the dummy object and by passing a ByteBuffer object as a value destination buffer to the read method. The current functionality is maintained, and we have added a separate method call stack that employs the described changes. I will provide more details with the patch. Running tests with a profiler, the reduction in read time seems to be up to 40%.
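As a rough illustration of why dynamic compilation matters, the sketch below runs a discarded warm-up pass before timing and keeps the computed result alive so the JIT cannot eliminate the work. It is a naive hand-rolled harness under stated assumptions, not a replacement for a benchmarking framework; `work`, `measure` and the iteration counts are hypothetical choices for illustration.

```java
// Naive warm-up-then-measure harness; the workload and counts are illustrative only.
class WarmupBenchSketch {
    // Written after each measurement so the accumulated result is observably used.
    static volatile long sink;

    // Stand-in for the code under test.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (i * 31L) ^ (i >>> 3);
        }
        return sum;
    }

    // Calls work(n) `iterations` times and returns the average nanoseconds per call.
    static double measure(int iterations, int n) {
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            acc += work(n);
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        // Warm-up pass: gives HotSpot a chance to compile the hot path; the timing is discarded.
        measure(20000, 1000);
        // Several measured rounds, so run-to-run variance is visible.
        for (int round = 0; round < 5; round++) {
            System.out.printf("round %d: %.1f ns/op%n", round, measure(20000, 1000));
        }
    }
}
```

Even with a warm-up pass, results from a harness like this are only indicative; effects such as on-stack replacement, inlining decisions and garbage collection pauses are among the reasons a dedicated tool is usually preferable.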
[jira] [Commented] (HBASE-5665) Repeated split causes HRegionServer failures and breaks table
[ https://issues.apache.org/jira/browse/HBASE-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241586#comment-13241586 ]

Cosmin Lehene commented on HBASE-5665:
--------------------------------------

Indeed it seems to be a problem with forced splits. I'm not sure though whether the natural splits are safe - they seem to be, but I need to test that too.

RegionSplitPolicy.getSplitPoint() calls Store.getSplitPoint(), and Store.getSplitPoint() seems to do the check:

{code}
for (StoreFile sf : storefiles) {
  if (sf.isReference()) {
    // Should already be enforced since we return false in this case
    assert false : "getSplitPoint() called on a region that can't split!";
    return null;
  }
{code}

BTW, we also have Store.hasReferences():

{code}
private boolean hasReferences(Collection<StoreFile> files) {
  if (files != null && files.size() > 0) {
    for (StoreFile hsf: files) {
      if (hsf.isReference()) {
        return true;
      }
    }
  }
  return false;
}
{code}

However, here's the code in HRegion.checkSplit(). If there's an explicit split point, it won't get to do the reference check:

{code}
public byte[] checkSplit() {
  // Can't split META
  if (getRegionInfo().isMetaRegion()) {
    if (shouldForceSplit()) {
      LOG.warn("Cannot split meta regions in HBase 0.20 and above");
    }
    return null;
  }

  if (this.explicitSplitPoint != null) {
    return this.explicitSplitPoint;
  }

  if (!splitPolicy.shouldSplit()) {
    return null;
  }

  byte[] ret = splitPolicy.getSplitPoint();

  if (ret != null) {
    try {
      checkRow(ret, "calculated split");
    } catch (IOException e) {
      LOG.error("Ignoring invalid split", e);
      return null;
    }
  }
  return ret;
}
{code}

Multiple return points + a ret variable - this could use some polishing too :)

I'm a bit puzzled about the natural split, because I've seen the problem with a forced split from the UI, where I don't think we provide an explicit split point.
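As an illustration only, the ordering problem described above can be modelled as follows: if the reference check runs before the explicit split point is returned, a forced split on a region that still holds reference files is refused. This is a sketch under the assumption that forced splits should honour the same check as natural ones; `FileInfo` and this `checkSplit` signature are hypothetical stand-ins, not HBase classes and not the attached patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of "check for references before honouring an explicit split point".
class CheckSplitSketch {
    // Stand-in for a store file; it only records whether the file is a reference.
    static final class FileInfo {
        final boolean reference;
        FileInfo(boolean reference) { this.reference = reference; }
    }

    static boolean hasReferences(List<FileInfo> files) {
        for (FileInfo f : files) {
            if (f.reference) {
                return true;
            }
        }
        return false;
    }

    // Returns the split point to use, or null when the region must not be split.
    static byte[] checkSplit(List<FileInfo> files, byte[] explicitSplitPoint, byte[] policySplitPoint) {
        // The guard comes first, so it applies to forced and natural splits alike.
        if (hasReferences(files)) {
            return null;
        }
        return explicitSplitPoint != null ? explicitSplitPoint : policySplitPoint;
    }

    public static void main(String[] args) {
        byte[] forced = new byte[] { 5 };
        // A daughter region right after a split still holds reference files.
        List<FileInfo> afterSplit = Arrays.asList(new FileInfo(true));
        System.out.println(checkSplit(afterSplit, forced, null) == null); // prints true
        // Once references are gone, the explicit split point is honoured.
        List<FileInfo> compacted = Collections.singletonList(new FileInfo(false));
        System.out.println(checkSplit(compacted, forced, null) == forced); // prints true
    }
}
```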
Cosmin

Repeated split causes HRegionServer failures and breaks table
-------------------------------------------------------------

Key: HBASE-5665
URL: https://issues.apache.org/jira/browse/HBASE-5665
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.92.0, 0.92.1, 0.94.0, 0.96.0, 0.94.1
Reporter: Cosmin Lehene
Assignee: Cosmin Lehene
Priority: Blocker
Attachments: HBASE-5665-0.92.patch

Repeated splits on large tables (2 consecutive would suffice) will essentially break the table (and the cluster) unrecoverably. The regionserver doing the split dies, and the master gets into an infinite loop trying to assign regions whose files seem to be missing from HDFS. The table can be disabled once. Upon trying to re-enable it, it will remain in an intermediate state forever.

I was able to reproduce this consistently on a smaller table.

{code}
hbase(main):030:0> (0..1).each{|x| put 't1', "#{x}", 'f1:t', 'dd'}
hbase(main):030:0> (0..1000).each{|x| split 't1', "#{x*10}"}
{code}

Running overlapping splits in parallel (e.g. "#{x*10+1}", "#{x*10+2}"...) will reproduce the issue almost instantly and consistently.

{code}
2012-03-28 10:57:16,320 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Offlined parent region t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1. in META
2012-03-28 10:57:16,321 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for t1,5,1332957435767.648d30de55a5cec6fc2f56dcb3c7eee1..
compaction_queue=(0:1), split_queue=10 2012-03-28 10:57:16,343 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1.; Failed ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124 java.io.IOException: Failed ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124 at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:363) at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:451) at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.FileNotFoundException: File does not exist: /hbase/t1/589c44cabba419c6ad8c9b427e5894e3.2fb0473f4e71339e88dab0ee0d4dffa1/f1/d62a852c25ad44e09518e102ca557237 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1822) at
[jira] [Commented] (HBASE-5665) Repeated split causes HRegionServer failures and breaks table
[ https://issues.apache.org/jira/browse/HBASE-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241593#comment-13241593 ]

Cosmin Lehene commented on HBASE-5665:
--------------------------------------

BTW - I don't think getSplitPoint should do that check, and we also shouldn't have two places where we check for references - perhaps we should have another JIRA to fix this in trunk?

Repeated split causes HRegionServer failures and breaks table
-------------------------------------------------------------

Key: HBASE-5665
URL: https://issues.apache.org/jira/browse/HBASE-5665
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.92.0, 0.92.1
Reporter: Cosmin Lehene
Assignee: Cosmin Lehene
Priority: Blocker
Attachments: HBASE-5665-0.92.patch

Repeated splits on large tables (2 consecutive would suffice) will essentially break the table (and the cluster) unrecoverably. The regionserver doing the split dies, and the master gets into an infinite loop trying to assign regions whose files seem to be missing from HDFS. The table can be disabled once. Upon trying to re-enable it, it will remain in an intermediate state forever.

I was able to reproduce this consistently on a smaller table.

{code}
hbase(main):030:0> (0..1).each{|x| put 't1', "#{x}", 'f1:t', 'dd'}
hbase(main):030:0> (0..1000).each{|x| split 't1', "#{x*10}"}
{code}

Running overlapping splits in parallel (e.g. "#{x*10+1}", "#{x*10+2}"...) will reproduce the issue almost instantly and consistently.

{code}
2012-03-28 10:57:16,320 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Offlined parent region t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1. in META
2012-03-28 10:57:16,321 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for t1,5,1332957435767.648d30de55a5cec6fc2f56dcb3c7eee1..
compaction_queue=(0:1), split_queue=10 2012-03-28 10:57:16,343 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1.; Failed ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124 java.io.IOException: Failed ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124 at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:363) at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:451) at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.FileNotFoundException: File does not exist: /hbase/t1/589c44cabba419c6ad8c9b427e5894e3.2fb0473f4e71339e88dab0ee0d4dffa1/f1/d62a852c25ad44e09518e102ca557237 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1822) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.init(DFSClient.java:1813) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:544) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456) at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:341) at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.init(StoreFile.java:1008) at org.apache.hadoop.hbase.io.HalfStoreFileReader.init(HalfStoreFileReader.java:65) at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:467) at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:548) at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:284) at org.apache.hadoop.hbase.regionserver.Store.init(Store.java:221) at 
org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2511) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:450) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3229) at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:504) at org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:484) ... 1 more 2012-03-28 10:57:16,345 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server ld2,60020,1332957343833: Abort; we got an error after point-of-no-return {code} http://hastebin.com/diqinibajo.avrasm later edit: (I'm using the last 4
[jira] [Commented] (HBASE-5656) LoadIncrementalHFiles createTable should detect and set compression algorithm
[ https://issues.apache.org/jira/browse/HBASE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239825#comment-13239825 ]

Cosmin Lehene commented on HBASE-5656:
--------------------------------------

Lars, it will create a new table without compression. We're adding LZO-compressed HFiles and expecting the destination table to inherit that.

LoadIncrementalHFiles createTable should detect and set compression algorithm
-----------------------------------------------------------------------------

Key: HBASE-5656
URL: https://issues.apache.org/jira/browse/HBASE-5656
Project: HBase
Issue Type: Bug
Components: util
Affects Versions: 0.92.1
Reporter: Cosmin Lehene
Assignee: Cosmin Lehene
Fix For: 0.92.2, 0.94.0, 0.96.0
Attachments: HBASE-5656-0.92.patch
Original Estimate: 1h
Remaining Estimate: 1h

LoadIncrementalHFiles doesn't set compression when creating the table. This can be detected from the files within each family dir.
[jira] [Commented] (HBASE-5656) LoadIncrementalHFiles createTable should detect and set compression algorithm
[ https://issues.apache.org/jira/browse/HBASE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239947#comment-13239947 ]

Cosmin Lehene commented on HBASE-5656:
--------------------------------------

Lars, it will work, it just won't have compression.

Regarding inclusion in (any) release: this is a utility, so I don't see any risk in including it.

Regarding the reader - yes, I should close that, or perhaps move the logic inside the loop's try/catch block to avoid the boilerplate. Let me know if you have any preference.

LoadIncrementalHFiles createTable should detect and set compression algorithm
-----------------------------------------------------------------------------

Key: HBASE-5656
URL: https://issues.apache.org/jira/browse/HBASE-5656
Project: HBase
Issue Type: Bug
Components: util
Affects Versions: 0.92.1
Reporter: Cosmin Lehene
Assignee: Cosmin Lehene
Fix For: 0.92.2, 0.94.0, 0.96.0
Attachments: HBASE-5656-0.92.patch
Original Estimate: 1h
Remaining Estimate: 1h

LoadIncrementalHFiles doesn't set compression when creating the table. This can be detected from the files within each family dir.
[jira] [Commented] (HBASE-5656) LoadIncrementalHFiles createTable should detect and set compression algorithm
[ https://issues.apache.org/jira/browse/HBASE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240001#comment-13240001 ]

Cosmin Lehene commented on HBASE-5656:
--------------------------------------

Lars, currently we use 0.92 and have other patches in our build. I don't need it for 0.94.0; we'd just have to port the patch once we switch to 0.94.

BTW, a minimal version of the patch would just do hcd.setCompressionType(reader.getCompressionAlgorithm()) inside the loop.

LoadIncrementalHFiles createTable should detect and set compression algorithm
-----------------------------------------------------------------------------

Key: HBASE-5656
URL: https://issues.apache.org/jira/browse/HBASE-5656
Project: HBase
Issue Type: Bug
Components: util
Affects Versions: 0.92.1
Reporter: Cosmin Lehene
Assignee: Cosmin Lehene
Fix For: 0.92.2, 0.96.0, 0.94.1
Attachments: HBASE-5656-0.92.patch, HBASE-5656-0.92.patch
Original Estimate: 1h
Remaining Estimate: 1h

LoadIncrementalHFiles doesn't set compression when creating the table. This can be detected from the files within each family dir.
[jira] [Commented] (HBASE-4914) Enhance MapReduce TableInputFormat to Support N-mappers per Region
[ https://issues.apache.org/jira/browse/HBASE-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231008#comment-13231008 ]

Cosmin Lehene commented on HBASE-4914:
--------------------------------------

Hadoop 0.20 doesn't behave well with a large number of map tasks, so we implemented N regions per map (through a splits_per_map property). I guess ideally we should be able to specify a min/max number of map tasks as well and have these two happen implicitly, perhaps with some sane thresholds.

Enhance MapReduce TableInputFormat to Support N-mappers per Region
------------------------------------------------------------------

Key: HBASE-4914
URL: https://issues.apache.org/jira/browse/HBASE-4914
Project: HBase
Issue Type: Sub-task
Components: client, regionserver
Reporter: Nicolas Spiegelberg
Priority: Blocker
Fix For: 0.94.0

Current TableInputFormat-based MR jobs create exactly one mapper per region, where each mapper sets one Scan with appropriate start/stop row keys. This change allows jobs to be run with any number of mappers per region, so that when a mapper fails, there will be less data to be reprocessed.
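The "N regions per map" approach mentioned in the comment can be sketched as a simple grouping step: consecutive regions are packed into chunks, and each chunk becomes the input of one map task. This is a minimal illustration under that assumption; `RegionGroupingSketch`, `group` and the `regionsPerMap` parameter (loosely modelled on the splits_per_map property) are hypothetical names, and region names stand in for real input splits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical grouping of regions into map-task inputs; not the TableInputFormat API.
class RegionGroupingSketch {
    // Packs consecutive regions into chunks of at most regionsPerMap elements.
    static List<List<String>> group(List<String> regions, int regionsPerMap) {
        if (regionsPerMap < 1) {
            throw new IllegalArgumentException("regionsPerMap must be at least 1");
        }
        List<List<String>> groups = new ArrayList<List<String>>();
        for (int i = 0; i < regions.size(); i += regionsPerMap) {
            int end = Math.min(i + regionsPerMap, regions.size());
            // Copy the sublist so each group is independent of the input list.
            groups.add(new ArrayList<String>(regions.subList(i, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        System.out.println(group(regions, 2)); // prints [[r1, r2], [r3, r4], [r5]]
    }
}
```

A min/max bound on the number of map tasks, as suggested in the comment, could be layered on top by deriving the chunk size from the region count instead of taking it as a fixed setting.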
[jira] [Commented] (HBASE-4890) fix possible NPE in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13224921#comment-13224921 ] Cosmin Lehene commented on HBASE-4890: -- It seems fine. I get the IOE instead of NPE now {code} java.util.concurrent.ExecutionException: java.io.IOException: Call to ld1/10.72.32.50:60020 failed on local exception: org.apache.hadoop.hbase.ipc.HBaseClient$CallTimeoutException: Call id=1321, waitTime=97566, rpcTimetout=6 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1557) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:900) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:777) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:752) at com.adobe.saasbase.scratch.Smith$PutThread.run(Smith.java:74) Caused by: java.io.IOException: Call to ld1/10.72.32.50:60020 failed on local exception: org.apache.hadoop.hbase.ipc.HBaseClient$CallTimeoutException: Call id=1321, waitTime=97566, rpcTimetout=6 at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:953) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:922) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy5.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1386) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1384) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1365) at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1383) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1381) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hbase.ipc.HBaseClient$CallTimeoutException: Call id=1321, waitTime=97566, rpcTimetout=6 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.cleanupCalls(HBaseClient.java:684) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:613) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505) {code} fix possible NPE in HConnectionManager -- Key: HBASE-4890 URL: https://issues.apache.org/jira/browse/HBASE-4890 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.92.1 Attachments: 4890-v2.txt, 4890-v3.txt, 4890-v3.txt, 4890-v3.txt, 4890.txt, splits.txt I was running YCSB against a 0.92 branch and encountered this error message: {code} 11/11/29 08:47:16 WARN client.HConnectionManager$HConnectionImplementation: Failed all from region=usertable,user3917479014967760871,1322555655231.f78d161e5724495a9723bcd972f97f41., hostname=c0316.hal.cloudera.com, port=57020 java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1501) at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1353) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:898) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:775) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:750) at com.yahoo.ycsb.db.HBaseClient.update(Unknown Source) at com.yahoo.ycsb.DBWrapper.update(Unknown Source) at com.yahoo.ycsb.workloads.CoreWorkload.doTransactionUpdate(Unknown Source) at