[jira] [Commented] (HBASE-5423) Regionserver may block forever on waitOnAllRegionsToClose when aborting
[ https://issues.apache.org/jira/browse/HBASE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211060#comment-13211060 ]

stack commented on HBASE-5423:
--

@Chunhui Sounds good. Want me to change the name on commit, or do you want to put up a new patch? I'd add a log that we were exiting even though there are still online regions. Good stuff.

Regionserver may block forever on waitOnAllRegionsToClose when aborting
---

Key: HBASE-5423 URL: https://issues.apache.org/jira/browse/HBASE-5423 Project: HBase Issue Type: Bug Components: regionserver Reporter: chunhui shen Assignee: chunhui shen Attachments: hbase-5423.patch

If closeRegion throws any exception (it could be caused by the FS) when the RS is aborting, the RS will block forever on waitOnAllRegionsToClose().

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
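The fix under discussion can be sketched in plain Java. This is a hedged illustration, not the actual patch: the class and method names below are hypothetical stand-ins for the regionserver's bookkeeping, showing only the key idea that a failed close during abort must still remove the region from the online set.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: if closeRegion throws while the RS is aborting,
// the region must still leave the online map, otherwise
// waitOnAllRegionsToClose() spins forever waiting for it.
public class CloseOnAbortSketch {
  private final Map<String, Object> onlineRegions = new ConcurrentHashMap<>();

  public void markOnline(String encodedName) {
    onlineRegions.put(encodedName, new Object());
  }

  public void closeRegionOnAbort(String encodedName) {
    try {
      closeRegion(encodedName);
    } catch (RuntimeException e) {
      // Log and fall through: on abort, a failed close must not leave
      // the region marked online forever.
    } finally {
      onlineRegions.remove(encodedName);
    }
  }

  protected void closeRegion(String encodedName) {
    // Real close elided; simulate the FS failure described in the issue.
    throw new RuntimeException("simulated FS failure during close");
  }

  public boolean allRegionsClosed() {
    return onlineRegions.isEmpty();
  }
}
```

With the finally block, the abort path converges even when every close throws; without it, allRegionsClosed() would never become true.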
[jira] [Commented] (HBASE-5431) Improve delete marker handling in Import M/R jobs
[ https://issues.apache.org/jira/browse/HBASE-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211061#comment-13211061 ]

stack commented on HBASE-5431:
--

So, we output Deletes and then we output Deletes? We'll be changing the order of kvs that came in in the Result? That's OK?

Improve delete marker handling in Import M/R jobs
-

Key: HBASE-5431 URL: https://issues.apache.org/jira/browse/HBASE-5431 Project: HBase Issue Type: Sub-task Components: mapreduce Affects Versions: 0.94.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.0 Attachments: 5431.txt

Import currently creates a new Delete object for each delete KV found in a result object. This can be improved with the new Delete API that allows adding a delete KV to an existing Delete object.
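The per-row batching the issue describes can be sketched with plain collections standing in for HBase's KeyValue and Delete types; all names here are illustrative, not the real client API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the improvement: rather than one Delete object per delete
// marker, accumulate all the delete markers of a row into one per-row
// container (the new Delete API lets a KV be added to an existing Delete).
public class DeleteBatchSketch {
  // Each marker is a {row, qualifier} pair; the result maps each row to
  // all of its markers, so only rows.size() "Delete" objects are needed.
  public static Map<String, List<String>> groupByRow(List<String[]> deleteMarkers) {
    Map<String, List<String>> perRow = new LinkedHashMap<>();
    for (String[] kv : deleteMarkers) {
      perRow.computeIfAbsent(kv[0], r -> new ArrayList<>()).add(kv[1]);
    }
    return perRow;
  }
}
```

As the later comment on this issue notes, grouping per row changes the relative ordering of KVs within the Result; preserving strict order would require one mutation per KV.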
[jira] [Commented] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup
[ https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211067#comment-13211067 ]

stack commented on HBASE-5209:
--

My fault. I was reading HBASE-5209-v1.diff. Pardon me. Thanks for testing. Old client against new cluster is what needs to work (new client against old server is YMMV). Patch looks good to me. Mind attaching it here so we can run it through hadoopqa to make sure it doesn't have side effects? Then I'll commit. Thanks David.

HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup

Key: HBASE-5209 URL: https://issues.apache.org/jira/browse/HBASE-5209 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.94.0, 0.90.5, 0.92.0 Reporter: Aditya Acharya Assignee: David S. Wang Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: HBASE-5209-v0.diff, HBASE-5209-v1.diff

I have a multi-master HBase set up, and I'm trying to programmatically determine which of the masters is currently active. But the API does not allow me to do this. There is a getMaster() method in the HConnection class, but it returns an HMasterInterface, whose methods do not allow me to find out which master won the last race. The API should have a getActiveMasterHostname() or something to that effect.
[jira] [Commented] (HBASE-5255) Use singletons for OperationStatus to save memory
[ https://issues.apache.org/jira/browse/HBASE-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211074#comment-13211074 ]

stack commented on HBASE-5255:
--

Thanks Benoit. Fixed it w/ HBASE-5432.

Use singletons for OperationStatus to save memory
-

Key: HBASE-5255 URL: https://issues.apache.org/jira/browse/HBASE-5255 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.90.5, 0.92.0 Reporter: Benoit Sigoure Assignee: Benoit Sigoure Priority: Minor Labels: performance Fix For: 0.94.0, 0.92.1 Attachments: 5255-92.txt, 5255-v2.txt, HBASE-5255-0.92-Use-singletons-to-remove-unnecessary-memory-allocati.patch, HBASE-5255-trunk-Use-singletons-to-remove-unnecessary-memory-allocati.patch

Every single {{Put}} causes the allocation of at least one {{OperationStatus}}, yet {{OperationStatus}} is almost always stateless, so these allocations are unnecessary and could be avoided. Attached patch adds a few singletons and uses them, with no public API change. I didn't test the patches, but you get the idea.
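The singleton pattern the patch applies can be shown in a few lines. This is a minimal sketch of the idea, not the exact HBase OperationStatus class: a stateless status value is immutable, so one shared instance per code can serve every Put instead of a fresh allocation each time.

```java
// Minimal sketch of the HBASE-5255 idea: stateless status objects are
// shared singletons rather than allocated once per operation.
public final class OpStatusSketch {
  public enum Code { SUCCESS, FAILURE }

  // Shared immutable instances; callers reuse these instead of
  // constructing a new status for every Put.
  public static final OpStatusSketch SUCCESS = new OpStatusSketch(Code.SUCCESS, "");
  public static final OpStatusSketch FAILURE = new OpStatusSketch(Code.FAILURE, "");

  private final Code code;
  private final String message;

  private OpStatusSketch(Code code, String message) {
    this.code = code;
    this.message = message;
  }

  public Code getCode() { return code; }

  public String getMessage() { return message; }
}
```

Statuses that do carry state (e.g. a per-row error message) would still be allocated individually; the singletons cover only the common stateless cases.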
[jira] [Commented] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
[ https://issues.apache.org/jira/browse/HBASE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1323#comment-1323 ]

stack commented on HBASE-5422:
--

@Chunhui Want to make a new patch that does this -- 'I agree with make an addPlan method that takes a Map of plans.'?

StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
--

Key: HBASE-5422 URL: https://issues.apache.org/jira/browse/HBASE-5422 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Attachments: 5422-90.patch, hbase-5422.patch

In our production environment we see a lot of RIT timeouts when the cluster comes up; there are about 70,000 regions in the cluster (25 regionservers). First, we see the following log (see the region 33cf229845b1009aa8a3f7b0f85c9bd0):

master's log
2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Async create of unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state
2012-02-13 18:07:42,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409, server=r03f11025.yh.aliyun.com,60020,1329127549907
2012-02-13 18:07:42,996 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409
2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329127662996
2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.
2012-02-13 18:11:16,744 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=r03f11025.yh.aliyun.com,60020,1329127549907, region=33cf229845b1009aa8a3f7b0f85c9bd0
2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node
2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Deleting existing unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state RS_ZK_REGION_OPENED
2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED
2012-02-13 18:38:07,573 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. on r03f11025.yh.aliyun.com,60020,1329127549907
2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. so generated a random one; hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0., src=, dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, exclude=null) available servers
2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. to r01b05043.yh.aliyun.com,60020,1329127549041
2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.
state=PENDING_OPEN, ts=1329132528086
2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.

Regionserver's log
2012-02-13 18:07:43,537 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.
2012-02-13 18:11:16,560 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.

Through the RS's log, we can see it took more than 3 minutes from receiving the openRegion request to starting to process it, causing
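The "addPlan method that takes a Map of plans" suggested in the comment above can be sketched as follows. All names here are illustrative stand-ins for the AssignmentManager internals, not the real HBase API; the point is that a bulk overload takes the lock once per batch instead of once per region.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: installing ~70,000 region plans one synchronized call at
// a time is slow; a Map-taking overload installs the whole batch under a
// single lock acquisition.
public class PlanBatchSketch {
  // encoded region name -> destination server (simplified plan)
  private final Map<String, String> regionPlans = new HashMap<>();

  // One lock acquisition per region plan.
  public synchronized void addPlan(String encodedName, String destServer) {
    regionPlans.put(encodedName, destServer);
  }

  // One lock acquisition for the entire batch of plans.
  public synchronized void addPlans(Map<String, String> plans) {
    regionPlans.putAll(plans);
  }

  public synchronized int planCount() {
    return regionPlans.size();
  }
}
```

A bulk assigner would build the full Map first and make a single addPlans() call, rather than looping over addPlan().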
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1328#comment-1328 ]

stack commented on HBASE-5317:
--

Gregory it failed with this:

{code}
[INFO] --- maven-compiler-plugin:2.0.2:testCompile (default-testCompile) @ hbase ---
[INFO] Compiling 331 source files to /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-classes
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 23.287s
[INFO] Finished at: Sat Feb 18 01:46:56 UTC 2012
[INFO] Final Memory: 41M/424M
[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:testCompile (default-testCompile) on project hbase: Compilation failure
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/src/test/java/org/apache/hadoop/hbase/client/TestMetaMigrationRemovingHTD.java:[79,33] cannot find symbol
[ERROR] symbol : method getDefaultRootDirPath()
[ERROR] location: class org.apache.hadoop.hbase.HBaseTestingUtility
[ERROR] - [Help 1]
{code}

What do you reckon that's about? Is it the 0.23 profile leaking?
Fix TestHFileOutputFormat to work against hadoop 0.23
-

Key: HBASE-5317 URL: https://issues.apache.org/jira/browse/HBASE-5317 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0, 0.92.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch, HBASE-5317-v3.patch

Running mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat yields this on 0.92:

Failed tests:
testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found

Tests in error:
test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory)
testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable
testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable

It looks like on trunk, this also results in an error:
testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable

I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but haven't fixed the other 3 yet.
[jira] [Commented] (HBASE-5431) Improve delete marker handling in Import M/R jobs
[ https://issues.apache.org/jira/browse/HBASE-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211202#comment-13211202 ]

stack commented on HBASE-5431:
--

bq. In fact the only correct ordering would be to create a Put or Delete for each KV.

Yeah. I was wondering about this. OK. +1. For Amit's patch, if we switched on his facility, then we'd export with memstorets? Though I suppose that'd be no good at import time?

Improve delete marker handling in Import M/R jobs
-

Key: HBASE-5431 URL: https://issues.apache.org/jira/browse/HBASE-5431 Project: HBase Issue Type: Sub-task Components: mapreduce Affects Versions: 0.94.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.0 Attachments: 5431.txt

Import currently creates a new Delete object for each delete KV found in a result object. This can be improved with the new Delete API that allows adding a delete KV to an existing Delete object.
[jira] [Commented] (HBASE-3149) Make flush decisions per column family
[ https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211204#comment-13211204 ]

stack commented on HBASE-3149:
--

Thanks @Nicolas (and thanks @Mubarak -- sounds like something to indeed get into 0.92). At the same time, I'd think this issue is still worth some time; if there are lots of CFs and only one is filling, it's silly to flush the others as we do now just because one is over the threshold.

Make flush decisions per column family
--

Key: HBASE-3149 URL: https://issues.apache.org/jira/browse/HBASE-3149 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Karthik Ranganathan Assignee: Nicolas Spiegelberg

Today, the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions.
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212107#comment-13212107 ]

stack commented on HBASE-5396:
--

@Jieshan That's interesting. Thanks for checking it out. Why do we not have the problem in 0.92/trunk? Is the code different?

Handle the regions in regionPlans while processing ServerShutdownHandler

Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, HBASE-5396-90-forReview.patch, HBASE-5396-90.patch

Regions planned to open on this server while ServerShutdownHandler is handling it are simply removed from AM.regionPlans and left for the TimeoutMonitor to handle. This needs optimizing.
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212109#comment-13212109 ]

stack commented on HBASE-5416:
--

@Max What do you think about the failed TestFilter in the above? Is it your patch? Thanks.

Improve performance of scans with some kind of filters.
---

Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: Filtered_scans.patch

When a scan is performed, the whole row is loaded into the result list, and afterwards the filter (if any) is applied to decide whether the row is needed. But when a scan covers several CFs and the filter checks only a subset of them, data from the unchecked CFs is not needed at the filter stage; it is needed only once we have decided to include the current row. In such cases we can significantly reduce the amount of IO performed by a scan by loading only the values the filter actually checks. For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. Snap is very large (tens of GB) and quite costly to scan. If we need only rows with some flag specified, we use SingleColumnValueFilter to limit the result to only a small subset of the region. But the current implementation loads both CFs to perform the scan, when only a small subset is needed. The attached patch adds one routine to the Filter interface that allows a filter to specify which CFs it needs for its operation. In HRegion, we separate all scanners into two groups: those needed by the filter and the rest (joined). When a new row is considered, only the needed data is loaded and the filter applied, and only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times.
Also, this gives us a way to better normalize the data into separate columns by optimizing the scans performed.
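The two-phase scan the patch describes can be sketched in miniature. This is a hedged illustration with plain maps standing in for per-family stores; the real patch wires the split into HRegion's scanner machinery, and the names below are invented for the sketch:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the "essential family first" scan: phase 1 reads only the
// small family the filter needs; phase 2 reads the expensive family
// solely for rows the filter has already accepted.
public class FilteredScanSketch {
  public final Map<String, String> flagsStore = new HashMap<>(); // row -> flag (small CF)
  public final Map<String, String> snapStore = new HashMap<>();  // row -> big value (large CF)
  public int expensiveReads = 0;                                 // counts snap loads

  public List<String> scan(List<String> rowKeys, String requiredFlag) {
    List<String> out = new ArrayList<>();
    for (String row : rowKeys) {
      // Phase 1: consult only the essential family (cheap).
      if (!requiredFlag.equals(flagsStore.get(row))) continue;
      // Phase 2: load the joined family only for accepted rows (expensive).
      expensiveReads++;
      out.add(snapStore.get(row));
    }
    return out;
  }
}
```

When the filter rejects most rows, the expensive store is touched only for the few survivors, which is where the reported 30-50x speedup comes from.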
[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()
[ https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212127#comment-13212127 ]

stack commented on HBASE-5424:
--

@Zhiyuan Your patch failed to apply to trunk. See the console output:

{code}
patching file src/main/java/org/apache/hadoop/hbase/client/HTable.java
Hunk #1 FAILED at 423.
patch unexpectedly ends in middle of line
Hunk #2 succeeded at 428 with fuzz 2 (offset -14 lines).
1 out of 2 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/client/HTable.java.rej
PATCH APPLICATION FAILED
{code}

Mind fixing? Your patch seems to have some odd formatting too. Thanks.

HTable meet NPE when call getRegionInfo()
-

Key: HBASE-5424 URL: https://issues.apache.org/jira/browse/HBASE-5424 Project: HBase Issue Type: Bug Affects Versions: 0.90.1, 0.90.5 Reporter: junhua yang Attachments: HBASE-5424.patch, HBase-5424_1.patch Original Estimate: 48h Remaining Estimate: 48h

We hit an NPE when calling getRegionInfo() in our testing environment:

Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119)
at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73)
at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418)

This NPE also prevents table.jsp from showing the region information for this table.
[jira] [Commented] (HBASE-4365) Add a decent heuristic for region size
[ https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212142#comment-13212142 ]

stack commented on HBASE-4365:
--

bq. Wouldn't we potentially do a lot of splitting when there are many regionservers?

Each regionserver would split with the same growing reluctance. Don't we want a bunch of splitting when there are lots of regionservers, so they all promptly get some amount of the incoming load? This issue is about getting us to split fast at the start of a bulk load but then having the splitting fall off as more data makes it in. I'm thinking our default regionsize should be 10G. I should add this to this patch. I don't get what you are saying at the end, Lars. Is it good or bad that there are 5 regions on a regionserver before we get to the max size? The balancer will cut in and move regions to other servers, and they'll then split eagerly at first with rising reluctance.

Add a decent heuristic for region size
--

Key: HBASE-4365 URL: https://issues.apache.org/jira/browse/HBASE-4365 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0, 0.92.1 Reporter: Todd Lipcon Priority: Critical Labels: usability Attachments: 4365.txt

A few of us were brainstorming this morning about what the default region size should be. There were a few general points made:
- in some ways it's better to be too-large than too-small, since you can always split a table further, but you can't merge regions currently
- with HFile v2 and multithreaded compactions there are fewer reasons to avoid very-large regions (10GB+)
- for small tables you may want a small region size just so you can distribute load better across a cluster
- for big tables, multi-GB is probably best
[jira] [Commented] (HBASE-3149) Make flush decisions per column family
[ https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212166#comment-13212166 ]

stack commented on HBASE-3149:
--

@Nicolas I wonder about this... hbase.hstore.compaction.min.size. When we compact, don't we have to take adjacent files as part of our ACID guarantees? Would this frustrate that? (I'll take a look... tomorrow.) I'm wondering because I want to figure out how to make it so we favor reference files... so they are always included in a compaction.

Make flush decisions per column family
--

Key: HBASE-3149 URL: https://issues.apache.org/jira/browse/HBASE-3149 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Karthik Ranganathan Assignee: Nicolas Spiegelberg Fix For: 0.92.1

Today, the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions.
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212370#comment-13212370 ]

stack commented on HBASE-5396:
--

If you found it in 0.92, that's good enough -- it's in TRUNK I'd say. Do you have more of the regionserver log? Why does it say it's aborting? You don't have it in your log above (the logs above look 'normal'... we need the bits that show it going awry... thanks Jieshan).

Handle the regions in regionPlans while processing ServerShutdownHandler

Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, HBASE-5396-90-forReview.patch, HBASE-5396-90.patch, HBASE-5396-92.patch, HBASE-5396-trunk.patch

Regions planned to open on this server while ServerShutdownHandler is handling it are simply removed from AM.regionPlans and left for the TimeoutMonitor to handle. This needs optimizing.
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212375#comment-13212375 ]

stack commented on HBASE-5416:
--

bq. I have a question about this. Manual == hbase book? And what 'filters package doc' is? Is it comments in source processed by javadoc, or something else? Sorry for these questions - I have no Java experience.

No problem. Yes, the 'reference guide' or manual is this: http://hbase.apache.org/book.html It's a bit tough making a patch for it if you don't know DocBook too well, so you could just put a paragraph here and I'll get the doc in for you. Or, the filters package doc I was referring to is here: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/package-summary.html#package_description... but the doc there is pretty pathetic and describing this facility there might not go so well (it's of a subtlety the current doc does not allow). Just stick a bit of a paragraph here and I'll figure out where to put it. Go easy Max.

You saw the failed test above? The fail in TestFilter? Do you see that when you run your tests locally? On trunk you do it so:

{code}
% mvn test -P localTests -Dtest=TestFilter
{code}

Improve performance of scans with some kind of filters.
---

Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: Filtered_scans.patch

When a scan is performed, the whole row is loaded into the result list, and afterwards the filter (if any) is applied to decide whether the row is needed. But when a scan covers several CFs and the filter checks only a subset of them, data from the unchecked CFs is not needed at the filter stage; it is needed only once we have decided to include the current row.
In such cases we can significantly reduce the amount of IO performed by a scan by loading only the values actually checked by a filter. For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. Snap is very large (tens of GB) and quite costly to scan. If we need only rows with some flag specified, we use SingleColumnValueFilter to limit the result to only a small subset of the region. But the current implementation loads both CFs to perform the scan, when only a small subset is needed. The attached patch adds one routine to the Filter interface that allows a filter to specify which CFs it needs for its operation. In HRegion, we separate all scanners into two groups: those needed by the filter and the rest (joined). When a new row is considered, only the needed data is loaded and the filter applied, and only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times. Also, this gives us a way to better normalize the data into separate columns by optimizing the scans performed.
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213335#comment-13213335 ]

stack commented on HBASE-5396:
--

The log does not show the regionserver aborting. Should it? Or am I misunderstanding? (I guess I'm not clear on what I should be looking for in this log. Please help me Jieshan. Sorry for being a bit slow.)

Handle the regions in regionPlans while processing ServerShutdownHandler

Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, HBASE-5396-90-forReview.patch, HBASE-5396-90.patch, HBASE-5396-92.patch, HBASE-5396-trunk.patch, Logs-TestFor92.rar

Regions planned to open on this server while ServerShutdownHandler is handling it are simply removed from AM.regionPlans and left for the TimeoutMonitor to handle. This needs optimizing.
[jira] [Commented] (HBASE-5439) Fix some performance findbugs issues
[ https://issues.apache.org/jira/browse/HBASE-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213339#comment-13213339 ]

stack commented on HBASE-5439:
--

+1

Fix some performance findbugs issues

Key: HBASE-5439 URL: https://issues.apache.org/jira/browse/HBASE-5439 Project: HBase Issue Type: Improvement Components: performance Affects Versions: 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Attachments: HBASE-5439.patch

Given 0.94 is the performance release, I took a look at some performance findbugs. This patch should fix all of the following types of findbugs (except one case in generated code):

Bug type DM_NUMBER_CTOR
Bug type DM_STRING_CTOR
Bug type DM_BOOLEAN_CTOR
(these are simple constructor issues where Type.valueOf is more efficient)

Fixes one of:
Bug type SIC_INNER_SHOULD_BE_STATIC (Inner class should be static)
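For context, the DM_*_CTOR findbugs patterns flag boxed-type constructors. A minimal example of the preferred form:

```java
// The DM_NUMBER_CTOR pattern flags "new Integer(x)", which always
// allocates; Integer.valueOf(x) may return a cached immutable instance
// instead (the language spec guarantees caching for values in -128..127),
// so it is at least as cheap and often free. The same applies to
// String and Boolean via String.valueOf / Boolean.valueOf.
public class ValueOfSketch {
  public static Integer box(int x) {
    return Integer.valueOf(x);
  }
}
```

Because the boxed types are immutable, swapping the constructor for valueOf is behavior-preserving except for reference identity, which correct code should not depend on anyway.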
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213374#comment-13213374 ] stack commented on HBASE-5396: -- So what am I looking to see in the log snippet above (and in the attached log?) Thanks Jieshan. Handle the regions in regionPlans while processing ServerShutdownHandler Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, HBASE-5396-90-forReview.patch, HBASE-5396-90.patch, HBASE-5396-92.patch, HBASE-5396-trunk.patch, Logs-TestFor92.rar Regions planned to open on this server while ServerShutdownHandler is running are simply removed from AM.regionPlans, leaving only the TimeoutMonitor to handle them. This needs to be optimized. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4365) Add a decent heuristic for region size
[ https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213389#comment-13213389 ] stack commented on HBASE-4365: -- If I understand correctly a regionserver would still split at a size < 10gb until there are about 900 regions for the table (assuming somewhat even distribution). Well, each split would take longer because the threshold will have grown closer to the 10GB, but yeah. And I think this is what we want. Doing to the power of 3 would make us rise to the 10GB faster. This is probably ok. More regions means that we'll fan out regions over the cluster a little faster. We'll have 9 regions for a table on each server which is probably too many still. We could do to the power of 3 so we'd split on first flush, then at 1G, 3.4G, 8.2G and then we'd be at our 10G limit. Add a decent heuristic for region size -- Key: HBASE-4365 URL: https://issues.apache.org/jira/browse/HBASE-4365 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0, 0.92.1 Reporter: Todd Lipcon Priority: Critical Labels: usability Attachments: 4365.txt A few of us were brainstorming this morning about what the default region size should be. There were a few general points made:
- in some ways it's better to be too-large than too-small, since you can always split a table further, but you can't merge regions currently
- with HFile v2 and multithreaded compactions there are fewer reasons to avoid very-large regions (10GB+)
- for small tables you may want a small region size just so you can distribute load better across a cluster
- for big tables, multi-GB is probably best
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
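stack's 1G/3.4G/8.2G progression can be checked numerically: with a hypothetical 128MB memstore flush size and a 10GB ceiling, a split threshold of flushSize × regionCount³ (capped at the ceiling) gives roughly those steps. This is a sketch of the heuristic under discussion, not the committed policy:

```java
public class SplitSizeDemo {
    static final long MB = 1024L * 1024L;
    static final long FLUSH_SIZE = 128 * MB;            // assumed memstore flush size
    static final long MAX_FILE_SIZE = 10L * 1024 * MB;  // 10GB ceiling

    // Threshold at which a region splits, given how many regions of the
    // table this server already hosts (the "to the power of 3" idea).
    static long splitThreshold(int regionCount) {
        long cubed = FLUSH_SIZE * regionCount * regionCount * regionCount;
        return Math.min(MAX_FILE_SIZE, cubed);
    }

    public static void main(String[] args) {
        for (int count = 1; count <= 5; count++) {
            System.out.printf("regions=%d -> split at %.1f GB%n",
                count, splitThreshold(count) / (1024.0 * MB));
        }
        // counts 1..5 give 0.1, 1.0, 3.4, 8.0, 10.0 GB -- close to the
        // first-flush, 1G, 3.4G, ~8G, 10G-cap sequence in the comment.
    }
}
```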
[jira] [Commented] (HBASE-5433) [REST] Add metrics to keep track of success/failure count
[ https://issues.apache.org/jira/browse/HBASE-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213391#comment-13213391 ] stack commented on HBASE-5433: -- +1 on patch. Mubarak, can you see the metrics coming out in your metrics system? They show ok? Will commit tomorrow unless objection (Andy? Want to check it out?) [REST] Add metrics to keep track of success/failure count - Key: HBASE-5433 URL: https://issues.apache.org/jira/browse/HBASE-5433 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Labels: noob Fix For: 0.94.0 Attachments: HBASE-5433.trunk.v1.patch In a production environment, the visibility of successful REST request(s) is not getting exposed to the metrics system as we have only one metric (requests) today. Proposing to add more metrics such as successful_get_count, failed_get_count, successful_put_count, failed_put_count. The current implementation increases the request count at the beginning of the method and it is very hard to monitor requests (unless you turn on debug, find the row_key and validate it in get/scan using the hbase shell). It would be very useful for ops to keep an eye on this, as requests from cross data-centers write data to one cluster using the REST gateway through a load balancer (and there is no visibility of which REST-server/RS failed to write data):
{code}
Response update(final CellSetModel model, final boolean replace) {
  // for requests
  servlet.getMetrics().incrementRequests(1);
  ..
  ..
  table.put(puts);
  table.flushCommits();
  ResponseBuilder response = Response.ok();
  // for successful_get_count
  servlet.getMetrics().incrementSuccessfulGetRequests(1);
  return response.build();
} catch (IOException e) {
  // for failed_get_count
  servlet.getMetrics().incrementFailedGetRequests(1);
  throw new WebApplicationException(e, Response.Status.SERVICE_UNAVAILABLE);
} finally {
}
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
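The success/failure counters being proposed can be sketched with plain AtomicLongs. The field and method names here are hypothetical stand-ins for the servlet metrics class the patch actually wires in:

```java
import java.util.concurrent.atomic.AtomicLong;

public class RestMetricsSketch {
    final AtomicLong requests = new AtomicLong();
    final AtomicLong successfulPuts = new AtomicLong();
    final AtomicLong failedPuts = new AtomicLong();

    // Wraps a write the way the proposed update() does: count the request
    // up front, then count success or failure once the outcome is known.
    void update(Runnable put) {
        requests.incrementAndGet();
        try {
            put.run();
            successfulPuts.incrementAndGet();
        } catch (RuntimeException e) {
            failedPuts.incrementAndGet();
            throw e; // mirrors rethrowing as WebApplicationException
        }
    }
}
```

Exposing the split counters rather than one aggregate lets ops distinguish "gateway busy" from "gateway failing", which is the point of the issue.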
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213399#comment-13213399 ] stack commented on HBASE-4991: -- @Mubarak Do we need to add this method to the region server interface? {code} + public int getRegionsCount(byte[] regionName) throws IOException; {code} Can we not just count what comes back from the get on online regions? Do we have to run the region delete in the Master process? Can the client not do it? Is it really necessary adding + public MasterDeleteRegionTracker getDeleteRegionTracker(); to the MasterServices? This will have a ripple effect through Tests and it seems like a bit of an exotic API to have in this basic Interface. I like the refactor in HRegion. Does all of this new code need to be in HRegionServer? Can it live in a class of its own? There must be a million holes here (HRS crashes in middle of file moving or creation of the merged region, files partially moved or deleted). Does this code all need to be in core? Can we not make a few primitives and then run it all from outside in a tool or script w/ state recorded as we go so can resume if fail mid-way? There are a bunch of moving pieces here. Its all bundled up in core code so its going to be tough to test. Adding this to onlineregions, + public void deleteRegion(String regionName) throws IOException, KeeperException;, do all removals from online regions now use this new API (Its probably good having it here... but just wondering about the places where regions currently get removed from online map, do they go a different route than this new one?) H... looks like a bunch of state is being tracked in zk. Thats good. Its all custom to this feature. How hard will it be to reuse parts to do say an online merge of a bunch of adjacent regions? Yeah, there is a lot of moving parts... a master delete tracker and a regionserver delete tracker... 
I've not done an extensive review of design but that seems pretty heavy going. Are the enums duplicated? Why does zookeeper package have classes particular to master and regionserver? Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5434) [REST] Include more metrics in cluster status request
[ https://issues.apache.org/jira/browse/HBASE-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214114#comment-13214114 ] stack commented on HBASE-5434: -- @Mubarak What is your wiki id and I'll add you as a contributor [REST] Include more metrics in cluster status request - Key: HBASE-5434 URL: https://issues.apache.org/jira/browse/HBASE-5434 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0 Attachments: HBASE-5434.trunk.v1.patch /status/cluster shows only {code} stores=2 storefiless=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 {code} for a region but master web-ui shows {code} stores=1, storefiles=0, storefileUncompressedSizeMB=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 readRequestsCount=0 writeRequestsCount=0 rootIndexSizeKB=0 totalStaticIndexSizeKB=0 totalStaticBloomSizeKB=0 totalCompactingKVs=0 currentCompactedKVs=0 compactionProgressPct=NaN {code} In a write-heavy REST gateway based production environment, ops team needs to verify whether write counters are getting incremented per region (they do run /status/cluster on each REST server), we can get the same values from *rpc.metrics.put_num_ops* and *hbase.regionserver.writeRequestsCount* but some home-grown tools needs to parse the output of /status/cluster and updates the dashboard. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5456) Introduce PowerMock into our unit tests to reduce unnecessary method exposure
[ https://issues.apache.org/jira/browse/HBASE-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214119#comment-13214119 ] stack commented on HBASE-5456: -- I don't think that this is the kind of thing you can do by fiat. My take is that we'll use PowerMock when it makes sense (apart from the fact that PowerMock isn't exactly a walk-in-the-park). My current take on testing in hbase is that so much of our code base is test inscrutable and that anything we can do to shine light on these untested savannas of code is good by me, even unto adding public methods that allow injection of alternate classes. Introduce PowerMock into our unit tests to reduce unnecessary method exposure - Key: HBASE-5456 URL: https://issues.apache.org/jira/browse/HBASE-5456 Project: HBase Issue Type: Task Reporter: Zhihong Yu We should introduce PowerMock into our unit tests so that we don't have to expose methods intended to be used by unit tests. Here was Benoit's reply to a user of asynchbase about testability: OpenTSDB has unit tests that are mocking out HBaseClient just fine [1]. You can mock out pretty much anything on the JVM: final, private, JDK stuff, etc. All you need is the right tools. I've been very happy with PowerMock. It supports Mockito and EasyMock. I've never been keen on mutilating public interfaces for the sake of testing. With tools like PowerMock, we can keep the public APIs tidy while mocking and overriding anything, even in the most private guts of the classes. [1] https://github.com/stumbleupon/opentsdb/blob/master/src/uid/TestUniqueId.java#L66 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5457) add inline index in data block for data which are not clustered together
[ https://issues.apache.org/jira/browse/HBASE-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214126#comment-13214126 ] stack commented on HBASE-5457: -- bq. So if we can add inline block index on required columns, the second column family then is not needed. What would this look like He? add inline index in data block for data which are not clustered together Key: HBASE-5457 URL: https://issues.apache.org/jira/browse/HBASE-5457 Project: HBase Issue Type: New Feature Reporter: He Yongqiang As we are go through our data schema, and we found we have one large column family which is just duplicating data from another column family and is just a re-org of the data to cluster data in a different way than the original column family in order to serve another type of queries efficiently. If we compare this second column family with similar situation in mysql, it is like an index in mysql. So if we can add inline block index on required columns, the second column family then is not needed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3484) Replace memstore's ConcurrentSkipListMap with our own implementation
[ https://issues.apache.org/jira/browse/HBASE-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214132#comment-13214132 ] stack commented on HBASE-3484: -- bq. It probably has negative memory effects in its current incarnation. How you think Todd? Because of the tiering cost more or is it something to do w/ mslab allocations? What would you like to see test-wise proving this direction better than what we currently have? I could work up some tests? Replace memstore's ConcurrentSkipListMap with our own implementation Key: HBASE-3484 URL: https://issues.apache.org/jira/browse/HBASE-3484 Project: HBase Issue Type: Improvement Components: performance Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Attachments: hierarchical-map.txt By copy-pasting ConcurrentSkipListMap into HBase we can make two improvements to it for our use case in MemStore:
- add an iterator.replace() method which should allow us to do upsert much more cheaply
- implement a Set directly without having to do Map<KeyValue,KeyValue> to save one reference per entry
It turns out CSLM is in public domain from its development as part of JSR 166, so we should be OK with licenses. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
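For context on the upsert cost: with the stock ConcurrentSkipListMap, replacing an entry whose key object must change (as MemStore's KeyValue upsert needs, since the KeyValue is both key and value) is a remove() plus a put(), i.e. two skip-list traversals, which the proposed iterator.replace() would collapse into one. A JDK-only illustration, with a String key standing in for a KeyValue:

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class UpsertDemo {
    public static void main(String[] args) {
        ConcurrentSkipListMap<String, Long> memstore = new ConcurrentSkipListMap<>();
        memstore.put("row1/cf:col", 1L);

        // "Upsert" with the stock class: two O(log n) traversals.
        // iterator.replace() would swap the entry in place during one walk.
        memstore.remove("row1/cf:col");
        memstore.put("row1/cf:col", 2L);

        System.out.println(memstore.get("row1/cf:col")); // 2
    }
}
```

The second improvement in the description (a direct Set) avoids the extra value reference that ConcurrentSkipListSet pays by wrapping a Map<E,Boolean>-style backing map.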
[jira] [Commented] (HBASE-5419) FileAlreadyExistsException has moved from mapred to fs package
[ https://issues.apache.org/jira/browse/HBASE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214138#comment-13214138 ] stack commented on HBASE-5419: -- Ok if this is for 0.94 only? I think its fine to drop 'support' for hadoop 0.20/branch-0.20-append in 0.94. FileAlreadyExistsException has moved from mapred to fs package -- Key: HBASE-5419 URL: https://issues.apache.org/jira/browse/HBASE-5419 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor Attachments: D1767.1.patch, D1767.1.patch The FileAlreadyExistsException has moved from org.apache.hadoop.mapred.FileAlreadyExistsException to org.apache.hadoop.fs.FileAlreadyExistsException. HBase is currently using a class that is deprecated in hadoop trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop
[ https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214141#comment-13214141 ] stack commented on HBASE-4403: -- Retry Jimmy making sure the patch you want to run against hadoopqa is applied last? Adopt interface stability/audience classifications from Hadoop -- Key: HBASE-4403 URL: https://issues.apache.org/jira/browse/HBASE-4403 Project: HBase Issue Type: Task Affects Versions: 0.90.5, 0.92.0 Reporter: Todd Lipcon Assignee: Jimmy Xiang Fix For: 0.94.0 Attachments: hbase-4403-interface.txt, hbase-4403-interface_v2.txt, hbase-4403-interface_v3.txt, hbase-4403-nowhere-near-done.txt, hbase-4403.patch As HBase gets more widely used, we need to be more explicit about which APIs are stable and not expected to break between versions, which APIs are still evolving, etc. We also have many public classes that are really internal to the RS or Master and not meant to be used by users. Hadoop has adopted a classification scheme for audience (public, private, or limited-private) as well as stability (stable, evolving, unstable). I think we should copy these annotations to HBase and start to classify our public classes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5454) Refuse operations from Admin before master is initialized
[ https://issues.apache.org/jira/browse/HBASE-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214144#comment-13214144 ] stack commented on HBASE-5454: -- This patch is a good idea Chunhui. I don't think you need to have the message "Master is not initialized" on an exception whose type is MasterNotInitializedException. It seems redundant. Remove this line:
{code}
+ * Copyright 2007 The Apache Software Foundation
{code}
Maybe you don't need these two methods?
{code}
+  public MasterNotInitializedException(String s) {
+    super(s);
+  }
+
+  /**
+   * Constructor taking another exception.
+   *
+   * @param e Exception to grab data from.
+   */
+  public MasterNotInitializedException(Exception e) {
{code}
Refuse operations from Admin before master is initialized Key: HBASE-5454 URL: https://issues.apache.org/jira/browse/HBASE-5454 Project: HBase Issue Type: Improvement Reporter: chunhui shen Attachments: hbase-5454.patch In our testing environment, when master is initializing, we found conflict problems between master#assignAllUserRegions and the EnableTable event, causing region assignment to throw an exception so that the master aborted itself. We think we'd better refuse operations from Admin, such as CreateTable, EnableTable, etc. It could reduce errors. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
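stack's suggestion (the exception type speaks for itself, so the extra constructors may be unnecessary) boils down to an exception plus a guard like the following sketch. MasterNotInitializedException is the name from the patch; everything else here is illustrative, including using RuntimeException as the base so the sketch stays self-contained:

```java
public class InitGuardSketch {
    // Minimal exception: the type itself already says "master is not
    // initialized", so per the review no detail message is needed.
    static class MasterNotInitializedException extends RuntimeException {
        MasterNotInitializedException() { super(); }
    }

    private volatile boolean initialized = false;

    void setInitialized() { initialized = true; }

    // Called at the top of admin operations such as createTable/enableTable
    // to refuse them until master initialization completes.
    void checkInitialized() {
        if (!initialized) {
            throw new MasterNotInitializedException();
        }
    }
}
```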
[jira] [Commented] (HBASE-5434) [REST] Include more metrics in cluster status request
[ https://issues.apache.org/jira/browse/HBASE-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214145#comment-13214145 ] stack commented on HBASE-5434: -- I added you Mubarak. Try editing wiki (I'm in process of applying this patch). [REST] Include more metrics in cluster status request - Key: HBASE-5434 URL: https://issues.apache.org/jira/browse/HBASE-5434 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0 Attachments: HBASE-5434.trunk.v1.patch /status/cluster shows only {code} stores=2 storefiless=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 {code} for a region but master web-ui shows {code} stores=1, storefiles=0, storefileUncompressedSizeMB=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 readRequestsCount=0 writeRequestsCount=0 rootIndexSizeKB=0 totalStaticIndexSizeKB=0 totalStaticBloomSizeKB=0 totalCompactingKVs=0 currentCompactedKVs=0 compactionProgressPct=NaN {code} In a write-heavy REST gateway based production environment, ops team needs to verify whether write counters are getting incremented per region (they do run /status/cluster on each REST server), we can get the same values from *rpc.metrics.put_num_ops* and *hbase.regionserver.writeRequestsCount* but some home-grown tools needs to parse the output of /status/cluster and updates the dashboard. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()
[ https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214302#comment-13214302 ] stack commented on HBASE-5424: -- You need this on 0.90 branch too Zhiyuan? If so, add 0.90.7 as fix version. (We should also though do as Lars suggests; we're just band-aiding dealing w/ the NPE). HTable meet NPE when call getRegionInfo() - Key: HBASE-5424 URL: https://issues.apache.org/jira/browse/HBASE-5424 Project: HBase Issue Type: Bug Affects Versions: 0.90.1, 0.90.5 Reporter: junhua yang Attachments: 5424-v3.patch, HBASE-5424.patch, HBase-5424_v2.patch Original Estimate: 48h Remaining Estimate: 48h We meet NPE when call getRegionInfo() in testing environment. Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119) at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73) at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418) This NPE also make the table.jsp can't show the region information of this table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
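The band-aid being discussed amounts to skipping META rows whose info:regioninfo cell is empty instead of letting Writables.getWritable() throw the NPE. A plain-Java sketch of that guard, with names and types that are illustrative stand-ins, not the committed patch:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MetaScanGuardSketch {
    // Stand-in for the processRow() callback: rows whose info:regioninfo
    // value is null or empty are tolerated (half-written META rows)
    // instead of NPE'ing inside deserialization.
    static List<String> collectRegions(List<Map<String, byte[]>> metaRows) {
        List<String> regions = new ArrayList<>();
        for (Map<String, byte[]> row : metaRows) {
            byte[] value = row.get("info:regioninfo");
            if (value == null || value.length == 0) {
                continue; // the band-aid: skip rather than throw
            }
            regions.add(new String(value, StandardCharsets.UTF_8));
        }
        return regions;
    }
}
```

As Lars's comment implies, the real fix is to stop such rows appearing in META at all; the guard just keeps table.jsp and getRegionsInfo() usable in the meantime.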
[jira] [Commented] (HBASE-5437) HRegionThriftServer does not start because of a bug in HbaseHandlerMetricsProxy
[ https://issues.apache.org/jira/browse/HBASE-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214303#comment-13214303 ] stack commented on HBASE-5437: -- Scott: It failed compile? See above console output:
{code}
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/src/main/java/org/apache/hadoop/hbase/thrift/ThriftMetrics.java:[118,55] cannot find symbol
[ERROR] symbol  : variable toString
[ERROR] location: class java.lang.Class<capture#444 of ?>
[ERROR] -> [Help 1]
{code}
HRegionThriftServer does not start because of a bug in HbaseHandlerMetricsProxy --- Key: HBASE-5437 URL: https://issues.apache.org/jira/browse/HBASE-5437 Project: HBase Issue Type: Bug Components: metrics, thrift Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.94.0 Attachments: HBASE-5437.D1857.1.patch, HBASE-5437.D1887.1.patch 3.facebook.com,60020,1329865516120: Initialization of RS failed. Hence aborting RS.
java.lang.ClassCastException: $Proxy9 cannot be cast to org.apache.hadoop.hbase.thrift.generated.Hbase$Iface
 at org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.newInstance(HbaseHandlerMetricsProxy.java:47)
 at org.apache.hadoop.hbase.thrift.ThriftServerRunner.<init>(ThriftServerRunner.java:239)
 at org.apache.hadoop.hbase.regionserver.HRegionThriftServer.<init>(HRegionThriftServer.java:74)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658)
 at java.lang.Thread.run(Thread.java:662)
2012-02-21 15:05:18,749 FATAL org.apache.hadoop.h -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()
[ https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214309#comment-13214309 ] stack commented on HBASE-5424: -- I reverted the patch. Too many new failures in hadoopqa. Let me retry it. HTable meet NPE when call getRegionInfo() - Key: HBASE-5424 URL: https://issues.apache.org/jira/browse/HBASE-5424 Project: HBase Issue Type: Bug Affects Versions: 0.90.1, 0.90.5 Reporter: junhua yang Attachments: 5424-v3.patch, 5424-v3.patch, HBASE-5424.patch, HBase-5424_v2.patch Original Estimate: 48h Remaining Estimate: 48h We meet NPE when call getRegionInfo() in testing environment. Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119) at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73) at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418) This NPE also make the table.jsp can't show the region information of this table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5454) Refuse operations from Admin before master is initialized
[ https://issues.apache.org/jira/browse/HBASE-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214314#comment-13214314 ] stack commented on HBASE-5454: -- Chunhui, want to do that other stuff in a different issue? This one is nice and simple as is; if you make the changes suggested I can commit. Refuse operations from Admin before master is initialized Key: HBASE-5454 URL: https://issues.apache.org/jira/browse/HBASE-5454 Project: HBase Issue Type: Improvement Reporter: chunhui shen Attachments: hbase-5454.patch In our testing environment, when master is initializing, we found conflict problems between master#assignAllUserRegions and the EnableTable event, causing region assignment to throw an exception so that the master aborted itself. We think we'd better refuse operations from Admin, such as CreateTable, EnableTable, etc. It could reduce errors. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214325#comment-13214325 ] stack commented on HBASE-4991: -- bq. Shell command needs to be changed as delete_region table_name start_key end_key delete_region would go well w/ our current close_region. Do you need tablename, startkey, endkey? Can't you just pass region name? ditto for the deleteRegion call (though maybe I'm missing the fact that you are trying to respect Todd's comments above that we not have region come up out of the API -- ignore this remark if so). bq. If start/end key for the specified table is spanned across multiple regions then it is out of scope of this JIRA (throw error). So, you can only do one region at a time? Why would it be hard doing multiple given you are tracking? Or is it that it makes the tracking yet more complicated? Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214332#comment-13214332 ] stack commented on HBASE-4991: -- bq. Do you need tablename, startkey, endkey? Can't you just pass region name? Again, you may be trying to not reveal region type in API but then delete_region would be different to close_region which takes a regionname IIRC? bq. ..but client does not know the tableName for a regionName in our case... It does not know? I may be missing context. If I am, ignore this comment. If it did know, could use: 'List<HRegionInfo> getOnlineRegions()' HRIs have tablename. Could figure it client-side. You can't use the List<HRegion> because we can't serialize HRegion to pass over connection... that call is if you are running in same context in JSP or in a unit test or something. bq. Design choice is like HBASE-4213, meaning master create a znode under zookeeper.znode.parent/delete-region Fair enough. Will we have a new dir in zk per cluster region operation we want to do? Can we not exploit primitives added by hbase-4213? Or do we need to refactor hbase-4213 to get you primitives you need to do this facility? Or is there nothing in common w/ what hbase-4213 does (there is at least the closing of a region?) bq. ...If we are considering delete_region as a tool/util then we can refactor as a tool/util as like Online/Offline merge code online merge should have a bunch of overlap w/ this feature? Would be great if they could share a bunch of code/primitives. As has been suggested, rather than a /delete-region, instead we'd have a log of intent+log of actions thing up in zk I suppose. The log of intent would list the steps to be done and then the log of actions thingy would log how far the operation had gone (I should read up on the cited accumulo doo-hickey). bq. We do put all our ZK trackers in zookeeper package and this is how online schema change HBASE-4213 was implemented. 
That's a bit broken in my opinion. It's wonky having zk have references out to other main packages. Not your fault. Should have caught that in review of hbase-4213. Good on you Mubarak. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5460) Add protobuf as M/R dependency jar
[ https://issues.apache.org/jira/browse/HBASE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214334#comment-13214334 ] stack commented on HBASE-5460: -- +1 Add protobuf as M/R dependency jar -- Key: HBASE-5460 URL: https://issues.apache.org/jira/browse/HBASE-5460 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5460.txt Getting this from M/R jobs (Export for example): Error: java.lang.ClassNotFoundException: com.google.protobuf.Message at java.net.URLClassLoader$1.run(URLClassLoader.java:217) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:205) at java.lang.ClassLoader.loadClass(ClassLoader.java:321) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) at java.lang.ClassLoader.loadClass(ClassLoader.java:266) at org.apache.hadoop.hbase.io.HbaseObjectWritable.clinit(HbaseObjectWritable.java:262) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable
[ https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214352#comment-13214352 ] stack commented on HBASE-5455: -- Good idea. Add test to avoid unintentional reordering of items in HbaseObjectWritable -- Key: HBASE-5455 URL: https://issues.apache.org/jira/browse/HBASE-5455 Project: HBase Issue Type: Test Reporter: Michael Drzal Priority: Minor Fix For: 0.94.0 HbaseObjectWritable has a static initialization block that assigns ints to various classes. The int is assigned by using a local variable that is incremented after each use. If someone adds a line in the middle of the block, this throws off everything after the change, and can break client compatibility. There is already a comment to not add/remove lines at the beginning of this block. It might make sense to have a test against a static set of ids. If something gets changed unintentionally, it would at least fail the tests. If the change was intentional, at the very least the test would need to get updated, and it would be a conscious decision. https://issues.apache.org/jira/browse/HBASE-5204 contains the the fix for one issue of this type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
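The proposed regression test could look roughly like the sketch below. The class names and codes here are placeholders, not the real HbaseObjectWritable table; the point is just to pin the class-to-code mapping so any reordering of the static init block fails loudly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CodeTableCheck {
  // Hypothetical "golden" table of class-to-code assignments.
  static Map<String, Integer> expected() {
    Map<String, Integer> m = new LinkedHashMap<>();
    m.put("org.apache.hadoop.hbase.client.Put", 1);
    m.put("org.apache.hadoop.hbase.client.Delete", 2);
    m.put("org.apache.hadoop.hbase.client.Get", 3);
    return m;
  }

  // In a real test, 'actual' would be read from HbaseObjectWritable's map.
  static boolean matches(Map<String, Integer> actual) {
    return expected().equals(actual);
  }

  public static void main(String[] args) {
    Map<String, Integer> actual = expected(); // stand-in for the live table
    if (!matches(actual)) {
      throw new AssertionError("class-to-code table changed; was it intentional?");
    }
    System.out.println("code table stable");
  }
}
```

An unintentional insertion in the middle of the block shifts every later code, so the equality check catches it; an intentional change forces a conscious update of the expected map.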
[jira] [Commented] (HBASE-5434) [REST] Include more metrics in cluster status request
[ https://issues.apache.org/jira/browse/HBASE-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214357#comment-13214357 ] stack commented on HBASE-5434: -- It fails on hadoopqa too... {code} -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.rest.model.TestStorageClusterStatusModel {code} Does it fail for you Mubarak? [REST] Include more metrics in cluster status request - Key: HBASE-5434 URL: https://issues.apache.org/jira/browse/HBASE-5434 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0 Attachments: HBASE-5434.trunk.v1.patch /status/cluster shows only {code} stores=2 storefiless=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 {code} for a region but master web-ui shows {code} stores=1, storefiles=0, storefileUncompressedSizeMB=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 readRequestsCount=0 writeRequestsCount=0 rootIndexSizeKB=0 totalStaticIndexSizeKB=0 totalStaticBloomSizeKB=0 totalCompactingKVs=0 currentCompactedKVs=0 compactionProgressPct=NaN {code} In a write-heavy REST gateway based production environment, ops team needs to verify whether write counters are getting incremented per region (they do run /status/cluster on each REST server), we can get the same values from *rpc.metrics.put_num_ops* and *hbase.regionserver.writeRequestsCount* but some home-grown tools needs to parse the output of /status/cluster and updates the dashboard. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3149) Make flush decisions per column family
[ https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214372#comment-13214372 ] stack commented on HBASE-3149: -- @Nicolas I think I follow. I opened HBASE-5461. Let me try it. bq. Why is this silly? Because I was seeing a plethora of small files as the problem, but given your explanation above, I think I grok that it's not many small files that's the problem; it's that w/ the way-high min size, our selection was too inclusionary and so we end up doing loads of rewriting. Make flush decisions per column family -- Key: HBASE-3149 URL: https://issues.apache.org/jira/browse/HBASE-3149 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Karthik Ranganathan Assignee: Nicolas Spiegelberg Priority: Critical Fix For: 0.92.1 Today, the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5437) HRegionThriftServer does not start because of a bug in HbaseHandlerMetricsProxy
[ https://issues.apache.org/jira/browse/HBASE-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214857#comment-13214857 ] stack commented on HBASE-5437: -- +1 HRegionThriftServer does not start because of a bug in HbaseHandlerMetricsProxy --- Key: HBASE-5437 URL: https://issues.apache.org/jira/browse/HBASE-5437 Project: HBase Issue Type: Bug Components: metrics, thrift Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.94.0 Attachments: HBASE-5437.D1857.1.patch, HBASE-5437.D1887.1.patch, HBASE-5437.D1887.2.patch 3.facebook.com,60020,1329865516120: Initialization of RS failed. Hence aborting RS. java.lang.ClassCastException: $Proxy9 cannot be cast to org.apache.hadoop.hbase.thrift.generated.Hbase$Iface at org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.newInstance(HbaseHandlerMetricsProxy.java:47) at org.apache.hadoop.hbase.thrift.ThriftServerRunner.init(ThriftServerRunner.java:239) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer.init(HRegionThriftServer.java:74) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658) at java.lang.Thread.run(Thread.java:662) 2012-02-21 15:05:18,749 FATAL org.apache.hadoop.h -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5442) Use builder pattern in StoreFile and HFile
[ https://issues.apache.org/jira/browse/HBASE-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214868#comment-13214868 ] stack commented on HBASE-5442: -- @Mikhail Thats the usual set of three that fail on hadoopqa, fyi. Use builder pattern in StoreFile and HFile -- Key: HBASE-5442 URL: https://issues.apache.org/jira/browse/HBASE-5442 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D1893.1.patch, D1893.2.patch, HFile-StoreFile-builder-2012-02-22_22_49_00.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses StoreFile and HFile refactoring. For HColumnDescriptor refactoring see HBASE-5357. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
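A minimal sketch of the builder shape being proposed above; the names (WriterBuilder, setBlockSize, setCompression) and defaults are illustrative, not the actual HBase signatures.

```java
public class WriterBuilderDemo {
  static class Writer {
    final int blockSize;
    final String compression;
    Writer(int blockSize, String compression) {
      this.blockSize = blockSize;
      this.compression = compression;
    }
  }

  static class WriterBuilder {
    private int blockSize = 65536;        // defaults live here once,
    private String compression = "none";  // never restated at call sites

    WriterBuilder setBlockSize(int n) { this.blockSize = n; return this; }
    WriterBuilder setCompression(String c) { this.compression = c; return this; }
    Writer build() { return new Writer(blockSize, compression); }
  }

  public static void main(String[] args) {
    // Each setter on its own line keeps merges/cherry-picks clean.
    Writer w = new WriterBuilder()
        .setBlockSize(131072)
        .setCompression("gz")
        .build();
    System.out.println(w.blockSize + " " + w.compression);
  }
}
```

Because parameters are named at the call site and defaulted in one place, adding a new parameter never breaks existing callers the way adding a constructor argument does.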
[jira] [Commented] (HBASE-5454) Refuse operations from Admin before master is initialized
[ https://issues.apache.org/jira/browse/HBASE-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214870#comment-13214870 ] stack commented on HBASE-5454: -- So, you want to mash this patch into hbase-5270? If so, close this one as won't fix? Refuse operations from Admin before master is initialized Key: HBASE-5454 URL: https://issues.apache.org/jira/browse/HBASE-5454 Project: HBase Issue Type: Improvement Reporter: chunhui shen Attachments: hbase-5454.patch In our testing environment, when master is initializing, we found conflict problems between master#assignAllUserRegions and the EnableTable event, causing region assignment to throw an exception so that the master aborts itself. We think we'd better refuse operations from Admin, such as CreateTable, EnableTable, etc. It could reduce errors. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5466) Opening a table also opens the metatable and never closes it.
[ https://issues.apache.org/jira/browse/HBASE-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215137#comment-13215137 ] stack commented on HBASE-5466: -- +1 on patch (except for the spacing that is not like the rest of the file) Opening a table also opens the metatable and never closes it. - Key: HBASE-5466 URL: https://issues.apache.org/jira/browse/HBASE-5466 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.5, 0.92.0 Reporter: Ashley Taylor Attachments: MetaScanner_HBASE_5466(2).patch, MetaScanner_HBASE_5466.patch Having upgraded to CDH3U3 version of hbase we found we had a zookeeper connection leak, tracking it down we found that closing the connection will only close the zookeeper connection if all calls to get the connection have been closed, there is incCount and decCount in the HConnection class, When a table is opened it makes a call to the metascanner class which opens a connection to the meta table, this table never gets closed. This caused the count in the HConnection class to never return to zero meaning that the zookeeper connection will not close when we close all the tables or calling HConnectionManager.deleteConnection(config, true); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
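The leak described above comes down to reference counting. A toy model (illustrative names, not the actual HConnection internals) shows how one unmatched incCount keeps the ZooKeeper session open forever:

```java
public class RefCountedConnection {
  private int count = 0;
  private boolean zkOpen = true;

  void incCount() { count++; }

  void decCount() {
    count--;
    if (count == 0) zkOpen = false;  // last user gone: close ZK session
  }

  boolean zkSessionOpen() { return zkOpen; }

  public static void main(String[] args) {
    RefCountedConnection conn = new RefCountedConnection();
    conn.incCount();  // user table opened
    conn.incCount();  // meta table opened internally by MetaScanner
    conn.decCount();  // user table closed
    // The meta open was never matched by a close: count stays at 1
    // and the session never shuts down, even after deleteConnection.
    System.out.println(conn.zkSessionOpen());
  }
}
```

The patch's job is essentially to make the internal meta-table open/close a balanced pair, so the count can return to zero.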
[jira] [Commented] (HBASE-5415) FSTableDescriptors should handle random folders in hbase.root.dir better
[ https://issues.apache.org/jira/browse/HBASE-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215164#comment-13215164 ] stack commented on HBASE-5415: -- What's the difference between miscellaneous dirs under hbase.rootdir and an actual table directory that is missing its .tableinfo file? Are we changing our API when we remove TEE from public methods? FSTableDescriptors should handle random folders in hbase.root.dir better Key: HBASE-5415 URL: https://issues.apache.org/jira/browse/HBASE-5415 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.92.1, 0.94.0 Attachments: HBASE-5415.patch I faked an upgrade on a test cluster using our dev data so I had to distcp the data between the two clusters, but after starting up and doing the migration and whatnot the web UI didn't show any table. The reason was in the master's log: {quote} org.apache.hadoop.hbase.TableExistsException: No descriptor for _distcp_logs_e0ehek at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:164) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:182) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1554) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326) {quote} I don't think we need to show a full stack (just a WARN maybe), this shouldn't kill the request (still see tables in the web UI), and why is that a TableExistsException? -- This message is automatically generated by JIRA.
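The skip-and-warn behavior being argued for could look roughly like the sketch below; the file layout and names are simplified stand-ins for the real FSTableDescriptors logic, not the committed patch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SkipStrayDirs {
  // List table dirs under a (simplified) hbase.rootdir: a directory
  // without a .tableinfo file is logged and skipped instead of failing
  // the whole listing with an exception.
  static List<String> listTables(Path rootDir) {
    List<String> tables = new ArrayList<>();
    try (DirectoryStream<Path> dirs = Files.newDirectoryStream(rootDir)) {
      for (Path d : dirs) {
        if (!Files.isDirectory(d)) continue;
        if (Files.exists(d.resolve(".tableinfo"))) {
          tables.add(d.getFileName().toString());
        } else {
          System.out.println("WARN: no descriptor for " + d.getFileName()
              + ", skipping");
        }
      }
    } catch (IOException e) {
      System.out.println("WARN: cannot list " + rootDir + ": " + e.getMessage());
    }
    return tables;
  }

  // Build a throwaway layout: one real table dir, one stray distcp dir.
  static List<String> runDemo() {
    try {
      Path root = Files.createTempDirectory("rootdir");
      Files.createDirectory(root.resolve("mytable"));
      Files.createFile(root.resolve("mytable").resolve(".tableinfo"));
      Files.createDirectory(root.resolve("_distcp_logs_e0ehek"));
      return listTables(root);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(runDemo());  // [mytable]
  }
}
```

With this shape, a stray `_distcp_logs_*` dir produces one WARN line and the web UI still lists every real table.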
[jira] [Commented] (HBASE-4365) Add a decent heuristic for region size
[ https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215176#comment-13215176 ] stack commented on HBASE-4365: -- @Lars You want to put an upper bound on the number of regions? I think if we do power of three, we'll lose some of the benefit J-D sees above; we'll fan out the regions slower. Do you want to put an upper bound on the number of regions per regionserver for a table? Say, three? As in, when we get to three regions on a server, just scoot the split size up to the maximum. So, given a power of two, we'd split on first flush, then the next split would happen at (2*2*128M) 512M, then 9*128M=1.2G and thereafter we'd split at the max, say 10G? Or should we just commit this for now and do more in another patch? Add a decent heuristic for region size -- Key: HBASE-4365 URL: https://issues.apache.org/jira/browse/HBASE-4365 Project: HBase Issue Type: Improvement Affects Versions: 0.92.1, 0.94.0 Reporter: Todd Lipcon Priority: Critical Labels: usability Attachments: 4365-v2.txt, 4365.txt A few of us were brainstorming this morning about what the default region size should be. There were a few general points made: - in some ways it's better to be too-large than too-small, since you can always split a table further, but you can't merge regions currently - with HFile v2 and multithreaded compactions there are fewer reasons to avoid very-large regions (10GB+) - for small tables you may want a small region size just so you can distribute load better across a cluster - for big tables, multi-GB is probably best -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
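The escalating split sizes stack walks through can be sketched as below; the squaring rule and the 128M/10G constants are assumptions taken from the comment, not the committed policy.

```java
public class SplitSizeSketch {
  // Split size grows with the number of regions this server holds for
  // the table, capped at the configured max region size.
  static long splitSize(int regionCount, long flushSize, long maxSize) {
    if (regionCount <= 0) return flushSize;
    long size = (long) regionCount * regionCount * flushSize;
    return Math.min(size, maxSize);
  }

  public static void main(String[] args) {
    long flush = 128L << 20;  // 128M flush size
    long max = 10L << 30;     // 10G max region size
    // 1 region -> split on first flush (128M), 2 -> 2*2*128M = 512M,
    // 3 -> 3*3*128M = 1152M, then capped at the 10G max.
    System.out.println(splitSize(1, flush, max) >> 20);
    System.out.println(splitSize(2, flush, max) >> 20);
    System.out.println(splitSize(3, flush, max) >> 20);
    System.out.println(splitSize(100, flush, max) >> 30);
  }
}
```

This gets a small table spread across the cluster quickly (early, cheap splits) while a big table settles into multi-GB regions, matching the brainstorm points in the description.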
[jira] [Commented] (HBASE-5349) Automagically tweak global memstore and block cache sizes based on workload
[ https://issues.apache.org/jira/browse/HBASE-5349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215220#comment-13215220 ] stack commented on HBASE-5349: -- Chatting w/ J-D about a phenomenon where we do not use memory when we are taking on a bunch of writes w/ a low region count. The few regions we have grow to their max of 128M or so and then we flush but in his case he had gigs of free memory still. The notion is that we should let memstores grow to fill all available space and then flush when they hit the low-water global mem mark for the memstore. The problem then becomes we'll flush lots of massive files and will overwhelm compactions. We'll need a push-back, something like a flush-merge where we flush by rewriting an existing store file interleaving the contents of memory or some such to slow down the flush but also to make for less compaction to do. Automagically tweak global memstore and block cache sizes based on workload --- Key: HBASE-5349 URL: https://issues.apache.org/jira/browse/HBASE-5349 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Fix For: 0.94.0 Hypertable does a neat thing where it changes the size given to the CellCache (our MemStores) and Block Cache based on the workload. If you need an image, scroll down at the bottom of this link: http://www.hypertable.com/documentation/architecture/ That'd be one less thing to configure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
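The change to the flush trigger discussed above can be modeled in a few lines; the thresholds and names are illustrative, not actual HBase configuration.

```java
public class FlushDecisionSketch {
  // Current behavior: flush a memstore when it hits its per-region max
  // (e.g. 128M), even with gigs of heap idle. Proposed behavior: let
  // memstores grow and flush only when the aggregate hits the global
  // low-water mark.
  static boolean shouldFlush(long regionMemstore, long globalMemstore,
                             long perRegionMax, long globalLowWater,
                             boolean growToGlobal) {
    if (growToGlobal) {
      return globalMemstore >= globalLowWater;  // proposed
    }
    return regionMemstore >= perRegionMax;      // current
  }

  public static void main(String[] args) {
    long m128 = 128L << 20, g1 = 1L << 30, g4 = 4L << 30;
    // Current: a 128M memstore flushes even with only 1G used globally.
    System.out.println(shouldFlush(m128, g1, m128, g4, false));
    // Proposed: the same memstore keeps growing until global use nears 4G.
    System.out.println(shouldFlush(m128, g1, m128, g4, true));
  }
}
```

The comment's caveat still applies: the proposed branch produces fewer but much bigger flushes, which is where the flush-merge push-back idea comes in.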
[jira] [Commented] (HBASE-5415) FSTableDescriptors should handle random folders in hbase.root.dir better
[ https://issues.apache.org/jira/browse/HBASE-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215236#comment-13215236 ] stack commented on HBASE-5415: -- bq. Former's HTD is null, latter gets a FNFE. I still don't understand how we can tell the difference between a misc directory in the wrong place and a table directory missing its .tableinfo. Both would look the same to the interrogating code, I'd think? bq. Technically no, TEE (and FNFE FWIW) are both IOEs so there's no change there. I removed TEE specifically because it isn't thrown anymore. I mean, if I had client code that had a catch of a TEE, it'd stop working, right? (I doubt such a thing exists so I don't feel too bad about removing this) FSTableDescriptors should handle random folders in hbase.root.dir better Key: HBASE-5415 URL: https://issues.apache.org/jira/browse/HBASE-5415 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.92.1, 0.94.0 Attachments: HBASE-5415.patch I faked an upgrade on a test cluster using our dev data so I had to distcp the data between the two clusters, but after starting up and doing the migration and whatnot the web UI didn't show any table. 
The reason was in the master's log: {quote} org.apache.hadoop.hbase.TableExistsException: No descriptor for _distcp_logs_e0ehek at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:164) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:182) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1554) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326) {quote} I don't think we need to show a full stack (just a WARN maybe), this shouldn't kill the request (still see tables in the web UI), and why is that a TableExistsException? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215239#comment-13215239 ] stack commented on HBASE-4991: -- bq. Is it Okay to do the above in another JIRA ? As a prereq for this issue? Yes. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215258#comment-13215258 ] stack commented on HBASE-4991: -- bq. i have tried with getting a List<HRegion>, got into serialization issue Yeah. HRegion is not a Writable. bq. We are using znode just to start the task and update the state only. If we keep track of intent vs action in same znode, considering the size of data in znode, we should not exceed 1 MB as ZK admin guide says Oh, you are talking of writing actual data into zk? I was just talking of intent, a bare mini language that outlines steps to complete an operation... something like your enums. I'd think this would be well under 1MB. Good stuff. Might make sense to work on a bit of a design doc first? Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5415) FSTableDescriptors should handle random folders in hbase.root.dir better
[ https://issues.apache.org/jira/browse/HBASE-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215259#comment-13215259 ] stack commented on HBASE-5415: -- +1 on commit FSTableDescriptors should handle random folders in hbase.root.dir better Key: HBASE-5415 URL: https://issues.apache.org/jira/browse/HBASE-5415 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.92.1, 0.94.0 Attachments: HBASE-5415.patch I faked an upgrade on a test cluster using our dev data so I had to distcp the data between the two clusters, but after starting up and doing the migration and whatnot the web UI didn't show any table. The reason was in the master's log: {quote} org.apache.hadoop.hbase.TableExistsException: No descriptor for _distcp_logs_e0ehek at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:164) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:182) at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1554) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326) {quote} I don't think we need to show a full stack (just a WARN maybe), this shouldn't kill the request (still see tables in the web UI), and why is that a TableExistsException? -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215277#comment-13215277 ] stack commented on HBASE-3909: -- I'm now suggesting we hoist only the differences up into zk. We'd have a configuration directory under /hbase in zk. It would have znodes whose names are the configs to change. The content of each znode is the new value (and type, I suppose). Once a znode is added under the configuration dir, watchers are triggered and they update their running Configuration instance. We do some refactoring in HRegionServer and HMaster so important configs go back to their Configuration instance at critical junctures such as at split or when checking if we should do a compaction or if we should flush, rather than reading a data member that was set on construction (we'd be careful to not always do a lookup on Configuration). We'd add a configure command to the shell that allowed you to hoist configs up into zk. We'd punt on there being a connection between this mechanism and what's in hbase-*.xml. This facility is for 'ephemeral' configuration, for getting you over a temporary hump, for trying out a setting to see its effect, or to get you out of a fix; e.g. the cluster is up and running but you forgot to set a critical config. All w/o need of a rolling restart/restart. Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.94.0 I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic up on dev and Todd suggested we look at how Hadoop did it over in HADOOP-7001 -- This message is automatically generated by JIRA. 
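The znode-per-config mechanism described above can be sketched with a plain callback standing in for the ZooKeeper watcher; no real ZK here, and all names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class DynamicConfigSketch {
  // Stand-in for the server's running Configuration instance.
  final Map<String, String> conf = new HashMap<>();
  private BiConsumer<String, String> watcher;

  void setWatcher(BiConsumer<String, String> w) { this.watcher = w; }

  // "Creating a znode" under the configuration dir triggers the watcher,
  // which folds the new value into the running Configuration.
  void createZnode(String key, String value) {
    if (watcher != null) watcher.accept(key, value);
  }

  String get(String key) { return conf.get(key); }

  public static void main(String[] args) {
    DynamicConfigSketch zk = new DynamicConfigSketch();
    zk.setWatcher((k, v) -> zk.conf.put(k, v));  // update the live config
    zk.createZnode("hbase.hregion.max.filesize", "10737418240");
    System.out.println(zk.get("hbase.hregion.max.filesize"));
  }
}
```

The important refactoring, as the comment notes, is that split/compaction/flush decisions must re-read the Configuration at the decision point instead of caching the value in a field at construction, or the watcher update never takes effect.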
[jira] [Commented] (HBASE-5466) Opening a table also opens the metatable and never closes it.
[ https://issues.apache.org/jira/browse/HBASE-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215279#comment-13215279 ] stack commented on HBASE-5466: -- @Ted Yes please. Opening a table also opens the metatable and never closes it. - Key: HBASE-5466 URL: https://issues.apache.org/jira/browse/HBASE-5466 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.5, 0.92.0 Reporter: Ashley Taylor Attachments: MetaScanner_HBASE_5466(2).patch, MetaScanner_HBASE_5466(3).patch, MetaScanner_HBASE_5466.patch Having upgraded to CDH3U3 version of hbase we found we had a zookeeper connection leak, tracking it down we found that closing the connection will only close the zookeeper connection if all calls to get the connection have been closed, there is incCount and decCount in the HConnection class, When a table is opened it makes a call to the metascanner class which opens a connection to the meta table, this table never gets closed. This caused the count in the HConnection class to never return to zero meaning that the zookeeper connection will not close when we close all the tables or calling HConnectionManager.deleteConnection(config, true); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215284#comment-13215284 ] stack commented on HBASE-3909: -- Reading over hadoop-7001, Phillip says: 'Not to mention that Configuration objects get copied along, so it's hard to make sure that a configuration change propagates to all possible children.' I need to survey to make sure a callback context can change a Configuration instance that is used in all the important places we'd want to change. Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.94.0 I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic up on dev and Todd suggested we look at how Hadoop did it over in HADOOP-7001 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215346#comment-13215346 ] stack commented on HBASE-3909: -- bq. The assumption is there wouldn't be many such config items to change. We should survey and validate this assumption. You could do hundreds or even put every config. up there if you wanted. Should be fine. bq. When would these znodes be deleted ? Not sure. Good question. Their insertion would trigger the callback so they'd be useless after putting. Could let them just expire. Might be good to keep them around though so we could get an idea of what was changed via zk. Need to think on it. Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.94.0 I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic up on dev and Todd suggested we look at how Hadoop did it over in HADOOP-7001 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs
[ https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215378#comment-13215378 ] stack commented on HBASE-5451: -- Excellent! Minor, would do: if (!head.hasUserInfo()) return; .. Then you'd save an indent of the whole body of the method. Seems like ticket should be renamed user (we seem to be creating a user rather than a ticket?) here -- I like the way you ask user to create, passing the header: - ticket = header.getUser(); + ticket = User.create(header); Is ConnectionContext actually the headers? Should it be called ConnectionHeader? What is this -- HBaseCompleteRpcRequestProto? It's 'The complete RPC request message'. It's the callid and the client request. Is it the complete request because it's missing the header? Should it just be called Request since it's inside a package that makes its provenance clear? I suppose Request would be odd because you then do getRequest on it... hmm. Why tunnelRequest? What's that mean? I like the builder stuff making headers and request over in client. Fatten doc on the proto file I'd say. It's going to be our spec. Can these proto classes drop the HBaseRPC prefix? Is the Proto suffix going to be our convention denoting Proto classes going forward? Are we going to repeat the hrpc exception handling carrying Strings for exceptions from server to client? Switch RPC call envelope/headers to PBs --- Key: HBASE-5451 URL: https://issues.apache.org/jira/browse/HBASE-5451 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Devaraj Das Attachments: rpc-proto.patch.1_2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5440) Allow import to optionally use HFileOutputFormat
[ https://issues.apache.org/jira/browse/HBASE-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215388#comment-13215388 ] stack commented on HBASE-5440: -- LGTM. What's missing is better documentation in the usage for Import. This new option will be under a rock unless it's better surfaced. +1 on commit after beefing up usage. Add some lines under here: {code} -System.err.println("Usage: Import <tablename> <inputdir>"); +System.err.println("Usage: Import [-D" + BULK_OUTPUT_CONF_KEY + "=/path/for/output] <tablename> <inputdir>"); {code} ... going on about what the -D thingy does. Good stuff. Allow import to optionally use HFileOutputFormat Key: HBASE-5440 URL: https://issues.apache.org/jira/browse/HBASE-5440 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.0 Attachments: 5440.txt importtsv supports importing into a live table or generating HFiles for bulk load. import should allow the same. Could even consider merging these tools into one (in principle the only difference is the parsing part - although that is maybe for a different jira). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215432#comment-13215432 ] stack commented on HBASE-5166: -- @Jai It's not you. Those are known failing tests. Let me commit. MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop -- Key: HBASE-5166 URL: https://issues.apache.org/jira/browse/HBASE-5166 Project: HBase Issue Type: Improvement Reporter: Jai Kumar Singh Priority: Minor Labels: multithreaded, tablemapper Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt Original Estimate: 0.5h Remaining Estimate: 0.5h There is currently no MultiThreadedTableMapper in hbase like the MultiThreadedMapper we have in Hadoop for IO-bound jobs. UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. Running these kinds of hbase mapreduce jobs with a normal table mapper is quite slow, as we are not utilizing the CPU fully (N/W IO bound). Moreover, I want to know whether it would be a good/bad idea to use HBase for these kinds of usecases? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
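The pattern the issue asks for (fanning per-record map() calls out to a thread pool inside a single task, so network-IO-bound work overlaps) can be sketched independently of the MapReduce API. Everything below is an illustrative stand-in, not the attached patch; the real MultithreadedTableMapper would feed rows from a table scan rather than a List.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: run an IO-bound per-record function across N threads within one
// task, the way a multithreaded table mapper fans out map() calls.
public class MultithreadedMapperSketch {
    public static List<String> runMap(List<String> rows, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        for (String row : rows) {
            pool.execute(() -> out.add(map(row)));  // each map() may block on IO
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }

    // Stand-in for the user's map(): e.g. fetch the URL named by this row.
    static String map(String row) { return row + ":fetched"; }
}
```

Because map() calls now run concurrently, output order is not guaranteed, which is the same caveat a real multithreaded mapper carries.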
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215431#comment-13215431 ] stack commented on HBASE-4991: -- bq. Do we need to design the intent part (and steps of an operation) as a generic framework for all the master-coordinated tasks? I'd think you could make it work for you but make it so it could be used by others. How does Accumulo do it, do you know? You might get some ideas over there. bq. I thought we are only changing the API (with multiple region support) and focussing more on refactoring with good test/stress-test in this JIRA. You mean to get this facility into core? My sense is that you could get this specialized lump into hbase to do this one facility if lots of tests but my fear is that if it does go in, it'll live forever as an awkward appendage. Seems like we have an opportunity to add some base primitives that we can then build this feature on as well as others. Pity to waste it (understood if you don't want to do the generalized system). bq. Can we address intent/actions part out of scope of this JIRA? I'm reluctant to because of the above -- we'll get a specialized lump of code that will live forever and we'll all be afraid to touch. Maybe others think different. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215442#comment-13215442 ] stack commented on HBASE-5317: -- It can go into 0.92 if you make a version (I see a bunch of failures trying to apply trunk patch). Thanks Gregory. Fix TestHFileOutputFormat to work against hadoop 0.23 - Key: HBASE-5317 URL: https://issues.apache.org/jira/browse/HBASE-5317 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.0, 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch, HBASE-5317-v3.patch, HBASE-5317-v4.patch, HBASE-5317-v5.patch, HBASE-5317-v6.patch, TEST-org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.xml Running mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat yields this on 0.92: Failed tests: testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found Tests in error: test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory) testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable It looks like on trunk, this also results in an error: testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but haven't fixed the other 3 yet. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3484) Replace memstore's ConcurrentSkipListMap with our own implementation
[ https://issues.apache.org/jira/browse/HBASE-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215446#comment-13215446 ] stack commented on HBASE-3484: -- Great stuff Todd. bq. ...copy-on-write sorted array lists. Could we do this? We'd allocate a new array every time we did an insert? An array would be cheaper space-wise and more efficient scanning, etc., I'd think. It'd just be the insert and sort that'd be 'expensive'. Let me have a go at your suggested microbenchmark. Replace memstore's ConcurrentSkipListMap with our own implementation Key: HBASE-3484 URL: https://issues.apache.org/jira/browse/HBASE-3484 Project: HBase Issue Type: Improvement Components: performance Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Attachments: hierarchical-map.txt By copy-pasting ConcurrentSkipListMap into HBase we can make two improvements to it for our use case in MemStore: - add an iterator.replace() method which should allow us to do upsert much more cheaply - implement a Set directly without having to do Map<KeyValue,KeyValue> to save one reference per entry It turns out CSLM is in public domain from its development as part of JSR 166, so we should be OK with licenses. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
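The copy-on-write sorted array idea floated in the comment can be sketched as follows. This is a rough illustration of the cost trade-off under discussion (cheap dense scans, O(n) copy per insert), not code from the issue; a real memstore entry would be a KeyValue, not a long.

```java
import java.util.Arrays;

// Sketch of a copy-on-write sorted array: every insert allocates a new
// array, so readers always scan a dense, immutable snapshot with no locking.
public class CowSortedArray {
    public static long[] insert(long[] sorted, long value) {
        int i = Arrays.binarySearch(sorted, value);
        if (i < 0) i = -i - 1;           // decode insertion point for a missing key
        long[] next = new long[sorted.length + 1];
        System.arraycopy(sorted, 0, next, 0, i);
        next[i] = value;
        System.arraycopy(sorted, i, next, i + 1, sorted.length - i);
        return next;                     // O(n) copy per insert, O(log n) lookup
    }
}
```

The "expensive" part stack mentions is exactly the per-insert allocation and copy; there is no full re-sort needed because binary search finds the insertion point in the already-sorted snapshot.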
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215796#comment-13215796 ] stack commented on HBASE-5075: -- bq. Even something as simple as just removing your own znode on failure would be sufficient to cover this use case, correct? Lets do that regardless. Good idea. regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch When a regionserver crashes, it takes too long to notify the hmaster, and once the hmaster knows of the regionserver's shutdown, it takes a long time to fetch the hlog's lease. hbase is an online db, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid; if the pid does not exist, I think the rs is down, so I delete the znode and force close the hlog file. The detection period could then be about 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5351) hbase completebulkload to a new table fails in a race
[ https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215800#comment-13215800 ] stack commented on HBASE-5351: -- @Adrian That seems like the way to go. hbase completebulkload to a new table fails in a race - Key: HBASE-5351 URL: https://issues.apache.org/jira/browse/HBASE-5351 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.92.0, 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5351.patch I have a test that tests vanilla use of importtsv with importtsv.bulk.output option followed by completebulkload to a new table. This sometimes fails as follows: 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: ml_items_copy, row=ml_items_copy,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781) at 
org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:211) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:171) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707) The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then creating an HTable object before that call has actually completed. The following change to /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java appears to fix the problem, but I have not been able to reproduce the race reliably, in order to write a test. {code} -HTable table = new HTable(this.cfg, tableName); - -HConnection conn = table.getConnection(); int ctr = 0; -while (!conn.isTableAvailable(table.getTableName()) && (ctr < TABLE_CREATE_MAX_RETRIES)) +while (!this.hbAdmin.isTableAvailable(tableName) && (ctr < TABLE_CREATE_MAX_RETRIES)) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
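The fix pattern in the snippet (after an async create, poll availability with bounded retries before constructing the client handle) can be sketched generically. The BooleanSupplier below stands in for the admin's availability check; all names and the sleep interval are illustrative, not the patch itself.

```java
import java.util.function.BooleanSupplier;

// Sketch of the race fix: poll an availability check with bounded retries
// before constructing the HTable, instead of assuming the async create
// has already finished.
public class WaitForTable {
    public static boolean waitAvailable(BooleanSupplier isAvailable,
                                        int maxRetries, long sleepMs) {
        for (int ctr = 0; ctr < maxRetries; ctr++) {
            if (isAvailable.getAsBoolean()) return true;  // table is ready
            try {
                Thread.sleep(sleepMs);                    // back off and retry
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;  // caller decides whether to fail the load
    }
}
```

The point of routing the check through the admin handle rather than a fresh HTable's connection is that no region lookup against .META. happens until the table actually exists.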
[jira] [Commented] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable
[ https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215802#comment-13215802 ] stack commented on HBASE-5455: -- +1 Excellent Add test to avoid unintentional reordering of items in HbaseObjectWritable -- Key: HBASE-5455 URL: https://issues.apache.org/jira/browse/HBASE-5455 Project: HBase Issue Type: Test Reporter: Michael Drzal Assignee: Michael Drzal Priority: Minor Fix For: 0.94.0 Attachments: HBASE-5455.diff HbaseObjectWritable has a static initialization block that assigns ints to various classes. The int is assigned by using a local variable that is incremented after each use. If someone adds a line in the middle of the block, this throws off everything after the change, and can break client compatibility. There is already a comment to not add/remove lines at the beginning of this block. It might make sense to have a test against a static set of ids. If something gets changed unintentionally, it would at least fail the tests. If the change was intentional, at the very least the test would need to get updated, and it would be a conscious decision. https://issues.apache.org/jira/browse/HBASE-5204 contains the fix for one issue of this type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
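The proposed guard (a test pinning each wire code to its class against a golden table) can be sketched like this. The entries below are purely illustrative, not HbaseObjectWritable's real code table.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed test: pin each serialized int code to a class name
// in a golden map, so an accidental reordering of the static registration
// block fails a test instead of silently breaking client compatibility.
public class WireCodeGuard {
    // Golden table checked into the test; illustrative entries only.
    public static Map<Integer, String> golden() {
        Map<Integer, String> m = new HashMap<>();
        m.put(1, "Boolean.TYPE");
        m.put(2, "Boolean.class");
        return m;
    }

    // A real test would build "actual" from the class under test's table.
    public static boolean matches(Map<Integer, String> actual) {
        return golden().equals(actual);
    }
}
```

An intentional addition then forces an explicit edit to the golden table, which is exactly the "conscious decision" the description asks for.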
[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs
[ https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215895#comment-13215895 ] stack commented on HBASE-5451: -- Ok on the tunnel thing. Maybe comment it some more (if you haven't already) in code. Yeah on suffix. We need a convention, I'd say, distinguishing the PB classes. On exception, could do as separate jira. Here is one that looks like it's what you need that already exists, if it helps: HBASE-2030 Switch RPC call envelope/headers to PBs --- Key: HBASE-5451 URL: https://issues.apache.org/jira/browse/HBASE-5451 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Devaraj Das Attachments: rpc-proto.patch.1_2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1762) Remove concept of ZooKeeper from HConnection interface
[ https://issues.apache.org/jira/browse/HBASE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215946#comment-13215946 ] stack commented on HBASE-1762: -- This is being done as part of HBASE-5399 Remove concept of ZooKeeper from HConnection interface -- Key: HBASE-1762 URL: https://issues.apache.org/jira/browse/HBASE-1762 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.20.0 Reporter: Ken Weiner Assignee: stack Attachments: HBASE-1762.patch The concept of ZooKeeper is really an implementation detail and should not be exposed in the {{HConnection}} interface. Therefore, I suggest removing the {{HConnection.getZooKeeperWrapper()}} method from the interface. I couldn't find any uses of this method within the HBase code base except for in one of the unit tests: {{org.apache.hadoop.hbase.TestZooKeeper}}. This unit test should be changed to instantiate the implementation of {{HConnection}} directly, allowing it to use the {{getZooKeeperWrapper()}} method. This requires making {{org.apache.hadoop.hbase.client.HConnectionManager.TableServers}} public. (I actually think TableServers should be moved out into an outer class, but in the spirit of small patches, I'll refrain from suggesting that in this issue). I'll attach a patch for: # The removal of {{HConnection.getZooKeeperWrapper()}} # Change of {{TableServers}} class from private to public # Direct instantiation of {{TableServers}} within {{TestZooKeeper}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215945#comment-13215945 ] stack commented on HBASE-5399: -- Another thought: Do we have to have the getSharedZookeeperWatcher and releaseSharedZookeeperWatcher and getSharedMaster, etc., in the HConnection API? Are these not implementation details? (Or would it be too hard to undo them -- you'd have no way of counting zk and master connections?) Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connections that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base node is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. 
- Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of threads = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge and HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them use a temporary master connection as well. Main points are: - the hbase class for ZooKeeper, ZooKeeperWatcher, is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, a non-connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the clients seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue with HBaseAdmin (for both ZK and Master), maybe we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
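The "temporary zookeeper connection" pattern the description keeps reaching for (open, read one value, close) is scoped resource use. A minimal sketch, with an illustrative Session interface standing in for the real ZooKeeper client:

```java
import java.util.function.Supplier;

// Sketch of the "temporary connection" pattern in the description: instead of
// holding a ZooKeeper session for the connection's lifetime, open one, read a
// single value (master address, root location, cluster id), and close it.
public class TemporaryLookup {
    public interface Session extends AutoCloseable {
        String read(String path);
        @Override void close();   // no checked exception, for the sketch
    }

    public static String lookup(Supplier<Session> connect, String path) {
        try (Session s = connect.get()) {   // session lives only for this read
            return s.read(path);
        }
    }
}
```

The trade-off the comment notes still applies: a non-connected client pays TCP connection setup on every such lookup, which is why it is only done for rarely-read values.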
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215973#comment-13215973 ] stack commented on HBASE-5075: -- This issue seems to be like 'HBASE-2342 Consider adding a watchdog node next to region server' regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch When a regionserver crashes, it takes too long to notify the hmaster, and once the hmaster knows of the regionserver's shutdown, it takes a long time to fetch the hlog's lease. hbase is an online db, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid; if the pid does not exist, I think the rs is down, so I delete the znode and force close the hlog file. The detection period could then be about 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216019#comment-13216019 ] stack commented on HBASE-4991: -- bq. I feel some of the recent proposals / requirements are far more complex than the one Yeah. It seemed basic back in December. bq. There wasn't such requirement when Mubarak outlined his plan Pardon me. I should have noticed the plan but did not. Other priorities. If I'd seen the plan I'd have blanched I think. bq. Of course, having generic framework for all the master-coordinated tasks allows future additions to be concise. Yep. We'd have tested, proven primitives to build stuff on rather than have to do it per feature bq. But I think that should have been outlined clearly in the early stage of development of a feature. See above. Pardon me for missing how involved this addition became. I don't see how the plan of '01/Feb/12 07:43' lays a foundation for a generic framework. Am I missing something? It seems like it's code for this feature only? Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216066#comment-13216066 ] stack commented on HBASE-5075: -- Rather than write a new supervisor, why not use something old school like http://supervisord.org/ A wrapper script could clear the old znode from zk before restarting a new RS instance? regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch When a regionserver crashes, it takes too long to notify the hmaster, and once the hmaster knows of the regionserver's shutdown, it takes a long time to fetch the hlog's lease. hbase is an online db, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid; if the pid does not exist, I think the rs is down, so I delete the znode and force close the hlog file. The detection period could then be about 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216078#comment-13216078 ] stack commented on HBASE-5075: -- Looking in HRegionServer code, it looks like we delete our znode on the way out already. Someone had your idea already Jesse: {code} try { deleteMyEphemeralNode(); } catch (KeeperException e) { LOG.warn("Failed deleting my ephemeral node", e); } {code} Maybe this is broke? regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch When a regionserver crashes, it takes too long to notify the hmaster, and once the hmaster knows of the regionserver's shutdown, it takes a long time to fetch the hlog's lease. hbase is an online db, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid; if the pid does not exist, I think the rs is down, so I delete the znode and force close the hlog file. The detection period could then be about 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216113#comment-13216113 ] stack commented on HBASE-3909: -- @Jimmy Nice thing about zk is that when config changes all get notification (Would need to make it so a new regionserver joining cluster would look into the zk /configuration dir to pick up differences). When it's in fs, we'd need to poll fs to find changes? Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.94.0 I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic up on dev and Todd suggested we look at how Hadoop did it over in HADOOP-7001 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
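The push-vs-poll point above can be sketched with a toy in-memory stand-in for a zk-style /configuration node (the ConfigNode class here is hypothetical, not a real ZooKeeper or HBase API): every registered watcher is pushed each change, and a late-joining regionserver reads the current state once at registration instead of polling.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class ConfigWatchDemo {
    // Toy stand-in for a watched /configuration znode.
    static class ConfigNode {
        private final Map<String, String> data = new HashMap<>();
        private final List<Consumer<Map<String, String>>> watchers = new ArrayList<>();

        void watch(Consumer<Map<String, String>> w) {
            w.accept(Map.copyOf(data));   // a new RS reads current config on join
            watchers.add(w);
        }

        void set(String k, String v) {
            data.put(k, v);
            for (Consumer<Map<String, String>> w : watchers) {
                w.accept(Map.copyOf(data));   // push notification, no polling loop
            }
        }
    }

    // Returns the values one watcher observes: one read at registration,
    // then one per change.
    static List<String> observe() {
        ConfigNode node = new ConfigNode();
        List<String> seen = new ArrayList<>();
        node.watch(cfg -> seen.add(cfg.getOrDefault("hbase.hstore.compactionThreshold", "3")));
        node.set("hbase.hstore.compactionThreshold", "5");
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observe());   // [3, 5]
    }
}
```

A filesystem-backed config would instead need each regionserver to re-read the file on a timer, which is the polling cost stack is pointing at.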
[jira] [Commented] (HBASE-5350) Fix jamon generated package names
[ https://issues.apache.org/jira/browse/HBASE-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216147#comment-13216147 ] stack commented on HBASE-5350: -- I verified UI looks right at least in local mode (could be different up on cluster) Fix jamon generated package names - Key: HBASE-5350 URL: https://issues.apache.org/jira/browse/HBASE-5350 Project: HBase Issue Type: Bug Components: monitoring Affects Versions: 0.92.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.94.0 Attachments: jamon_HBASE-5350.patch, jamon_HBASE-5350.patch Previously, jamon was creating the template files in org.apache.hbase, but it should be org.apache.hadoop.hbase, so it's in line with rest of source files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2375) Revisit compaction configuration parameters
[ https://issues.apache.org/jira/browse/HBASE-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216164#comment-13216164 ] stack commented on HBASE-2375: -- Split early has been committed too. All that remains of this issue is upping default compaction threshold. Revisit compaction configuration parameters --- Key: HBASE-2375 URL: https://issues.apache.org/jira/browse/HBASE-2375 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.20.3 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Labels: moved_from_0_20_5 Attachments: HBASE-2375-flush-split.patch, HBASE-2375-v8.patch Currently we will make the decision to split a region when a single StoreFile in a single family exceeds the maximum region size. This issue is about changing the decision to split to be based on the aggregate size of all StoreFiles in a single family (but still not aggregating across families). This would move a check to split after flushes rather than after compactions. This issue should also deal with revisiting our default values for some related configuration parameters. The motivating factor for this change comes from watching the behavior of RegionServers during heavy write scenarios. Today the default behavior goes like this: - We fill up regions, and as long as you are not under global RS heap pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) StoreFiles. - After we get 3 StoreFiles (hbase.hstore.compactionThreshold) we trigger a compaction on this region. - Compaction queues notwithstanding, this will create a 192MB file, not triggering a split based on max region size (hbase.hregion.max.filesize). - You'll then flush two more 64MB MemStores and hit the compactionThreshold and trigger a compaction. - You end up with 192 + 64 + 64 in a single compaction. This will create a single 320MB and will trigger a split. 
- While you are performing the compaction (which now writes out 64MB more than the split size, so is about 5X slower than the time it takes to do a single flush), you are still taking on additional writes into MemStore. - Compaction finishes, decision to split is made, region is closed. The region now has to flush whichever edits made it to MemStore while the compaction ran. This flushing, in our tests, is by far the dominating factor in how long data is unavailable during a split. We measured about 1 second to do the region closing, master assignment, reopening. Flushing could take 5-6 seconds, during which time the region is unavailable. - The daughter regions re-open on the same RS. Immediately when the StoreFiles are opened, a compaction is triggered across all of their StoreFiles because they contain references. Since we cannot currently split a split, we need to not hang on to these references for long. This described behavior is really bad because of how often we have to rewrite data onto HDFS. Imports are usually just IO bound as the RS waits to flush and compact. In the above example, the first cell to be inserted into this region ends up being written to HDFS 4 times (initial flush, first compaction w/ no split decision, second compaction w/ split decision, third compaction on daughter region). In addition, we leave a large window where we take on edits (during the second compaction of 320MB) and then must make the region unavailable as we flush it. If we increased the compactionThreshold to be 5 and determined splits based on aggregate size, the behavior becomes: - We fill up regions, and as long as you are not under global RS heap pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) StoreFiles. - After each MemStore flush, we calculate the aggregate size of all StoreFiles. We can also check the compactionThreshold. For the first three flushes, both would not hit the limit. 
On the fourth flush, we would see total aggregate size = 256MB and determine to make a split. - Decision to split is made, region is closed. This time, the region just has to flush out whichever edits made it to the MemStore during the snapshot/flush of the previous MemStore. So this time window has shrunk by more than 75% as it was the time to write 64MB from memory not 320MB from aggregating 5 hdfs files. This will greatly reduce the time data is unavailable during splits. - The daughter regions re-open on the same RS. Immediately when the StoreFiles are opened, a compaction is triggered across all of their StoreFiles because they contain references. This would stay the same. In this example, we only write a given cell twice (instead of 4 times) while drastically
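The arithmetic in the description above can be checked with a small simulation (a rough sketch: the 64MB flush size comes from the description, and the 256MB max region size is an assumed value consistent with a 320MB compacted file triggering a split).

```java
import java.util.ArrayList;
import java.util.List;

public class CompactionSim {
    static final int FLUSH_MB = 64;        // hbase.hregion.memstore.flush.size
    static final int MAX_REGION_MB = 256;  // assumed hbase.hregion.max.filesize

    // Returns total MB written to HDFS before the decision to split.
    static int run(int compactionThreshold, boolean splitOnAggregate) {
        List<Integer> files = new ArrayList<>();
        int written = 0;
        while (true) {
            files.add(FLUSH_MB);           // a MemStore flush lands a new StoreFile
            written += FLUSH_MB;
            int aggregate = files.stream().mapToInt(Integer::intValue).sum();
            if (splitOnAggregate && aggregate >= MAX_REGION_MB) {
                return written;            // proposed: split right after the flush
            }
            if (files.size() >= compactionThreshold) {
                files.clear();             // compaction rewrites every StoreFile
                files.add(aggregate);
                written += aggregate;
                if (!splitOnAggregate && aggregate > MAX_REGION_MB) {
                    return written;        // current: split only after a big compaction
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("default  (threshold 3, split on single file): " + run(3, false) + " MB");
        System.out.println("proposed (threshold 5, split on aggregate):   " + run(5, true) + " MB");
    }
}
```

Under these assumptions the default policy writes 832MB (five 64MB flushes plus 192MB and 320MB compactions) before deciding to split, while the proposed policy writes only the four 256MB worth of flushes, matching the "written 4 times" vs "written twice" claim once the daughter-region compaction is added to each.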
[jira] [Commented] (HBASE-5477) Cannot build RPM for hbase-0.92.0
[ https://issues.apache.org/jira/browse/HBASE-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216295#comment-13216295 ] stack commented on HBASE-5477: -- This works for you Benjamin? LGTM. You want to file separate issue for hbase-conf-pseudo? Cannot build RPM for hbase-0.92.0 - Key: HBASE-5477 URL: https://issues.apache.org/jira/browse/HBASE-5477 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Environment: Operating system: CentOS 6.2 {code} $ java -version java version 1.6.0_22 OpenJDK Runtime Environment (IcedTea6 1.10.6) (rhel-1.43.1.10.6.el6_2-x86_64) OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode) {code} {code} $ mvn -v Warning: JAVA_HOME environment variable is not set. Apache Maven 2.2.1 (r801777; 2009-08-06 12:16:01-0700) Java version: 1.6.0_22 Java home: /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre Default locale: en_US, platform encoding: UTF-8 OS name: linux version: 2.6.32-220.el6.x86_64 arch: amd64 Family: unix {code} Reporter: Benjamin Lee Attachments: build.log, hbase-0.92.0.patch Steps to reproduce: {code} tar xzvf hbase-0.92.0.tar.gz cd hbase-0.92.0 mvn -Dmaven.test.skip.exec=true -P rpm install {code} Failure output and patch will be attached. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5364) Fix source files missing licenses in 0.92 and trunk
[ https://issues.apache.org/jira/browse/HBASE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216299#comment-13216299 ] stack commented on HBASE-5364: -- I applied Shaneal's addendum to 0.90 branch. Thanks for the cleanup Shaneal Fix source files missing licenses in 0.92 and trunk --- Key: HBASE-5364 URL: https://issues.apache.org/jira/browse/HBASE-5364 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jonathan Hsieh Assignee: Elliott Clark Priority: Blocker Fix For: 0.92.1, 0.94.0 Attachments: HBASE-5364-1.patch, hbase-5364-0.90.patch, hbase-5364-0.92.patch, hbase-5364-v2.patch running 'mvn rat:check' shows that a few files have snuck in that do not have proper apache licenses. Ideally we should fix these before we cut another release/release candidate. This is a blocker for 0.94, and probably should be for the other branches as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2462) Review compaction heuristic and move compaction code out so standalone and independently testable
[ https://issues.apache.org/jira/browse/HBASE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216581#comment-13216581 ] stack commented on HBASE-2462: -- It looks like a bunch of this has made it in but I don't see the standalone compactions part nor the simulator. I'm taking a look at salvaging these latter two aspects from this patch and at least making it so we have standalone compactions (I want to look at compactions in isolation to see if we can make them run faster; we also need to work on making it so we do less of them but that's other issues). Review compaction heuristic and move compaction code out so standalone and independently testable - Key: HBASE-2462 URL: https://issues.apache.org/jira/browse/HBASE-2462 Project: HBase Issue Type: Improvement Components: performance Reporter: stack Assignee: Jonathan Gray Priority: Critical Labels: moved_from_0_20_5 Anything that improves our i/o profile makes hbase run smoother. Over in HBASE-2457, good work has been done already describing the tension between minimizing compactions versus minimizing count of store files. This issue is about following on from what has been done in 2457 but also, breaking the hard-to-read compaction code out of Store.java out to a standalone class that can be more easily tested (and easily analyzed for its performance characteristics). If possible, in the refactor, we'd allow specification of alternate merge sort implementations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5479) Postpone CompactionSelection to compaction execution time
[ https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216608#comment-13216608 ] stack commented on HBASE-5479: -- Todd suggests something like a scoring over here Matt: https://issues.apache.org/jira/browse/HBASE-2457?focusedCommentId=12857705page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12857705 Lets verify that we do indeed do selection at queuing time. Thats my suspicion. If thats the case, for sure needs fixing. Thanks for filing this one Matt. Postpone CompactionSelection to compaction execution time - Key: HBASE-5479 URL: https://issues.apache.org/jira/browse/HBASE-5479 Project: HBase Issue Type: New Feature Components: io, performance, regionserver Reporter: Matt Corgan It can be commonplace for regionservers to develop long compaction queues, meaning a CompactionRequest may execute hours after it was created. The CompactionRequest holds a CompactionSelection that was selected at request time but may no longer be the optimal selection. The CompactionSelection should be created at compaction execution time rather than compaction request time. The current mechanism breaks down during high volume insertion. The inefficiency is clearest when the inserts are finished. Inserting for 5 hours may build up 50 storefiles and a 40 element compaction queue. When finished inserting, you would prefer that the next compaction merges all 50 files (or some large subset), but the current system will churn through each of the 40 compaction requests, the first of which may be hours old. This ends up re-compacting the same data many times. The current system is especially inefficient when dealing with time series data where the data in the storefiles has minimal overlap. With time series data, there is even less benefit to intermediate merges because most storefiles can be eliminated based on their key range during a read, even without bloomfilters. 
The only goal should be to reduce file count, not to minimize number of files merged for each read. There are other aspects to the current queuing mechanism that would need to be looked at. You would want to avoid having the same Store in the queue multiple times. And you would want the completion of one compaction to possibly queue another compaction request for the store. An alternative architecture to the current style of queues would be to have each Store (all open in memory) keep a compactionPriority score up to date after events like flushes, compactions, schema changes, etc. Then you create a CompactionPriorityComparator implements Comparator<Store> and stick all the Stores into a PriorityQueue (synchronized remove/add from the queue when the value changes). The async compaction threads would keep pulling off the head of that queue as long as the head has compactionPriority > X. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
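Matt's PriorityQueue idea might look roughly like this (a sketch with a hypothetical Store class and threshold; a real patch would also need the synchronized remove/re-add when a store's score changes, which is omitted here):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class CompactionPriorityDemo {
    // Hypothetical Store carrying a compactionPriority score kept up to
    // date after flushes, compactions, schema changes, etc.
    static class Store {
        final String name;
        double compactionPriority;
        Store(String name, double p) { this.name = name; this.compactionPriority = p; }
    }

    // A worker drains highest-priority-first and stops once the head's
    // score drops to the threshold X or below. Selection happens here,
    // at execution time, not when a request was first queued.
    static List<String> drain(PriorityQueue<Store> q, double x) {
        List<String> order = new ArrayList<>();
        while (!q.isEmpty() && q.peek().compactionPriority > x) {
            order.add(q.poll().name);
        }
        return order;
    }

    static List<String> demo() {
        PriorityQueue<Store> q = new PriorityQueue<>(
            Comparator.comparingDouble((Store s) -> -s.compactionPriority));
        q.add(new Store("storeA", 1.5));
        q.add(new Store("storeB", 4.0));
        q.add(new Store("storeC", 2.5));
        return drain(q, 2.0);
    }

    public static void main(String[] args) {
        System.out.println(demo());   // [storeB, storeC]
    }
}
```

Because each Store appears at most once and is scored when pulled, a long insert run would end in one large merge of the current file set instead of churning through stale hours-old requests.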
[jira] [Commented] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+
[ https://issues.apache.org/jira/browse/HBASE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216861#comment-13216861 ] stack commented on HBASE-5480: -- +1 Looks grand Andy. Reflection is per map invocation? So, per row? I suppose in scheme of things not too bad. Fixups to MultithreadedTableMapper for Hadoop 0.23.2+ - Key: HBASE-5480 URL: https://issues.apache.org/jira/browse/HBASE-5480 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Andrew Purtell Priority: Critical Attachments: HBASE-5480.patch There are two issues: - StatusReporter has a new method getProgress() - Mapper and reducer context objects can no longer be directly instantiated. See attached patch. I'm not thrilled with the added reflection but it was the minimally intrusive change. Raised the priority to critical because compilation fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216863#comment-13216863 ] stack commented on HBASE-5075: -- @zhiyuan.dai What you think of the idea of using supervisor or any of the other babysitting programs instead of writing our own from new? If you need to have hbase regionservers dump out their servername so you know what to kill up in zk, that can be done easy enough regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch regionserver crashed,it is too long time to notify hmaster.when hmaster know regionserver's shutdown,it is long time to fetch the hlog's lease. hbase is a online db, availability is very important. i have a idea to improve availability, monitor node to check regionserver's pid.if this pid not exsits,i think the rs down,i will delete the znode,and force close the hlog file. so the period maybe 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217264#comment-13217264 ] stack commented on HBASE-5075: -- bq. Do you means another project instead of writing code into hbase? Yes sir. Process babysitters is a pretty mature domain w/ a wide variety of existing programs that have been debugged and are able to do this for you. What do you think about using one of the existing solutions rather than write your own? regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch regionserver crashed,it is too long time to notify hmaster.when hmaster know regionserver's shutdown,it is long time to fetch the hlog's lease. hbase is a online db, availability is very important. i have a idea to improve availability, monitor node to check regionserver's pid.if this pid not exsits,i think the rs down,i will delete the znode,and force close the hlog file. so the period maybe 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217267#comment-13217267 ] stack commented on HBASE-5399: -- Ditto w/ zk? Can't we just add close to the HConnection Interface and it will decrement the ref counting? Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. 
- Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217266#comment-13217266 ] stack commented on HBASE-5399: -- bq. ...but I didn't find an easy way to extend the master proxy to make it closeable What is the issue w/ the above? (I wonder why its hard to do?) Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. 
- Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217293#comment-13217293 ] stack commented on HBASE-5399: -- bq. For HMasterInterface, I don't know: I need to modify the interface but also HBaseRPC.getProxy and then VersionedProtocol and so on, no? to add the close? (I am not following closely but would like to understand if possible so throw me a clue or two on what issue is). Thanks N. Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. 
- Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217311#comment-13217311 ] stack commented on HBASE-5399: -- bq... If I don't want to do that, I need to add the method in the object returned by getProxy. You think it makes sense? How would that work (I've wanted to add a method to the returned proxy in the past). Would you have returned proxy implement another Interface (That sounds hard). Make the returned Interface implement Closeable? Or, even, whats wrong w/ the close going remote? Maybe there are resources master-side to clean up (if not now, maybe one day?... though yeah, if client doesn't have to make the RPC, lets not bother if possible). Sounds like something to try and figure -- if possible (Of course I've no ideas?) BTW, what you have above for conenction w/ try/finally looks ideal Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. 
- read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. - Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217347#comment-13217347 ] stack commented on HBASE-5270: -- Do you think we should check to see if we have already split this server's log for the case where the server was carrying root and meta?
{code}
+ splitLogIfOnline(currentMetaServer);
{code}
Or will the above call become a noop because we just split it before we assigned root? Is this a 'safe mode' or is it the master 'initializing'? I think 'safe mode' makes folks think of hdfs. It is a little similar in that master is trying to make sense of the cluster but initializing might be a better name for this state. BTW, I think this is an improvement over previous versions of this patch. It's easier to reason about. Good stuff Chunhui. Make a method and put this duplicated code into it and call it from the two places it's repeated:
{code}
+ if (!deadNotExpiredServers.isEmpty()) {
+   for (final ServerName server : deadNotExpiredServers) {
+     LOG.debug("Removing dead but not expired server: " + server
+       + " from eligible server pool.");
+     servers.remove(server);
+   }
+ }
{code}
Fix this bit of javadoc '... but not are expired now.' You don't need this:
{code}
+ * Copyright 2007 The Apache Software Foundation
{code}
I think MasterInSafeModeException becomes MasterInitializingException? Good stuff Chunhui. Regarding Jimmy's comment: bq. Instead of introducing safe mode, can we add something to the RPC server and don't allow it to serve traffic before the actual server is ready, for example, fully initialized? We have a ServerNotRunningYetException down in the ipc. It's thrown by HBaseServer if RPC has not started yet. It seems a little related to this MasterInitializing. We also have a PleaseHoldException. Perhaps the Master should throw this instead of the MasterInitializing? We'd throw a PleaseHoldException and the message would detail that the master is initializing? 
Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.1, 0.94.0 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, sampletest.txt This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if its carrying root and meta? What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. Its not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. 
Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch? This patch needs to have the v18 differences applied. -- This message is automatically generated by JIRA. If you think it was
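Stack's "make a method" suggestion above can be sketched minimally. This is a plain-Java stand-in, not the actual patch: ServerName is modeled as a String and LOG.debug as a println, so the sketch is self-contained; the method name is illustrative.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the helper stack asks for: both call sites delegate to one
// method that removes dead-but-not-expired servers from the eligible pool.
public class EligibleServers {
  static List<String> removeDeadNotExpired(List<String> servers,
      Set<String> deadNotExpiredServers) {
    if (!deadNotExpiredServers.isEmpty()) {
      for (final String server : deadNotExpiredServers) {
        // LOG.debug in the real patch; println keeps the sketch dependency-free.
        System.out.println("Removing dead but not expired server: " + server
            + " from eligible server pool.");
        servers.remove(server);
      }
    }
    return servers;
  }

  public static void main(String[] args) {
    List<String> servers = new ArrayList<>(List.of("rs1", "rs2", "rs3"));
    Set<String> dead = new HashSet<>(Set.of("rs2"));
    System.out.println(removeDeadNotExpired(servers, dead));
  }
}
```

Extracting the loop this way means the log wording and removal logic can only drift in one place, which is the point of the review comment.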
[jira] [Commented] (HBASE-5460) Add protobuf as M/R dependency jar
[ https://issues.apache.org/jira/browse/HBASE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217537#comment-13217537 ] stack commented on HBASE-5460: -- We said something about 24 hour window for addendums... else its hard for the fellows following behind us to figure what happened... that means you should do a new issue. Add protobuf as M/R dependency jar -- Key: HBASE-5460 URL: https://issues.apache.org/jira/browse/HBASE-5460 Project: HBase Issue Type: Sub-task Components: mapreduce Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5460.txt Getting this from M/R jobs (Export for example):
{code}
Error: java.lang.ClassNotFoundException: com.google.protobuf.Message
  at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
  at org.apache.hadoop.hbase.io.HbaseObjectWritable.<clinit>(HbaseObjectWritable.java:262)
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217728#comment-13217728 ] stack commented on HBASE-5074: -- I see these in the logs when I run the patch; its a little odd because it says not using PureJavaCrc32 but will use CRC32 but then prints out stacktrace anyways: {code} 2012-02-27 23:34:20,911 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: TestTable,150828,1330380684339.ebb37d5d0e2c1f4a8b111830a46e7cbc. 2012-02-27 23:34:20,914 INFO org.apache.hadoop.hbase.regionserver.Store: time to purge deletes set to 0ms in store null 2012-02-27 23:34:20,930 INFO org.apache.hadoop.hbase.util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 not available. 2012-02-27 23:34:20,930 INFO org.apache.hadoop.hbase.util.ChecksumType: Checksum using java.util.zip.CRC32 2012-02-27 23:34:20,931 WARN org.apache.hadoop.hbase.util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32C not available. 
java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.util.PureJavaCrc32C
  at org.apache.hadoop.hbase.util.ChecksumFactory.newConstructor(ChecksumFactory.java:65)
  at org.apache.hadoop.hbase.util.ChecksumType$3.initialize(ChecksumType.java:113)
  at org.apache.hadoop.hbase.util.ChecksumType.<init>(ChecksumType.java:148)
  at org.apache.hadoop.hbase.util.ChecksumType.<init>(ChecksumType.java:37)
  at org.apache.hadoop.hbase.util.ChecksumType$3.<init>(ChecksumType.java:100)
  at org.apache.hadoop.hbase.util.ChecksumType.<clinit>(ChecksumType.java:100)
  at org.apache.hadoop.hbase.io.hfile.HFile.<clinit>(HFile.java:163)
  at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1252)
  at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:516)
  at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:606)
  at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:375)
  at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:370)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PureJavaCrc32C
  at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:247)
  at org.apache.hadoop.hbase.util.ChecksumFactory.getClassByName(ChecksumFactory.java:97)
  at org.apache.hadoop.hbase.util.ChecksumFactory.newConstructor(ChecksumFactory.java:60)
  ... 19 more
{code}
I'm not sure on what's happening. It would seem we're using the default CRC32, but then I'm not sure how I get the above exception reading the code. Also, not sure if I have this facility turned on. It's on by default but I don't see anything in the logs saying it's on (and I don't have metrics on this cluster, nor do I have a good handle on before and after regards whether this feature makes a difference). I caught this in a heap dump:
{code}
IPC Server handler 0 on 7003 daemon prio=10 tid=0x7f4a1410c800 nid=0x24b2 runnable [0x7f4a20487000]
  java.lang.Thread.State: RUNNABLE
  at java.util.zip.CRC32.updateBytes(Native Method)
  at java.util.zip.CRC32.update(CRC32.java:45)
  at org.apache.hadoop.util.DataChecksum.update(DataChecksum.java:223)
  at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:240)
  at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
  at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
  - locked 0x0006fc68e9d8 (a org.apache.hadoop.hdfs.BlockReaderLocal)
  at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1457)
  - locked 0x0006fc68e9d8 (a org.apache.hadoop.hdfs.BlockReaderLocal)
  at org.apache.hadoop.hdfs.BlockReaderLocal.read(BlockReaderLocal.java:326)
  - locked 0x0006fc68e9d8 (a
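The confusing log-then-stacktrace sequence above comes from a try-by-reflection pattern: the code attempts to load a faster checksum class and, when the class (here Hadoop's PureJavaCrc32C) is absent from the classpath, logs the ClassNotFoundException before falling back. A minimal sketch of that pattern, not HBase's actual ChecksumFactory/ChecksumType code; class and method names below are illustrative:

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Try to instantiate a preferred Checksum implementation by reflection,
// falling back to java.util.zip.CRC32 when the class is not available.
public class ChecksumFallback {
  static Checksum pickChecksum(String preferredClassName) {
    try {
      Class<?> clazz = Class.forName(preferredClassName);
      return (Checksum) clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      // This is the point where the patch logs (and, confusingly,
      // stack-traces) the ClassNotFoundException before falling back.
      System.out.println(preferredClassName + " not available, using CRC32");
      return new CRC32();
    }
  }

  public static void main(String[] args) {
    Checksum c = pickChecksum("org.apache.hadoop.util.PureJavaCrc32C");
    c.update(new byte[] {1, 2, 3}, 0, 3);
    System.out.println(c.getValue());
  }
}
```

With this shape, the ClassNotFoundException is an expected control-flow event on clusters without the Hadoop class, which is why logging its full stack trace at WARN reads as scarier than it is.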
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217732#comment-13217732 ] stack commented on HBASE-5270: -- @Ted Yes. We can keep the prefix and change the rest of the sentence to be more generic. If Chunhui reuses it here, it'll be an exception the master throws when they want the client to come back in a while. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.1, 0.94.0 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, sampletest.txt This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if its carrying root and meta? What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. 
It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. Its not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch? This patch needs to have the v18 differences applied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217927#comment-13217927 ] stack commented on HBASE-5074: -- Hey Ted. Comment was not for you, it was for the patch author. bq. The exception about org.apache.hadoop.util.PureJavaCrc32C not found should be normal - it was WARN. The above makes no sense. You have WARN and 'normal' in the same sentence. If you look at the log, it says: 1. 2012-02-27 23:34:20,930 INFO org.apache.hadoop.hbase.util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 not available. 2. 2012-02-27 23:34:20,930 INFO org.apache.hadoop.hbase.util.ChecksumType: Checksum using java.util.zip.CRC32 3. It spews a thread dump saying AGAIN that org.apache.hadoop.util.PureJavaCrc32C not available. That is going to confuse. bq. Metrics should be collected on the cluster to see the difference. Go easy on telling folks what they should do. It tends to piss them off. support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch, D1521.9.patch, D1521.9.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. 
[jira] [Commented] (HBASE-5161) Compaction algorithm should prioritize reference files
[ https://issues.apache.org/jira/browse/HBASE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217933#comment-13217933 ] stack commented on HBASE-5161: -- This is not actually a problem, right J-D? The actual problem is that it takes a long time to clear the reference files -- even though they are the first things scheduled on region open -- because sometimes we have such a backlog of compaction to catch up on (lots of big files). Compaction algorithm should prioritize reference files -- Key: HBASE-5161 URL: https://issues.apache.org/jira/browse/HBASE-5161 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.92.1, 0.94.0 I got myself into a state where my table was un-splittable as long as the insert load was coming in. Emergency flushes because of the low memory barrier don't check the number of store files so it never blocks, to a point where I had in one case 45 store files and the compactions were almost never done on the reference files (had 15 of them, went down by one in 20 minutes). Since you can't split regions with reference files, that region couldn't split and was doomed to just get more store files until the load stopped. Marking this as a minor issue, what we really need is a better pushback mechanism but not prioritizing reference files seems wrong. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13218470#comment-13218470 ] stack commented on HBASE-5074: -- @Dhruba It's good trying for PureJavaCrc32 first. Get rid of the WARN w/ thread dump I'd say, especially as it comes after reporting we're not going to use PureJavaCrc32. The feature does seem to be on by default but it would be nice to know it w/o having to go to ganglia graphs to figure my i/o loading to see whether or not this feature is enabled -- going to ganglia would be useless anyways in case where I've no history w/ an hbase read load -- so some kind of log output might be useful? Good on you D. support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch, D1521.9.patch, D1521.9.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
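The startup log line stack asks for could be as simple as announcing, once at region server startup, whether HBase-level checksums are on and which Checksum implementation won the fallback. A sketch; the config key and message wording here are assumptions, not the committed patch:

```java
// Build the one-time startup message announcing the checksum feature state.
public class ChecksumStartupLog {
  static String startupMessage(boolean enabled, String checksumClass) {
    // Config key name is illustrative of HBASE-5074's verify switch.
    return "hbase.regionserver.checksum.verify=" + enabled
        + (enabled ? ", using " + checksumClass : "");
  }

  public static void main(String[] args) {
    // In the region server this would go through LOG.info at startup.
    System.out.println(startupMessage(true, "java.util.zip.CRC32"));
  }
}
```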
[jira] [Commented] (HBASE-5486) Warn message in HTable: Stringify the byte[]
[ https://issues.apache.org/jira/browse/HBASE-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13218484#comment-13218484 ] stack commented on HBASE-5486: -- Himanshu The patch build failed because you need to use --no-prefix on the git patches you attach here. Do that the next time. Let me commit this. Warn message in HTable: Stringify the byte[] Key: HBASE-5486 URL: https://issues.apache.org/jira/browse/HBASE-5486 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Priority: Trivial Labels: noob Attachments: 5486.patch The warn message in the method getStartEndKeys() in HTable can be improved by stringifying the byte array for Regions.Qualifier Currently, a sample message is like : 12/01/17 16:36:34 WARN client.HTable: Null [B@552c8fa8 cell in keyvalues={test5,\xC9\xA2\x00\x00\x00\x00\x00\x00/00_0,1326642537734.dbc62b2765529a9ad2ddcf8eb58cb2dc./info:server/1326750341579/Put/vlen=28, test5,\xC9\xA2\x00\x00\x00\x00\x00\x00/00_0,1326642537734.dbc62b2765529a9ad2ddcf8eb58cb2dc./info:serverstartcode/1326750341579/Put/vlen=8} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5486) Warn message in HTable: Stringify the byte[]
[ https://issues.apache.org/jira/browse/HBASE-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13218489#comment-13218489 ] stack commented on HBASE-5486: -- Hmm.. shouldn't this toString be a static itself in HConstants rather than make it each time? Want to have another go at it Himanshu? Thanks. Warn message in HTable: Stringify the byte[] Key: HBASE-5486 URL: https://issues.apache.org/jira/browse/HBASE-5486 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Priority: Trivial Labels: noob Attachments: 5486.patch The warn message in the method getStartEndKeys() in HTable can be improved by stringifying the byte array for Regions.Qualifier Currently, a sample message is like : 12/01/17 16:36:34 WARN client.HTable: Null [B@552c8fa8 cell in keyvalues={test5,\xC9\xA2\x00\x00\x00\x00\x00\x00/00_0,1326642537734.dbc62b2765529a9ad2ddcf8eb58cb2dc./info:server/1326750341579/Put/vlen=28, test5,\xC9\xA2\x00\x00\x00\x00\x00\x00/00_0,1326642537734.dbc62b2765529a9ad2ddcf8eb58cb2dc./info:serverstartcode/1326750341579/Put/vlen=8} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
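Stack's suggestion, computing the stringified qualifier once as a static rather than on every log call, can be sketched as below. The toStringBinary helper is a simplified stand-in for HBase's Bytes.toStringBinary, and the constant names are illustrative:

```java
import java.nio.charset.StandardCharsets;

// Stringify the qualifier byte[] once, at class load, instead of rebuilding
// it for every WARN message in getStartEndKeys().
public class QualifierConstants {
  static final byte[] REGIONINFO_QUALIFIER =
      "regioninfo".getBytes(StandardCharsets.UTF_8);

  // Printable ASCII passes through; everything else becomes \xNN,
  // mimicking the style of Bytes.toStringBinary.
  static String toStringBinary(byte[] b) {
    StringBuilder sb = new StringBuilder();
    for (byte v : b) {
      int ch = v & 0xFF;
      if (ch >= ' ' && ch <= '~') {
        sb.append((char) ch);
      } else {
        sb.append(String.format("\\x%02X", ch));
      }
    }
    return sb.toString();
  }

  // Computed once, as stack suggests, rather than per log call.
  static final String REGIONINFO_QUALIFIER_STR =
      toStringBinary(REGIONINFO_QUALIFIER);

  public static void main(String[] args) {
    // byte[].toString() gives the unreadable "[B@..." form the warn message had.
    System.out.println(REGIONINFO_QUALIFIER.toString());
    System.out.println(REGIONINFO_QUALIFIER_STR);
  }
}
```

The readable form is what turns "Null [B@552c8fa8 cell" into a message an operator can actually act on.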
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13218514#comment-13218514 ] stack commented on HBASE-5399: -- On 1., yeah, the close should close the connection -- a client-side thing On 2., not so mad about it. On 3., you obtain the objective it seems but the solution does seem convoluted (more indirection in the client makes it yet more obtuse). Put up a patch I'd say. Lets have a look. SharedMaster is probably not the right name for the Interface? CloseableMaster or MasterConnection and doc that the close applies to the closing of the client connection to master only. Good on you N Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. 
- check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. - Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
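The interface-naming discussion above (a CloseableMaster or MasterConnection whose close() is client-side only, never an RPC) might look like this sketch. All type names are illustrative, not the actual HBase client API:

```java
import java.io.Closeable;

// Sketch: expose the master protocol plus a client-side close(), so callers
// can use try/finally or try-with-resources without the close going remote.
public class MasterConnectionSketch {
  interface MasterProtocol {
    boolean isMasterRunning();
  }

  // "CloseableMaster": protocol methods plus Closeable; close() only
  // releases the local connection and performs no RPC.
  interface CloseableMaster extends MasterProtocol, Closeable {
    @Override void close(); // narrowed: no IOException, purely client-side
  }

  static class TemporaryMasterConnection implements CloseableMaster {
    private boolean open = true;
    @Override public boolean isMasterRunning() { return open; }
    @Override public void close() { open = false; } // client-side teardown only
  }

  public static void main(String[] args) {
    try (TemporaryMasterConnection master = new TemporaryMasterConnection()) {
      System.out.println(master.isMasterRunning());
    } // closed here, with no remote call made
  }
}
```

Narrowing close() to drop the checked IOException is what makes the try-with-resources call site clean, which matches the try/finally shape stack calls "ideal" in the comment above.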
[jira] [Commented] (HBASE-4324) Single unassigned directory is very slow when there are many unassigned nodes
[ https://issues.apache.org/jira/browse/HBASE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13218529#comment-13218529 ] stack commented on HBASE-4324: -- Yeah, should still be an issue. Probably better to have it in 0.96, the singularity, since will necessitate change in layout up in zk. Single unassigned directory is very slow when there are many unassigned nodes - Key: HBASE-4324 URL: https://issues.apache.org/jira/browse/HBASE-4324 Project: HBase Issue Type: Bug Components: zookeeper Affects Versions: 0.90.4 Reporter: Todd Lipcon Fix For: 0.96.0 Because we use a single znode for /unassigned, and we re-list it every time its contents change, assignment speed per region is O(number of unassigned regions) rather than O(1). Every time something changes about one unassigned region, the master has to re-list the entire contents of the directory inside of AssignmentManager.nodeChildrenChanged(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13218785#comment-13218785 ] stack commented on HBASE-5487: -- I took a look at FATE over in accumulo. Its some nice generic primitives for running a suite of idempotent operations (even if operation only part completes, if its run again, it should clean up and continue). There is a notion of locking on a table (so can stop it transiting I suppose; there are read/write locks), a stack for operations (ops are pushed and popped off the stack), operations can respond done, failed, or even w/ a new set of operations to do first (This basic can be used to step through a number of tasks one after the other). All is persisted up in zk run by the master; if master dies, a new master can pick up the half-done task and finish it. Clients can watch zk to see if task is done. There ain't too much to the fate package; there is fate class itself, an admin, a 'store' interface of which there is a zk implementation. We should for sure take inspiration at least from the work already done. 
Here are the ops they do via fate:
{code}
fate.seedTransaction(opid, new TraceRepoMaster(new CreateTable(c.user, tableName, timeType, options)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new RenameTable(tableId, oldTableName, newTableName)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new CloneTable(c.user, srcTableId, tableName, propertiesToSet, propertiesToExclude)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new DeleteTable(tableId)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new ChangeTableState(tableId, TableOperation.ONLINE)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new ChangeTableState(tableId, TableOperation.OFFLINE)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new TableRangeOp(MergeInfo.Operation.MERGE, tableId, startRow, endRow)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new TableRangeOp(MergeInfo.Operation.DELETE, tableId, startRow, endRow)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new BulkImport(tableId, dir, failDir, setTime)), autoCleanup);
fate.seedTransaction(opid, new TraceRepoMaster(new CompactRange(tableId, startRow, endRow)), autoCleanup);
{code}
CompactRange is their term for merge. It takes a key range span, figures the tablets involved and runs the compact/merge. We want that and then something to do the remove of regions too? Generic framework for Master-coordinated tasks -- Key: HBASE-5487 URL: https://issues.apache.org/jira/browse/HBASE-5487 Project: HBase Issue Type: New Feature Components: master, regionserver, zookeeper Affects Versions: 0.94.0 Reporter: Mubarak Seyed Labels: noob Need a framework to execute master-coordinated tasks in a fault-tolerant manner. Master-coordinated tasks such as online-scheme change and delete-range (deleting region(s) based on start/end key) can make use of this framework. The advantages of framework are 1. 
Eliminate repeated code in Master, ZooKeeper tracker and Region-server for master-coordinated tasks 2. Ability to abstract the common functions across Master - ZK and RS - ZK 3. Easy to plug in new master-coordinated tasks without adding code to core components -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
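The FATE behavior described above (ops pushed and popped off a stack; an op can respond done or hand back further ops to run first) can be sketched in plain Java. This is a hypothetical model, not Accumulo's actual API: the `Repo` interface name is borrowed from Accumulo, but the in-memory `Deque` stands in for the ZooKeeper-backed store that lets a new master resume half-done work.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the FATE pattern described above: an operation
// ("Repo" in Accumulo's terms) can finish or hand back a prerequisite
// operation that must run first; the runner keeps ops on a stack.
// In Accumulo the stack is persisted in ZooKeeper so a new master can
// pick it up; here a plain in-memory Deque stands in for that store.
public class FateSketch {
    interface Repo {
        // Returns null when this op is done, or a child op to run first.
        Repo call() throws Exception;
    }

    static int stepsRun = 0;

    static void runToCompletion(Deque<Repo> stack) throws Exception {
        while (!stack.isEmpty()) {
            Repo top = stack.peek();
            Repo child = top.call();
            stepsRun++;
            if (child != null) {
                stack.push(child);   // prerequisite op: run it first
            } else {
                stack.pop();         // op completed; resume its parent
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Deque<Repo> stack = new ArrayDeque<>();
        // A parent op that needs one child op done before it can finish,
        // like CreateTable needing its table directories made first.
        stack.push(new Repo() {
            boolean childDone = false;
            public Repo call() {
                if (!childDone) {
                    childDone = true;
                    return () -> null;  // child op completes immediately
                }
                return null;            // parent now completes
            }
        });
        runToCompletion(stack);
        System.out.println("steps=" + stepsRun);
    }
}
```

Because each op either finishes or pushes work it depends on, re-running the loop after a crash (with a persisted stack) naturally resumes from the deepest incomplete op, which is the idempotency property the comment highlights.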
[jira] [Commented] (HBASE-5488) Fixed OfflineMetaRepair bug
[ https://issues.apache.org/jira/browse/HBASE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219312#comment-13219312 ] stack commented on HBASE-5488: -- +1 Fixed OfflineMetaRepair bug Key: HBASE-5488 URL: https://issues.apache.org/jira/browse/HBASE-5488 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: gaojinchao Assignee: gaojinchao Priority: Minor Fix For: 0.90.7, 0.92.1 Attachments: HBASE-5488-branch92.patch, HBASE-5488-trunk.patch, HBASE-5488_branch90.txt I wanted to use the OfflineMetaRepair tool and found nobody had fixed this bug. I will make a patch. 12/01/05 23:23:30 ERROR util.HBaseFsck: Bailed out due to: java.lang.IllegalArgumentException: Wrong FS: hdfs://us01-ciqps1-name01.carrieriq.com:9000/hbase/M2M-INTEGRATION-MM_TION-1325190318714/0003d2ede27668737e192d8430dbe5d0/.regioninfo, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:352) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:368) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:126) at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:284) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:398) at org.apache.hadoop.hbase.util.HBaseFsck.loadMetaEntry(HBaseFsck.java:256) at org.apache.hadoop.hbase.util.HBaseFsck.loadTableInfo(HBaseFsck.java:284) at org.apache.hadoop.hbase.util.HBaseFsck.rebuildMeta(HBaseFsck.java:402) at org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair.main(OfflineMetaRe -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
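The "Wrong FS" error in the stack trace above comes from Hadoop's FileSystem.checkPath: the .regioninfo path carries an hdfs:// scheme, but the FileSystem instance it was opened through was built for file:///. A rough plain-Java model of that scheme check (the real logic lives in org.apache.hadoop.fs.FileSystem; the host and path names here are made up):

```java
import java.net.URI;

// Rough model of the check behind "Wrong FS: hdfs://... expected: file:///"
// above: the path's scheme must match the filesystem's scheme.
// OfflineMetaRepair hit it because HBaseFsck opened HDFS .regioninfo paths
// through a FileSystem built from a default (local, file://) configuration.
public class WrongFsSketch {
    static void checkPath(URI fsUri, URI path) {
        String got = path.getScheme();
        if (got != null && !got.equals(fsUri.getScheme())) {
            throw new IllegalArgumentException(
                "Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        URI localFs = URI.create("file:///");
        URI regioninfo = URI.create("hdfs://namenode:9000/hbase/t1/region/.regioninfo");
        try {
            checkPath(localFs, regioninfo);   // mismatched schemes: throws
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        // The usual fix pattern: derive the FileSystem from the path itself
        // (path.getFileSystem(conf) in Hadoop) so the schemes always agree.
        checkPath(URI.create("hdfs://namenode:9000/"), regioninfo); // passes
    }
}
```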
[jira] [Commented] (HBASE-5491) Delete the HBaseConfiguration.create for coprocessor.Exec class
[ https://issues.apache.org/jira/browse/HBASE-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219315#comment-13219315 ] stack commented on HBASE-5491: -- +1 on patch. Will add comment that setConf is for testing only on commit. Waiting on hadoopqa before committing. Delete the HBaseConfiguration.create for coprocessor.Exec class --- Key: HBASE-5491 URL: https://issues.apache.org/jira/browse/HBASE-5491 Project: HBase Issue Type: Improvement Components: coprocessors Affects Versions: 0.92.0 Environment: all Reporter: honghua zhu Fix For: 0.92.1 Attachments: HBASE-5491.patch The Exec class has a field: private Configuration conf = HBaseConfiguration.create(). The client side creates an Exec instance for every coprocessor request initiated through ExecRPCInvoker, so HBaseConfiguration.create() runs for each request, and the server side calls it once more when deserializing the Exec. HBaseConfiguration.create() is a time-consuming operation. This default value is only needed by test code (org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint.testExecDeserialization); everywhere else the Exec class is passed a Configuration, so the conf field needs no default value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
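The cost the patch removes can be modeled in plain Java. This is a hypothetical illustration, not HBase code: `ExpensiveConf` stands in for HBaseConfiguration, and a counter shows how often the costly create() actually runs under the eager-default field versus constructor injection.

```java
// Hypothetical model of the cost the HBASE-5491 patch removes. A field
// initializer such as `conf = HBaseConfiguration.create()` runs on every
// construction and deserialization, even when the caller overwrites it
// immediately. ExpensiveConf stands in for HBaseConfiguration; the
// counter shows how often the costly create() actually runs.
public class ExecConfSketch {
    static int creates = 0;

    static class ExpensiveConf {
        static ExpensiveConf create() { creates++; return new ExpensiveConf(); }
    }

    // Before the patch: eager default, paid even when never used.
    static class EagerExec {
        ExpensiveConf conf = ExpensiveConf.create();
    }

    // After the patch: the caller supplies the Configuration it already has.
    static class InjectedExec {
        final ExpensiveConf conf;
        InjectedExec(ExpensiveConf conf) { this.conf = conf; }
    }

    public static void main(String[] args) {
        ExpensiveConf shared = ExpensiveConf.create();      // one create, reused
        for (int i = 0; i < 1000; i++) new InjectedExec(shared);
        System.out.println("creates with injection: " + creates);      // 1
        for (int i = 0; i < 1000; i++) new EagerExec();
        System.out.println("creates with eager default: " + creates);  // 1001
    }
}
```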
[jira] [Commented] (HBASE-5491) Delete the HBaseConfiguration.create for coprocessor.Exec class
[ https://issues.apache.org/jira/browse/HBASE-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219316#comment-13219316 ] stack commented on HBASE-5491: -- Although, one question, Honghua: why not remove the setConf and in the test do new Exec(HBaseConfiguration.create())? Delete the HBaseConfiguration.create for coprocessor.Exec class --- Key: HBASE-5491 URL: https://issues.apache.org/jira/browse/HBASE-5491 Project: HBase Issue Type: Improvement Components: coprocessors Affects Versions: 0.92.0 Environment: all Reporter: honghua zhu Fix For: 0.92.1 Attachments: HBASE-5491.patch The Exec class has a field: private Configuration conf = HBaseConfiguration.create(). The client side creates an Exec instance for every coprocessor request initiated through ExecRPCInvoker, so HBaseConfiguration.create() runs for each request, and the server side calls it once more when deserializing the Exec. HBaseConfiguration.create() is a time-consuming operation. This default value is only needed by test code (org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint.testExecDeserialization); everywhere else the Exec class is passed a Configuration, so the conf field needs no default value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira