[jira] [Commented] (HBASE-5244) Add a test for a column with 1M (10M? 100M) items and see how we do with it
[ https://issues.apache.org/jira/browse/HBASE-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190542#comment-13190542 ]

stack commented on HBASE-5244:

Let's add a test that fills a row with a million columns and see how we do returning items out of it. How big do we think we should be able to go in a single row? 1M columns? 10M? 100M? Whatever we think we should be able to do, we should have a test for it. Such a test would not be part of our general test suite but would instead be run with the new 'mvn verify' failsafe facility.

Add a test for a column with 1M (10M? 100M) items and see how we do with it

Key: HBASE-5244
URL: https://issues.apache.org/jira/browse/HBASE-5244
Project: HBase
Issue Type: Task
Reporter: stack

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4920) We need a mascot, a totem
[ https://issues.apache.org/jira/browse/HBASE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190777#comment-13190777 ]

stack commented on HBASE-4920:

I like the circle. I like the monochromatic take. I wonder if black and white might not be better since then it's more identifiably an orca. It does make me think of a hockey team for some reason. It doesn't seem to go well w/ the 'apache hbase'. Maybe inverted and on the right side would help, and perhaps smaller. Tell your buddy thanks, Marcy, for helping move this along.

We need a mascot, a totem

Key: HBASE-4920
URL: https://issues.apache.org/jira/browse/HBASE-4920
Project: HBase
Issue Type: Task
Reporter: stack
Attachments: HBase Orca Logo.jpg, Orca_479990801.jpg, Screen shot 2011-11-30 at 4.06.17 PM.png, photo (2).JPG

We need a totem for our t-shirt that is yet to be printed. O'Reilly owns the Clydesdale. We need something else. We could have a fluffy little duck that quacks 'hbase!' when you squeeze it and we could order boxes of them from some off-shore sweatshop that subcontracts to a contractor who employs child labor only. Or we could have an Orca (Big!, Fast!, Killer!, and in a poem that Marcy from Salesforce showed me, that was a bit too spiritual for me to be seen quoting here, it had the Orca as the 'Guardian of the Cosmic Memory': i.e. in translation, bigdata).
[jira] [Commented] (HBASE-5237) Addendum for HBASE-5160 and HBASE-4397
[ https://issues.apache.org/jira/browse/HBASE-5237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190788#comment-13190788 ]

stack commented on HBASE-5237:

+1 on commit to 0.92 branch so we are consistent with 0.90 and with TRUNK.

The patch is a little odd though: if getRegionPlan returns null, it supposedly means no servers are online, and in that case we set a flag up in TM (the timeout monitor). But TM only runs every 30 minutes, so the setting of this flag doesn't do much? This patch is setting all servers offline in the middle of an assign. It doesn't seem like we should be doing this here. Anyways, +1 so we are consistent with other branches.

Addendum for HBASE-5160 and HBASE-4397

Key: HBASE-5237
URL: https://issues.apache.org/jira/browse/HBASE-5237
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.0, 0.90.6
Attachments: HBASE-5237_0.90.patch, HBASE-5237_trunk.patch

As part of HBASE-4397 there is one more scenario where the patch has to be applied.
{code}
RegionPlan plan = getRegionPlan(state, forceNewPlan);
if (plan == null) {
  debugLog(state.getRegion(), "Unable to determine a plan to assign " + state);
  return; // Should get reassigned later when RIT times out.
}
{code}
I think in this scenario also
{code}
this.timeoutMonitor.setAllRegionServersOffline(true);
{code}
this should be done.
[jira] [Commented] (HBASE-5243) LogSyncerThread not getting shutdown waiting for the interrupted flag
[ https://issues.apache.org/jira/browse/HBASE-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190793#comment-13190793 ]

stack commented on HBASE-5243:

+1 for 0.92 branch

LogSyncerThread not getting shutdown waiting for the interrupted flag

Key: HBASE-5243
URL: https://issues.apache.org/jira/browse/HBASE-5243
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.1, 0.90.6
Attachments: HBASE-5243_0.90.patch, HBASE-5243_0.90_1.patch, HBASE-5243_trunk.patch

In the LogSyncer run() we keep looping until the this.isInterrupted flag is set. But in some cases the DFSClient consumes the InterruptedException, so we run into an infinite loop in some shutdown cases. Since we are the ones who try to close down the LogSyncerThread, I would suggest introducing a variable like Close or shutdown; based on the state of this flag along with isInterrupted() we can make the thread stop.
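The suggestion in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions: StoppableSyncer, closeRequested and close() are hypothetical names, not the LogSyncer code or the attached patch, and the catch block merely imitates a callee that swallows the interrupt.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for LogSyncer; not the actual HBase class. */
class StoppableSyncer extends Thread {
  // Set by whoever shuts the thread down; volatile so the loop sees the write.
  private volatile boolean closeRequested = false;
  private final long intervalMs;
  private final AtomicInteger syncCount = new AtomicInteger();

  StoppableSyncer(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  @Override
  public void run() {
    // Stop on either signal: the interrupt status may have been cleared by
    // code further down the stack, but the explicit flag cannot be lost.
    while (!closeRequested && !isInterrupted()) {
      try {
        Thread.sleep(intervalMs);
        syncCount.incrementAndGet(); // placeholder for the real sync call
      } catch (InterruptedException e) {
        // Swallowed on purpose to imitate a callee that consumes the interrupt.
      }
    }
  }

  /** Requests shutdown; the interrupt only serves to wake a sleeping thread. */
  void close() {
    closeRequested = true;
    interrupt();
  }

  int getSyncCount() {
    return syncCount.get();
  }
}
```

Even though the interrupt is swallowed inside the loop, the thread still exits once close() has been called, because the loop condition also checks the flag.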
[jira] [Commented] (HBASE-5245) HBase shell should use alternate jruby if JRUBY_HOME is set, should pass along JRUBY_OPTS
[ https://issues.apache.org/jira/browse/HBASE-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190795#comment-13190795 ]

stack commented on HBASE-5245:

I tried applying this to TRUNK and it fails. Mind fixing it, Philip? Also, I'm interested in why you need this. Thanks.

HBase shell should use alternate jruby if JRUBY_HOME is set, should pass along JRUBY_OPTS

Key: HBASE-5245
URL: https://issues.apache.org/jira/browse/HBASE-5245
Project: HBase
Issue Type: Improvement
Components: shell
Affects Versions: 0.90.4
Reporter: Philip (flip) Kromer
Priority: Minor
Attachments: hbase-jruby_home-and-jruby_opts.patch
Original Estimate: 0h
Remaining Estimate: 0h

Invoking {{hbase shell}}, the hbase runner launches the jruby jar directly, and so behaves differently than the traditional jruby runner. Specifically, it
* does not respect the {{JRUBY_OPTS}} environment variable (among other things, I cannot launch the shell to use ruby-1.9 mode)
* does not respect the {{JRUBY_HOME}} environment variable (placing things in an inconsistent state if my classpath holds the system jruby).

This patch allows you to use an alternative jruby and to specify options to the jruby jar.
* When the command is 'shell', adds {{$JRUBY_OPTS}} to the CLASS
* When the command is 'shell' and {{$JRUBY_HOME}} is set, adds {{$JRUBY_HOME/lib/jruby.jar}} to the classpath, and sets {{-Djruby.home}} and {{-Djruby.job}} config variables.
[jira] [Commented] (HBASE-5245) HBase shell should use alternate jruby if JRUBY_HOME is set, should pass along JRUBY_OPTS
[ https://issues.apache.org/jira/browse/HBASE-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190797#comment-13190797 ]

stack commented on HBASE-5245:

Oh, nvm. I see you say why you need this above.

HBase shell should use alternate jruby if JRUBY_HOME is set, should pass along JRUBY_OPTS

Key: HBASE-5245
URL: https://issues.apache.org/jira/browse/HBASE-5245
[jira] [Commented] (HBASE-5237) Addendum for HBASE-5160 and HBASE-4397
[ https://issues.apache.org/jira/browse/HBASE-5237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190873#comment-13190873 ]

stack commented on HBASE-5237:

Thanks for the explanation, Ram. This code is hard to follow. Having a null return from getRegionPlan mean that there are no regionservers online, and then setting a flag on the timeout monitor to indicate that, seems like extreme indirection. We can fix that in another issue. Please commit this patch to 0.92.

Addendum for HBASE-5160 and HBASE-4397

Key: HBASE-5237
URL: https://issues.apache.org/jira/browse/HBASE-5237
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190894#comment-13190894 ]

stack commented on HBASE-5179:

Here is some review of v17.

logDirExists should be called getLogDir or getLogDirIfExists?

Why is the below done up in ServerManager rather than down in ServerShutdownHandler? I'd think it'd make more sense therein? Perhaps even inside the MetaServerShutdownHandler or whatever it's called? Not a biggie. Just asking.

Can we talk more about what these are:

+ * @param onlineServers onlined servers when master starts

Are these servers that have checked in between master start and the call to processFailover? Could other servers come in between the making of the list we pass into processFailover and the running of the processFailover code? Should this be a 'live' list? Or, I see that we are actually getting rid of the 'live' list of online servers to replace it w/ this static one in these lines:
{code}
- } else if (!serverManager.isServerOnline(regionLocation.getServerName())) {
+ } else if (!onlineServers.contains(regionLocation.getServerName())) {
{code}
Why do we do this?

Could this block of code be done out in a nice coherent method rather than inline here in HMaster:
{code}
+    // Check zk for regionservers that are up but didn't register
+    for (String sn : this.regionServerTracker.getOnlineServerNames()) {
+      if (!this.serverManager.isServerOnline(sn)) {
+        HServerInfo serverInfo = HServerInfo.getServerInfo(sn);
+        if (serverInfo != null) {
+          HServerInfo existingServer = serverManager
+              .haveServerWithSameHostAndPortAlready(serverInfo
+                  .getHostnamePort());
+          if (existingServer == null) {
+            // Not registered; add it.
+            LOG.info("Registering server found up in zk but who has not yet "
+                + "reported in: " + sn);
+            // We set serverLoad with one region, it could differentiate with
+            // regionserver which is started just now
+            HServerLoad serverLoad = new HServerLoad();
+            serverLoad.setNumberOfRegions(1);
+            serverInfo.setLoad(serverLoad);
+            this.serverManager.recordNewServer(serverInfo, true, null);
+          }
+        } else {
+          LOG.warn("Server " + sn
+              + " found up in zk, but is not a correct server name");
+        }
+      }
+    }
{code}
Can we say more about why we are doing this? And how do we know that it did not just start? Because if it has more than 0 regions, it must have been already running?
{code}
+    if (rootServerLoad != null && rootServerLoad.getNumberOfRegions() > 0) {
+      // If rootServer is online not start just now, we expire it
+      this.serverManager.expireServer(rootServerInfo);
+    }
{code}
It looks like the processing of the meta server duplicates code from the processing of the root server. Can we instead have the duplicated code out in a method? Is that possible? Then pass in args for whether to process root or meta?

Regarding the hard-coded wait of 1000ms when looking for the dir to go away, see '13.4.1.2.2. Sleeps in tests' in http://hbase.apache.org/book.html#hbase.tests... it seems like we should wait less than 1000 going by the reference guide.

DeadServer class looks much better.

I can't write a test for a patch on 0.90, and this v17 won't work for trunk at all. The trunk patch will be very different, which is a problem because then I have no confidence my trunk unit test applies to this patch that is to be applied on the 0.90 branch.

Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch,
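On the hard-coded 1000 ms wait raised in the review above, one common alternative is to poll the condition at a short interval with an overall deadline, so the caller returns as soon as the directory is gone. The sketch below is an assumption-laden illustration: WaitUtil and waitFor are hypothetical names, not part of the patch or of HBase, and it uses java.util.function, so it needs Java 8 or later.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper; illustrates polling instead of one fixed sleep. */
final class WaitUtil {
  private WaitUtil() {}

  /**
   * Polls the condition every pollMs until it holds or timeoutMs elapses.
   *
   * @return true if the condition held before the timeout, false otherwise
   */
  static boolean waitFor(long timeoutMs, long pollMs, BooleanSupplier condition)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        return false; // gave up; the caller decides whether that is a failure
      }
      Thread.sleep(pollMs);
    }
    return true;
  }
}
```

A caller would pass a check such as "the log directory no longer exists" as the condition, with a poll interval well under the overall timeout.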
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190898#comment-13190898 ]

stack commented on HBASE-5229:

In the above when you talk of column prefix, couldn't that be column family? For sure we need to talk up ColumnRangeFilter and exercise it more (e.g. your observation on filters and versions needs looking into...) Good stuff.

Explore building blocks for multi-row local transactions.

Key: HBASE-5229
URL: https://issues.apache.org/jira/browse/HBASE-5229
Project: HBase
Issue Type: New Feature
Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.94.0
Attachments: 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt

HBase should provide basic building blocks for multi-row local transactions. Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here.

After a bit of discussion two solutions have emerged:
1. Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase-rows.
2. Define a prefix-length in HTableDescriptor that defines a grouping of rows. Regions will then never be split inside a grouping prefix.

#1 is true to the current storage paradigm of HBase. #2 is true to the current client side API. I will explore these two with sample patches here.

Was: As discussed (at length) on the dev mailing list with the HBASE-3584 and HBASE-5203 committed, supporting atomic cross row transactions within a region becomes simple. I am aware of the hesitation about the usefulness of this feature, but we have to start somewhere. Let's use this jira for discussion, I'll attach a patch (with tests) momentarily to make this concrete.
[jira] [Commented] (HBASE-5231) Backport HBASE-3373 (per-table load balancing) to 0.92
[ https://issues.apache.org/jira/browse/HBASE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191377#comment-13191377 ]

stack commented on HBASE-5231:

It looks like a method named getAssignmentsByTable will only group by table if a particular configuration is set; else it will do assignments the old way. Seems like an odd name for this method. I'd have thought it would have remained getAssignments and then in getAssignments we'd switch on whether to do by table or not. Does this change the default? I can't tell.

Backport HBASE-3373 (per-table load balancing) to 0.92

Key: HBASE-5231
URL: https://issues.apache.org/jira/browse/HBASE-5231
Project: HBase
Issue Type: Improvement
Reporter: Zhihong Yu
Fix For: 0.92.1
Attachments: 5231-v2.txt, 5231.txt

This JIRA backports per-table load balancing to 0.92
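The naming suggestion in the comment above could look something like the sketch below: a single getAssignments that switches on a flag. Everything here is assumed for illustration (the AssignmentView class, the ALL_TABLES key, the "table,startkey" region naming, and the boolean standing in for a configuration lookup); it is not the code from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of one method that switches on a flag. */
final class AssignmentView {
  // Placeholder key used when regions are not grouped per table.
  static final String ALL_TABLES = "ensemble";

  private AssignmentView() {}

  /**
   * @param regionToServer region name ("table,startkey") mapped to its server
   * @param byTable whether to group per table (would come from configuration)
   * @return table -> (server -> regions); one ALL_TABLES entry if !byTable
   */
  static Map<String, Map<String, List<String>>> getAssignments(
      Map<String, String> regionToServer, boolean byTable) {
    Map<String, Map<String, List<String>>> result = new TreeMap<>();
    for (Map.Entry<String, String> e : regionToServer.entrySet()) {
      String region = e.getKey();
      String table = ALL_TABLES;
      if (byTable) {
        int comma = region.indexOf(',');
        table = comma < 0 ? region : region.substring(0, comma);
      }
      result.computeIfAbsent(table, t -> new TreeMap<>())
          .computeIfAbsent(e.getValue(), s -> new ArrayList<>())
          .add(region);
    }
    return result;
  }
}
```

With this shape the caller always asks for "the assignments", and whether they are grouped by table is a detail decided inside the method.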
[jira] [Commented] (HBASE-5255) Use singletons for OperationStatus to save memory
[ https://issues.apache.org/jira/browse/HBASE-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191919#comment-13191919 ]

stack commented on HBASE-5255:

+1 on patch

Use singletons for OperationStatus to save memory

Key: HBASE-5255
URL: https://issues.apache.org/jira/browse/HBASE-5255
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.90.5, 0.92.0
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Minor
Labels: performance
Fix For: 0.94.0, 0.92.1
Attachments: 5255-92.txt, 5255-v2.txt, HBASE-5255-0.92-Use-singletons-to-remove-unnecessary-memory-allocati.patch, HBASE-5255-trunk-Use-singletons-to-remove-unnecessary-memory-allocati.patch

Every single {{Put}} causes the allocation of at least one {{OperationStatus}}, yet {{OperationStatus}} is almost always stateless, so these allocations are unnecessary and could be avoided. Attached patch adds a few singletons and uses them, with no public API change. I didn't test the patches, but you get the idea.
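The idea in the description above, sharing immutable instances for the stateless cases, might be sketched as follows. OpStatus, its Code enum and the failure() factory are hypothetical names chosen for this example; the real {{OperationStatus}} class and the attached patches may be shaped differently.

```java
/** Hypothetical immutable status type with shared instances. */
final class OpStatus {
  enum Code { NOT_RUN, SUCCESS, FAILURE }

  // Shared instances for the common message-free cases; safe to share
  // because the class is immutable.
  static final OpStatus NOT_RUN = new OpStatus(Code.NOT_RUN, "");
  static final OpStatus SUCCESS = new OpStatus(Code.SUCCESS, "");
  static final OpStatus FAILURE = new OpStatus(Code.FAILURE, "");

  private final Code code;
  private final String message;

  private OpStatus(Code code, String message) {
    this.code = code;
    this.message = message;
  }

  /** Allocates only when a message actually has to be carried along. */
  static OpStatus failure(String message) {
    if (message == null || message.isEmpty()) {
      return FAILURE;
    }
    return new OpStatus(Code.FAILURE, message);
  }

  Code getCode() {
    return code;
  }

  String getMessage() {
    return message;
  }
}
```

Because the constructor is private and the fields are final, callers can only obtain the shared instances or a freshly built failure with a message, so sharing cannot leak state between operations.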
[jira] [Commented] (HBASE-5189) Add metrics to keep track of region-splits in RS
[ https://issues.apache.org/jira/browse/HBASE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191920#comment-13191920 ]

stack commented on HBASE-5189:

+1

Add metrics to keep track of region-splits in RS

Key: HBASE-5189
URL: https://issues.apache.org/jira/browse/HBASE-5189
Project: HBase
Issue Type: Improvement
Components: metrics, regionserver
Affects Versions: 0.90.5, 0.92.0
Reporter: Mubarak Seyed
Assignee: Mubarak Seyed
Priority: Minor
Labels: noob
Fix For: 0.94.0
Attachments: HBASE-5189.trunk.v1.patch, HBASE-5189.trunk.v2.patch

For a write-heavy workload with a region size of 1 GB, the number of region splits is considerably high. We normally grep the NN log (grep mkdir*.split NN.log | sort | uniq -c) to get the count. I would like to have a counter incremented each time a region-split execution succeeds, and this counter exposed via the metrics stuff in HBase.
- regionSplitSuccessCount
- regionSplitFailureCount (will help us to correlate the timestamp range in RS logs across all RS)
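A minimal sketch of the two counters requested above is given here, assuming a wrapper around whatever performs the split. Only the counter names come from the description; the SplitMetrics class and its track() method are hypothetical, and how the values would be published through the HBase metrics framework is deliberately left out.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder for the two split counters named in the issue. */
final class SplitMetrics {
  private final AtomicLong regionSplitSuccessCount = new AtomicLong();
  private final AtomicLong regionSplitFailureCount = new AtomicLong();

  /** Runs the split, records its outcome, and rethrows any failure. */
  <T> T track(Callable<T> split) throws Exception {
    try {
      T result = split.call();
      regionSplitSuccessCount.incrementAndGet();
      return result;
    } catch (Exception e) {
      regionSplitFailureCount.incrementAndGet();
      throw e;
    }
  }

  long getRegionSplitSuccessCount() {
    return regionSplitSuccessCount.get();
  }

  long getRegionSplitFailureCount() {
    return regionSplitFailureCount.get();
  }
}
```

AtomicLong keeps the increments safe if several splits finish on different threads at once.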
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191927#comment-13191927 ]

stack commented on HBASE-5229:

I see now what the other delete issue is about. Regarding CF as prefix, yeah, it's a problem now, where the logical cf and physical cf are the same thing. If we had locality groups, as per the BT paper, we could have lots of cfs.

Explore building blocks for multi-row local transactions.

Key: HBASE-5229
URL: https://issues.apache.org/jira/browse/HBASE-5229
[jira] [Commented] (HBASE-5262) Structured event log for HBase for monitoring and auto-tuning performance
[ https://issues.apache.org/jira/browse/HBASE-5262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191933#comment-13191933 ]

stack commented on HBASE-5262:

@Mikhail Sounds good. We'd write to hbase because we can't have many writers write the same file in hdfs? This system table would be a new one, not .META.? We'd fill it to the end of time, or the TTL on the table would get rid of old records (that'd work).

Structured event log for HBase for monitoring and auto-tuning performance

Key: HBASE-5262
URL: https://issues.apache.org/jira/browse/HBASE-5262
Project: HBase
Issue Type: Improvement
Reporter: Mikhail Bautin

Creating this JIRA to open a discussion about a structured (machine-readable) log that will record events such as compaction start/end times, compaction input/output files, their sizes, the same for flushes, etc. This can be stored e.g. in a new system table in HBase itself. The data from this log can then be analyzed and used to optimize compactions at run time, or otherwise auto-tune HBase configuration to reduce the number of knobs the user has to configure.
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191954#comment-13191954 ]

stack commented on HBASE-5179:

Reviewing 0.92v17

isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere.

Does isDeadRootServerInProgress need to be public? Ditto for the meta version.

These method param names are not right: 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier?

Is there anything in place to stop us expiring a server twice if it's carrying root and meta?

What is the difference between asking the assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta.

I think I've asked for this a few times: onlineServers needs to be explained... either in javadoc or in a comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all.

It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here?

Though distributed split log is configured, with this patch we will do single-process splitting in the master under some conditions. It's not explained in code why we would do this. Why do we think master log splitting is 'high priority' when it could very well be slower? Should we only go this route if distributed splitting is not going on? Do we know if concurrent distributed log splitting and master splitting works?

Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired?

This patch is different from the patch for 0.90. It should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x, with a new issue for more work on this trunk patch?

This patch needs to have the v18 differences applied.

Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss

Key: HBASE-5179
URL: https://issues.apache.org/jira/browse/HBASE-5179
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.2
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
Fix For: 0.94.0, 0.90.6, 0.92.1
Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch

If the master's failover processing and ServerShutdownHandler's processing happen concurrently, the following case may occur:
1. master completed splitLogAfterStartup()
2. RegionserverA restarts, and ServerShutdownHandler is processing.
3. master starts to rebuildUserRegions, and RegionserverA is considered a dead server.
4. master starts to assign regions of RegionserverA because it is a dead server by step 3.

However, while step 4 (assigning regions) is running, ServerShutdownHandler may still be splitting the log, so data loss may result.
[jira] [Commented] (HBASE-5230) Unit test to ensure compactions don't cache data on write
[ https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191959#comment-13191959 ]

stack commented on HBASE-5230:

lgtm

Unit test to ensure compactions don't cache data on write

Key: HBASE-5230
URL: https://issues.apache.org/jira/browse/HBASE-5230
Project: HBase
Issue Type: Test
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
Attachments: D1353.1.patch, D1353.2.patch, D1353.3.patch, D1353.4.patch, Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch, Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch, Don-t-cache-data-blocks-on-compaction-2012-01-23_15_27_23.patch

Create a unit test for HBASE-3976 (making sure we don't cache data blocks on write during compactions even if cache-on-write is generally enabled). This is because we have very different implementations of HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) and with CacheConfig (presumably it's there but not sure if it even works, since the patch in HBASE-3976 may not have been committed). We need to create a unit test to verify that we don't cache data blocks on write during compactions, and resolve HBASE-3976 so that this new unit test does not fail.
[jira] [Commented] (HBASE-5231) Backport HBASE-3373 (per-table load balancing) to 0.92
[ https://issues.apache.org/jira/browse/HBASE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191971#comment-13191971 ] stack commented on HBASE-5231: -- That the methods are package private is not relevant. My comment still stands (The fact that balancer is switched via conf also is not pertinent). Backport HBASE-3373 (per-table load balancing) to 0.92 -- Key: HBASE-5231 URL: https://issues.apache.org/jira/browse/HBASE-5231 Project: HBase Issue Type: Improvement Reporter: Zhihong Yu Fix For: 0.92.1 Attachments: 5231-v2.txt, 5231.txt This JIRA backports per-table load balancing to 0.90 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5269) IllegalMonitorStateException while retryin HLog split in 0.90 branch.
[ https://issues.apache.org/jira/browse/HBASE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192248#comment-13192248 ] stack commented on HBASE-5269: -- +1 IllegalMonitorStateException while retryin HLog split in 0.90 branch. - Key: HBASE-5269 URL: https://issues.apache.org/jira/browse/HBASE-5269 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.90.6 Attachments: HBASE-5269.patch This bug was introduced as part of the HBASE-5137 fix. The splitLogLock is released in the finally block inside the do-while loop, so when the loop executes a second time, the unlock of the splitLogLock throws an IllegalMonitorStateException.
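The failure mode described in HBASE-5269 can be reproduced in isolation. The sketch below is not the HBase code: it assumes splitLogLock behaves like a java.util.concurrent.locks.ReentrantLock, and the class and method names (SplitLogLockSketch, buggyRetry, fixedRetry) are made up for illustration. It shows a lock taken once before a do-while loop but released in a finally block inside the loop, so the second pass unlocks a lock the thread no longer holds. The "fixed" variant is only one possible shape of a fix, not necessarily what the attached patch does.

```java
import java.util.concurrent.locks.ReentrantLock;

class SplitLogLockSketch {
    // Buggy shape: the lock is acquired once, but released on every loop pass.
    static int buggyRetry(int attempts) {
        ReentrantLock splitLogLock = new ReentrantLock();
        int completed = 0;
        splitLogLock.lock();
        do {
            try {
                completed++; // stand-in for one log split attempt
            } finally {
                // On the second pass the lock is no longer held by this thread,
                // so unlock() throws IllegalMonitorStateException.
                splitLogLock.unlock();
            }
        } while (completed < attempts);
        return completed;
    }

    // One possible fix (an assumption): release exactly once, after the loop.
    static int fixedRetry(int attempts) {
        ReentrantLock splitLogLock = new ReentrantLock();
        int completed = 0;
        splitLogLock.lock();
        try {
            do {
                completed++; // stand-in for one log split attempt
            } while (completed < attempts);
        } finally {
            splitLogLock.unlock();
        }
        return completed;
    }

    public static void main(String[] args) {
        System.out.println("fixed attempts: " + fixedRetry(2));
        try {
            buggyRetry(2);
        } catch (IllegalMonitorStateException e) {
            System.out.println("buggy retry threw " + e.getClass().getSimpleName());
        }
    }
}
```

With a single attempt the buggy variant works, which may be why a problem of this kind only shows up once a retry actually happens.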
[jira] [Commented] (HBASE-5231) Backport HBASE-3373 (per-table load balancing) to 0.92
[ https://issues.apache.org/jira/browse/HBASE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192251#comment-13192251 ] stack commented on HBASE-5231: -- It is not necessarily by table; it could be by server. What's this: {code} +result.put(ensemble, getAssignments()); {code} What's 'ensemble'? Backport HBASE-3373 (per-table load balancing) to 0.92 -- Key: HBASE-5231 URL: https://issues.apache.org/jira/browse/HBASE-5231 Project: HBase Issue Type: Improvement Reporter: Zhihong Yu Fix For: 0.92.1 Attachments: 5231-v2.txt, 5231.txt This JIRA backports per-table load balancing to 0.90
[jira] [Commented] (HBASE-5262) Structured event log for HBase for monitoring and auto-tuning performance
[ https://issues.apache.org/jira/browse/HBASE-5262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192253#comment-13192253 ] stack commented on HBASE-5262: -- @Mikhail Sounds good to me Structured event log for HBase for monitoring and auto-tuning performance - Key: HBASE-5262 URL: https://issues.apache.org/jira/browse/HBASE-5262 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Creating this JIRA to open a discussion about a structured (machine-readable) log that will record events such as compaction start/end times, compaction input/output files, their sizes, the same for flushes, etc. This can be stored e.g. in a new system table in HBase itself. The data from this log can then be analyzed and used to optimize compactions at run time, or otherwise auto-tune HBase configuration to reduce the number of knobs the user has to configure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default
[ https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192257#comment-13192257 ] stack commented on HBASE-5267: -- lgtm Add a configuration to disable the slab cache by default Key: HBASE-5267 URL: https://issues.apache.org/jira/browse/HBASE-5267 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Li Pi Priority: Blocker Fix For: 0.94.0, 0.92.1 Attachments: 5267.txt From what I commented at the tail of HBASE-4027: {quote} I changed the release note, the patch doesn't have a hbase.offheapcachesize configuration and it's enabled as soon as you set -XX:MaxDirectMemorySize (which is actually a big problem when you consider this: http://hbase.apache.org/book.html#trouble.client.oome.directmemory.leak). {quote} We need to add hbase.offheapcachesize and set it to false by default. Marking as a blocker for 0.92.1 and assigning to Li Pi at Todd's request. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5204) Backward compatibility fixes for 0.92
[ https://issues.apache.org/jira/browse/HBASE-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192267#comment-13192267 ] stack commented on HBASE-5204: -- Yes, agree, adding patch to trunk would be silly. Lets drop it. The codes should never have been changed. Hopefully your warning... {code} 1231985 stack 1231985 stack // WARNING: Please do not insert, remove or swap any line in this static // 1231985 stack // block. Doing so would change or shift all the codes used to serialize // 1231985 stack // objects, which makes backwards compatibility very hard for clients.// 1231985 stack // New codes should always be added at the end. Code removal is // 1231985 stack // discouraged because code is a short now. // 1231985 stack {code} will help w/ that. Can we check in an asynchbase unit test that exercises all apis you need so we fail fast in case we mess up again (HBase has 16 committers now and hard to have them all on message) Backward compatibility fixes for 0.92 - Key: HBASE-5204 URL: https://issues.apache.org/jira/browse/HBASE-5204 Project: HBase Issue Type: Bug Components: ipc Reporter: Benoit Sigoure Assignee: Benoit Sigoure Priority: Blocker Labels: backwards-compatibility Fix For: 0.92.0 Attachments: 0001-Add-some-backward-compatible-support-for-reading-old.patch, 0002-Make-sure-that-a-connection-always-uses-a-protocol.patch, 0003-Change-the-code-used-when-serializing-HTableDescript.patch, 5204-92.txt, 5204-trunk.txt, 5204.addendum Attached are 3 patches that are necessary to allow compatibility between HBase 0.90.x (and previous releases) and HBase 0.92.0. First of all, I'm well aware that 0.92.0 RC4 has been thumbed up by a lot of people and would probably wind up being released as 0.92.0 tomorrow, so I sincerely apologize for creating this issue so late in the process. 
I spent a lot of time trying to work around the quirks of 0.92 but once I realized that with a few very quasi-trivial changes compatibility would be made significantly easier, I immediately sent these 3 patches to Stack, who suggested I create this issue. The first patch is required as without it clients sending a 0.90-style RPC to a 0.92-style server causes the server to die uncleanly. It seems that 0.92 ships with {{\-XX:OnOutOfMemoryError=kill \-9 %p}}, and when a 0.92 server fails to deserialize a 0.90-style RPC, it attempts to allocate a large buffer because it doesn't read fields of 0.90-style RPCs properly. This allocation attempt immediately triggers an OOME, which causes the JVM to die abruptly of a {{SIGKILL}}. So whenever a 0.90.x client attempts to connect to HBase, it kills whichever RS is hosting the {{\-ROOT-}} region. The second patch fixes a bug introduced by HBASE-2002, which added support for letting clients specify what protocol they want to speak. If a client doesn't properly specify what protocol to use, the connection's {{protocol}} field will be left {{null}}, which causes any subsequent RPC on that connection to trigger an NPE in the server, even though the connection was successfully established from the client's point of view. The fix is to simply give the connection a default protocol, by assuming the client meant to speak to a RegionServer. The third patch fixes an oversight that slipped in HBASE-451, where a change to {{HbaseObjectWritable}} caused all the codes used to serialize {{Writables}} to shift by one. This was carefully avoided in other changes such as HBASE-1502, which cleanly removed entries for {{HMsg}} and {{HMsg[]}}, so I don't think this breakage in HBASE-451 was intended. -- This message is automatically generated by JIRA. 
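The code-shift problem behind the third HBASE-5204 patch, and behind the WARNING comment quoted in the reply, can be shown with a toy model. This is a sketch under assumptions, not HbaseObjectWritable itself: it assumes codes are handed out purely by registration order, and the type names (TypeA, TypeB, TypeC, NewType) and the class CodeShiftSketch are hypothetical. Inserting an entry in the middle changes the code of everything registered after it, while appending leaves existing codes alone.

```java
import java.util.Arrays;
import java.util.List;

class CodeShiftSketch {
    // Position-based codes: the first registered name gets code 1, the next 2, and so on.
    static int codeOf(List<String> registrationOrder, String name) {
        return registrationOrder.indexOf(name) + 1;
    }

    // Registration order as an old release might have it (hypothetical names).
    static final List<String> OLD = Arrays.asList("TypeA", "TypeB", "TypeC");
    // A new entry inserted in the middle shifts every later code by one.
    static final List<String> INSERTED = Arrays.asList("TypeA", "NewType", "TypeB", "TypeC");
    // A new entry appended at the end leaves the existing codes untouched.
    static final List<String> APPENDED = Arrays.asList("TypeA", "TypeB", "TypeC", "NewType");

    public static void main(String[] args) {
        System.out.println("TypeB in old table:      " + codeOf(OLD, "TypeB"));
        System.out.println("TypeB after insertion:   " + codeOf(INSERTED, "TypeB"));
        System.out.println("TypeB after appending:   " + codeOf(APPENDED, "TypeB"));
    }
}
```

In this model an old client that writes the old code for TypeB would have it read back as NewType by a server using the inserted table, which is the kind of mismatch the "add new codes at the end" rule is meant to prevent.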
[jira] [Commented] (HBASE-5231) Backport HBASE-3373 (per-table load balancing) to 0.92
[ https://issues.apache.org/jira/browse/HBASE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192372#comment-13192372 ] stack commented on HBASE-5231: -- Its unexplained in the code, its just introduced and its very odd. Wrong even. Backport HBASE-3373 (per-table load balancing) to 0.92 -- Key: HBASE-5231 URL: https://issues.apache.org/jira/browse/HBASE-5231 Project: HBase Issue Type: Improvement Reporter: Zhihong Yu Fix For: 0.92.1 Attachments: 5231-v2.txt, 5231.txt This JIRA backports per-table load balancing to 0.90 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5231) Backport HBASE-3373 (per-table load balancing) to 0.92
[ https://issues.apache.org/jira/browse/HBASE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192440#comment-13192440 ] stack commented on HBASE-5231: -- In a method named getAssignmentsByTable we will return regions by table IFF a configuration hbase.master.loadbalance.bytable is true. Otherwise we return all regions belonging to a 'table' named 'ensemble'. Nowhere is 'ensemble' explained. How is a noob who follows along after us trying to make sense of this supposed to figure out what's going on? A method named getAssignmentsByTable should return assignments by table... not assignments by table and then, if some flag in config is set, assignments by some arbitrary pseudo table. At a minimum it needs to be explained by comments and in javadoc. But really it says to me that this new feature is not well thought through. Why do we worry about regions by table outside of the balancer invocation? Shouldn't the balancer-by-table be asking about a region's table down in the balancer guts rather than up here high in the master? Looking more at what is going on, when the balance finishes, do we have a balanced cluster? There is no test to prove it and, thinking on it, given that we invoke the balancer per table, if there are lots of tables with different region count skew, I'd think it could throw off the basic cluster balance (regions per regionserver). Backport HBASE-3373 (per-table load balancing) to 0.92 -- Key: HBASE-5231 URL: https://issues.apache.org/jira/browse/HBASE-5231 Project: HBase Issue Type: Improvement Reporter: Zhihong Yu Fix For: 0.92.1 Attachments: 5231-v2.txt, 5231.txt This JIRA backports per-table load balancing to 0.90
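The behavior objected to in the comment above can be modeled in a few lines. This is a sketch of the behavior as the comment describes it, not the code in the patch: the signature, the plain String and Map types, and the class name AssignmentsByTableSketch are assumptions, and the table name is simply taken as the part of the region name before the first comma. With the flag on, regions are grouped by table; with it off, everything lands under the single pseudo-table key 'ensemble'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AssignmentsByTableSketch {
    // Input:  server name -> region names hosted on that server.
    // Output: table name -> (server name -> region names of that table on that server).
    // byTable stands in for the hbase.master.loadbalance.bytable setting.
    static Map<String, Map<String, List<String>>> getAssignmentsByTable(
            Map<String, List<String>> assignments, boolean byTable) {
        Map<String, Map<String, List<String>>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : assignments.entrySet()) {
            for (String region : entry.getValue()) {
                // Assumption: a region name starts with "<table>,".
                String table = byTable
                        ? region.substring(0, region.indexOf(','))
                        : "ensemble"; // one pseudo table holding every region
                result.computeIfAbsent(table, t -> new TreeMap<>())
                      .computeIfAbsent(entry.getKey(), s -> new ArrayList<>())
                      .add(region);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> assignments = new TreeMap<>();
        assignments.put("rs1", List.of("t1,a", "t2,a"));
        assignments.put("rs2", List.of("t1,b"));
        System.out.println("by table: " + getAssignmentsByTable(assignments, true));
        System.out.println("flag off: " + getAssignmentsByTable(assignments, false));
    }
}
```

Seen this way, the complaint is about naming and documentation as much as logic: the return shape is the same in both modes, but only one mode matches what the method name promises.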
[jira] [Commented] (HBASE-5262) Structured event log for HBase for monitoring and auto-tuning performance
[ https://issues.apache.org/jira/browse/HBASE-5262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192603#comment-13192603 ] stack commented on HBASE-5262: -- @Mikhail We used to have an extra column family in .META. called history. Into it we'd write 'events'. It was stripped because, though a nice idea, it caused more trouble than it was worth (x-cluster deadlocking, interfering w/ shutdowns, growing at a different rate to the info family in .META.)...so yeah, should fail fast if can't put events, etc. Structured event log for HBase for monitoring and auto-tuning performance - Key: HBASE-5262 URL: https://issues.apache.org/jira/browse/HBASE-5262 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Creating this JIRA to open a discussion about a structured (machine-readable) log that will record events such as compaction start/end times, compaction input/output files, their sizes, the same for flushes, etc. This can be stored e.g. in a new system table in HBase itself. The data from this log can then be analyzed and used to optimize compactions at run time, or otherwise auto-tune HBase configuration to reduce the number of knobs the user has to configure.
[jira] [Commented] (HBASE-5231) Backport HBASE-3373 (per-table load balancing) to 0.92
[ https://issues.apache.org/jira/browse/HBASE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192612#comment-13192612 ] stack commented on HBASE-5231: -- bq. The relationship between this feature and balancer sloppiness is definitely within the scope of discussion. I don't understand how. bq. Since the existing tests for load balancer pass, there is no regression introduced by this feature. The existing test uses table balancing? (There is no test for table balancer that I can see) -1 on this patch going into 0.92. Its incoherent. Backport HBASE-3373 (per-table load balancing) to 0.92 -- Key: HBASE-5231 URL: https://issues.apache.org/jira/browse/HBASE-5231 Project: HBase Issue Type: Improvement Reporter: Zhihong Yu Fix For: 0.92.1 Attachments: 5231-v2.txt, 5231.addendum, 5231.txt This JIRA backports per-table load balancing to 0.90 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5249) Using a stale explicit row lock in a Put triggers an NPE
[ https://issues.apache.org/jira/browse/HBASE-5249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192635#comment-13192635 ] stack commented on HBASE-5249: -- Copying over Ted comment that was in hbase-5171 (resolved as a duplicate): {code} Zhihong Yu added a comment - 10/Jan/12 18:03 One workaround is to increase the value for hbase.rowlock.wait.duration But a new exception should be introduced anyways. {code} Yves says up on the list: {code} After checking the source code I've noticed that the value which is going to be put into the HashMap can be null in the case where the waitForLock flag is true or the rowLockWaitDuration is expired (HRegion#internalObtainRowLock, line 2111ff). The latter I think happens in our case as we have heavy load hitting the server. {code} Using a stale explicit row lock in a Put triggers an NPE Key: HBASE-5249 URL: https://issues.apache.org/jira/browse/HBASE-5249 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0 Reporter: stack Assignee: stack Priority: Minor Fix For: 0.92.1 After acquiring an explicit row lock, if one attempts to send {{Put}} after the row lock has expired, an NPE is triggered in the RegionServer, instead of throwing an {{UnknownRowLockException}} back to the client. 
{code} 2012-01-20 17:09:54,074 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error obtaining row lock (fsOk: true) java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881) at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2313) at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2299) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1327) It happened only once out of thousands of RPCs that grabbed and released a row lock. {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
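The top frame of the stack trace above is ConcurrentHashMap.put, which rejects null keys and null values. The sketch below is not the HRegionServer code: the class RowLockMapSketch, its methods, and the choice of a plain IOException are assumptions made for illustration. It shows how a lock id that comes back null (for example after a wait timed out, as Yves suggests) turns into an NPE at the put, and how a null check in front of the put could surface a descriptive exception instead.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;

class RowLockMapSketch {
    static final ConcurrentHashMap<String, Integer> rowlocks = new ConcurrentHashMap<>();

    // Stand-in for obtaining a row lock: returns null when the wait timed out (assumption).
    static Integer obtainRowLock(boolean timedOut) {
        return timedOut ? null : Integer.valueOf(1);
    }

    // Unchecked shape: a null lock id goes straight into the map and put() throws an NPE.
    static void addRowLockUnchecked(String lockName, Integer lockId) {
        rowlocks.put(lockName, lockId);
    }

    // Checked shape: fail with a descriptive exception before touching the map.
    // IOException is a placeholder for whatever exception type the fix settles on.
    static void addRowLockChecked(String lockName, Integer lockId) throws IOException {
        if (lockId == null) {
            throw new IOException("Timed out waiting for row lock " + lockName);
        }
        rowlocks.put(lockName, lockId);
    }

    public static void main(String[] args) {
        try {
            addRowLockUnchecked("lock-a", obtainRowLock(true));
        } catch (NullPointerException e) {
            System.out.println("unchecked put threw NullPointerException");
        }
        try {
            addRowLockChecked("lock-b", obtainRowLock(true));
        } catch (IOException e) {
            System.out.println("checked put threw: " + e.getMessage());
        }
    }
}
```

Raising hbase.rowlock.wait.duration, as suggested in the copied comment, would only make the null case rarer; a check of this kind is what would change the error the client sees.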
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193187#comment-13193187 ] stack commented on HBASE-5270: -- bq. So, the param 'definitiveRootServer' is used in this case to ensure the dead root server is carryingRoot when it is being expired. Whats 'definitive' about it? Is it that we know for sure the server was carrying root or meta? How? bq. Is there any possible to expire a server if its carrying root and meta now? I don't think so. You are saying that this patch does nothing new here? We COULD expire the server that was carrying root, wait on its log split, then expire the server carrying meta (though it may have been the same server)... it might be ok but we might kill a server that has just started. I'm ok if fixing this is outside scope of this patch. bq. I don't find this operation earlier in master setup, and this operation is not introduced by this issue. And I only introduce this logic for 90 from trunk. So, you copied this to 0.90 from TRUNK (so my notion that we already had this is my remembering how things work on TRUNK.. that would make sense). bq. I think we need explain it, But whether we shouldn't use distributed split log, I'm not very sure. If we are not sure, we shouldn't do it. bq. When matser is initializing, if one RS is killed and restart, then dead server is in progress while master startup This seems like a small window. Or do you think it could happen frequent? Could we hold up shutdownserverhandler until master is up? Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Fix For: 0.94.0, 0.92.1 This JIRA continues the effort from HBASE-5179. 
Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if its carrying root and meta? What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. Its not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch? This patch needs to have the v18 differences applied. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193209#comment-13193209 ] stack commented on HBASE-5270: -- bq. What if the region server hosting .META. went down ? Yes... was just thinking about that. In this case we'd run the splitter in-line, in SSH, not via executor let me look at code. I'm trying to write tests and catch-up on all the stuff that was done over on previous issue. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Fix For: 0.94.0, 0.92.1 This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if its carrying root and meta? What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? 
Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. Its not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch? This patch needs to have the v18 differences applied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5276) PerformanceEvaluation does not set the correct classpath for MR because it lives in the test jar
[ https://issues.apache.org/jira/browse/HBASE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193255#comment-13193255 ] stack commented on HBASE-5276: -- @Tim Maybe open issue against CDH and close this one? PerformanceEvaluation does not set the correct classpath for MR because it lives in the test jar Key: HBASE-5276 URL: https://issues.apache.org/jira/browse/HBASE-5276 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.90.4 Reporter: Tim Robertson Priority: Minor Note: This was discovered running the CDH version hbase-0.90.4-cdh3u2 Running the PerformanceEvaluation as follows: $HADOOP_HOME/bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation scan 5 fails because the MR tasks do not get the HBase jar on the CP, and thus hit ClassNotFoundExceptions. The job gets the following only: file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/hbase-0.90.4-cdh3u2-tests.jar file:/Users/tim/dev/hadoop/hadoop-0.20.2-cdh3u2/hadoop-core-0.20.2-cdh3u2.jar file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/lib/zookeeper-3.3.3-cdh3u2.jar The RowCounter etc all work because they live in the HBase jar, not the test jar, and they get the following file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/lib/guava-r06.jar file:/Users/tim/dev/hadoop/hadoop-0.20.2-cdh3u2/hadoop-core-0.20.2-cdh3u2.jar file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/hbase-0.90.4-cdh3u2.jar file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/lib/zookeeper-3.3.3-cdh3u2.jar Presumably this relates to job.setJarByClass(PerformanceEvaluation.class); ... TableMapReduceUtil.addDependencyJars(job); A (cowboy) workaround to run PE is to unpack the jars, and copy the PerformanceEvaluation* classes building a patched jar. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-5278) HBase shell script refers to removed migrate functionality
[ https://issues.apache.org/jira/browse/HBASE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193365#comment-13193365 ] stack commented on HBASE-5278: -- @Jon I've a bit of practise HBase shell script refers to removed migrate functionality Key: HBASE-5278 URL: https://issues.apache.org/jira/browse/HBASE-5278 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.94.0, 0.90.5, 0.92.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Priority: Trivial Fix For: 0.94.0, 0.92.1 Attachments: hbase-5278.patch $ hbase migrate Exception in thread main java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Migrate Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.util.Migrate at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) Could not find the main class: org.apache.hadoop.hbase.util.Migrate. Program will exit. The 'hbase' shell script has docs referring to a 'migrate' command which no longer exists. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5279) NPE in Master after upgrading to 0.92.0
[ https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193363#comment-13193363 ] stack commented on HBASE-5279: -- Skipping should be fine. You have a scan of .META. from before upgrade? Are you up now? NPE in Master after upgrading to 0.92.0 --- Key: HBASE-5279 URL: https://issues.apache.org/jira/browse/HBASE-5279 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.0 Reporter: Tobias Herbert Priority: Critical Attachments: HBASE-5279.patch I have upgraded my environment from 0.90.4 to 0.92.0 after the table migration I get the following error in the master (permanent) {noformat} 2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting shutdown. java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326) at java.lang.Thread.run(Thread.java:662) 2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Aborting {noformat} I think that's because I had a hard crash in the cluster a while ago - and the following WARN since then {noformat} 2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8} {noformat} my patch was simple to go around the NPE (as the other code around the lines) but I don't know if that's correct -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195136#comment-13195136 ] stack commented on HBASE-4991: -- bq. For successive regions R1, R2 and R3, if we delete R2, we can change the end key of R1 to be the original end key of R2 and drop region R2 directly. Changing the end key of R1 to be the end key of R2 will require creating a new region R1'; we'll have to close R1 and R2 and delete both (after moving the R1 data to R1'). Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler
[ https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195163#comment-13195163 ] stack commented on HBASE-5120: -- So, setting down the TM period from 30 minutes to 2 minutes and doing a bunch of online enable/disable works? If so, I'm +1 on this patch (We'd actually set the timeout down from 30minutes to 5minutes over in hbase-5119?) Timeout monitor races with table disable handler Key: HBASE-5120 URL: https://issues.apache.org/jira/browse/HBASE-5120 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Zhihong Yu Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0, 0.92.1 Attachments: HBASE-5120.patch, HBASE-5120_1.patch, HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, HBASE-5120_5.patch, HBASE-5120_5.patch Here is what J-D described here: https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176 I think I will retract from my statement that it used to be extremely racy and caused more troubles than it fixed, on my first test I got a stuck region in transition instead of being able to recover. The timeout was set to 2 minutes to be sure I hit it. First the region gets closed {quote} 2012-01-04 00:16:25,811 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to sv4r5s38,62023,1325635980913 for region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. {quote} 2 minutes later it times out: {quote} 2012-01-04 00:18:30,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
state=PENDING_CLOSE, ts=1325636185810, server=null 2012-01-04 00:18:30,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 2012-01-04 00:18:30,027 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. (offlining) {quote} 100ms later the master finally gets the event: {quote} 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 1a4b111bcc228043e89f59c4c3f6a791 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so deleting ZK node and removing from regions in transition, skipping assignment of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Deleting existing unassigned node for 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Successfully deleted unassigned node for region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED {quote} At this point everything is fine, the region was processed as closed. But wait, remember that line where it said it was going to force an unassign? 
{quote} 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Creating unassigned node for 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state 2012-01-04 00:18:30,328 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server null returned java.lang.NullPointerException: Passed server is null for 1a4b111bcc228043e89f59c4c3f6a791 {quote} Now the master is confused, it recreated the RIT znode but the region doesn't even exist anymore. It even tries to shut it down but is blocked by NPEs. Now this is what's going on. The late ZK notification that the znode was deleted (but it got recreated after): {quote} 2012-01-04 00:19:33,285 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been deleted. {quote} Then it prints this, and much later tries to unassign it again: {quote} 2012-01-04 00:19:46,607 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on region to clear regions in transition; test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195187#comment-13195187 ] stack commented on HBASE-5229: -- @Lars I'm 'intellectually' interested. I have no practical need. I'm more interested in our being able to support large rows (intra-row scanning, etc.); i.e. one row in a region and that region is huge and it works. Explore building blocks for multi-row local transactions. --- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for multi-row local transactions. Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here. After a bit of discussion two solutions have emerged: 1. Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase-rows. 2. Define a prefix-length in HTableDescriptor that defines a grouping of rows. Regions will then never be split inside a grouping prefix. #1 is true to the current storage paradigm of HBase. #2 is true to the current client side API. I will explore these two with sample patches here. Was: As discussed (at length) on the dev mailing list with the HBASE-3584 and HBASE-5203 committed, supporting atomic cross row transactions within a region becomes simple. I am aware of the hesitation about the usefulness of this feature, but we have to start somewhere. Let's use this jira for discussion, I'll attach a patch (with tests) momentarily to make this concrete. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-3134) [replication] Add the ability to enable/disable streams
[ https://issues.apache.org/jira/browse/HBASE-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195189#comment-13195189 ] stack commented on HBASE-3134: -- bq. Should I do this in this ticket or file another issue? Up to you. [replication] Add the ability to enable/disable streams --- Key: HBASE-3134 URL: https://issues.apache.org/jira/browse/HBASE-3134 Project: HBase Issue Type: New Feature Components: replication Reporter: Jean-Daniel Cryans Assignee: Teruyoshi Zenmyo Priority: Minor Labels: replication Fix For: 0.94.0 Attachments: HBASE-3134.patch This jira was initially in the scope of HBASE-2201, but was pushed out since it has low value compared to the required effort (and we want to ship 0.90.0 rather soonish). We need to design a way to enable/disable replication streams in a deterministic fashion.
[jira] [Commented] (HBASE-4810) Clean up IPC logging configuration
[ https://issues.apache.org/jira/browse/HBASE-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197029#comment-13197029 ] stack commented on HBASE-4810: -- +1 Clean up IPC logging configuration -- Key: HBASE-4810 URL: https://issues.apache.org/jira/browse/HBASE-4810 Project: HBase Issue Type: Improvement Components: ipc Reporter: Gary Helmling The current IPC classes -- HBaseClient, HBaseServer, HBaseRPC, WritableRpcEngine, etc. -- use mangled package names (org.apache.hadoop.ipc) when obtaining loggers so that we can enable debug logging on the full org.apache.hadoop.hbase base package and not have the noise from per-request log messages drown out other important information. This has the desired effect, but is extremely hacky and counter-intuitive. I think it would be better to fix the package name used by these classes (use org.apache.hadoop.hbase.ipc), and change the noisy, per-request messages to be logged at trace level. This will keep those messages out of standard debug output, while allowing the log4j configuration when you actually wish to see them to be a little more intuitive. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5285) runtime exception -- cached an already cached block -- during compaction
[ https://issues.apache.org/jira/browse/HBASE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197311#comment-13197311 ] stack commented on HBASE-5285: -- Can you make this happen reliably Simon? runtime exception -- cached an already cached block -- during compaction Key: HBASE-5285 URL: https://issues.apache.org/jira/browse/HBASE-5285 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0 Environment: hadoop-1.0 and hbase-0.92 18 node cluster, dedicated namenode, zookeeper, hbasemaster, and YCSB client machine. latest YCSB Reporter: Simon Dircks Priority: Trivial #On YCSB client machine: /usr/local/bin/java -cp build/ycsb.jar:db/hbase/lib/*:db/hbase/conf/ com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family1 -p recordcount=500 -s load.dat loaded 5mil records, that created 8 regions. (balanced all onto the same RS) /usr/local/bin/java -cp build/ycsb.jar:db/hbase/lib/*:db/hbase/conf/ com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family1 -p operationcount=500 -threads 10 -s transaction.dat #On RS that was holding the 8 regions above. 
2012-01-25 23:23:51,556 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x134f70a343101a0 Successfully transitioned node 162702503c650e551130e5fb588b3ec2 from RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT 2012-01-25 23:23:51,616 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.lang.RuntimeException: Cached an already cached block at org.apache.hadoop.hbase.io.hfile.LruBlockCache.cacheBlock(LruBlockCache.java:268) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:276) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:487) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:168) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:181) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:111) at org.apache.hadoop.hbase.regionserver.StoreScanner.init(StoreScanner.java:83) at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1721) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.init(HRegion.java:2861) at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1432) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1424) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1400) at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:3688) at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:3581) at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1771) at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1325) 2012-01-25 
23:23:51,656 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x134f70a343101a0 Attempting to transition node 162702503c650e551130e5fb588b3ec2 from RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5286) bin/hbase's logic of adding Hadoop jar files to the classpath is fragile when presented with split packaged Hadoop 0.23 installation
[ https://issues.apache.org/jira/browse/HBASE-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197316#comment-13197316 ] stack commented on HBASE-5286: -- Thanks for filing the issue Roman. bin/hbase's logic of adding Hadoop jar files to the classpath is fragile when presented with split packaged Hadoop 0.23 installation Key: HBASE-5286 URL: https://issues.apache.org/jira/browse/HBASE-5286 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.0 Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Here's the bit from bin/hbase that might need TLC now that Hadoop can be spotted in the wild in split-package configuration: {noformat} #If avail, add Hadoop to the CLASSPATH and to the JAVA_LIBRARY_PATH if [ ! -z $HADOOP_HOME ]; then HADOOPCPPATH= if [ -z $HADOOP_CONF_DIR ]; then HADOOPCPPATH=$(append_path ${HADOOPCPPATH} ${HADOOP_HOME}/conf) else HADOOPCPPATH=$(append_path ${HADOOPCPPATH} ${HADOOP_CONF_DIR}) fi if [ `echo ${HADOOP_HOME}/hadoop-core*.jar` != ${HADOOP_HOME}/hadoop-core*.jar ] ; then HADOOPCPPATH=$(append_path ${HADOOPCPPATH} `ls ${HADOOP_HOME}/hadoop-core*.jar | head -1`) else HADOOPCPPATH=$(append_path ${HADOOPCPPATH} `ls ${HADOOP_HOME}/hadoop-common*.jar | head -1`) HADOOPCPPATH=$(append_path ${HADOOPCPPATH} `ls ${HADOOP_HOME}/hadoop-hdfs*.jar | head -1`) HADOOPCPPATH=$(append_path ${HADOOPCPPATH} `ls ${HADOOP_HOME}/hadoop-mapred*.jar | head -1`) fi {noformat} There's a couple of issues with the above code: 0. HADOOP_HOME is now deprecated in Hadoop 0.23 1. the list of jar files added to the class-path should be revised 2. 
we need to figure out a more robust way to get the jar files that are needed to the classpath (things like hadoop-mapred*.jar tend to match src/test jars as well) Better yet, it would be useful to look into whether we can transition HBase's bin/hbase onto using bin/hadoop as a launcher script instead of direct JAVA invocations (Pig, Hive, Sqoop and Mahout already do that) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
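A minimal sketch of point 2 above, written in Java for illustration only (bin/hbase itself is a shell script). It shows one way to accept just the runtime jar of an artifact and skip the test and source jars that a glob like hadoop-mapred*.jar also matches. The class name, the helper names and the assumed "<artifact>-<version>.jar" naming scheme are hypothetical and not taken from the HBase or Hadoop code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class HadoopJarPicker {

    // Accepts only "<artifact>-<version>.jar" (optionally "-SNAPSHOT").
    // The version must start with a digit, so "hadoop-mapred-test-..." and
    // "...-sources.jar" do not match. This naming scheme is an assumption.
    static Pattern runtimeJar(String artifact) {
        return Pattern.compile(Pattern.quote(artifact) + "-\\d[\\w.]*(-SNAPSHOT)?\\.jar");
    }

    // Returns the file names that look like the runtime jar of the artifact.
    static List<String> pick(String artifact, List<String> fileNames) {
        Pattern p = runtimeJar(artifact);
        List<String> picked = new ArrayList<>();
        for (String name : fileNames) {
            if (p.matcher(name).matches()) {
                picked.add(name);
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        // Hypothetical directory listing.
        List<String> dir = Arrays.asList(
            "hadoop-mapred-0.23.0.jar",
            "hadoop-mapred-test-0.23.0.jar",
            "hadoop-mapred-0.23.0-sources.jar");
        System.out.println(pick("hadoop-mapred", dir));
    }
}
```

Anchoring on the version digit rather than on a bare prefix is the design choice here; delegating to bin/hadoop, as the issue suggests, would avoid the matching problem altogether.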
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197328#comment-13197328 ] stack commented on HBASE-5229: -- bq. I hear you. Means that the client and data modeling would have to change completely if it needs transactions. Could this be done as a facade atop our current api? Explore building blocks for multi-row local transactions. --- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for multi-row local transactions. Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here. After a bit of discussion two solutions have emerged: 1. Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase-rows. 2. Define a prefix-length in HTableDescriptor that defines a grouping of rows. Regions will then never be split inside a grouping prefix. #1 is true to the current storage paradigm of HBase. #2 is true to the current client side API. I will explore these two with sample patches here. Was: As discussed (at length) on the dev mailing list with the HBASE-3584 and HBASE-5203 committed, supporting atomic cross row transactions within a region becomes simple. I am aware of the hesitation about the usefulness of this feature, but we have to start somewhere. Let's use this jira for discussion, I'll attach a patch (with tests) momentarily to make this concrete. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker
[ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197340#comment-13197340 ] stack commented on HBASE-5268: -- I added a footnote to the book's delete section pointing at your blog, Lars. Add delete column prefix delete marker -- Key: HBASE-5268 URL: https://issues.apache.org/jira/browse/HBASE-5268 Project: HBase Issue Type: Improvement Components: client, regionserver Reporter: Lars Hofhansl Attachments: 5268-proof.txt, 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268-v5.txt, 5268.txt This is another part missing in the wide row challenge. Currently entire families of a row can be deleted, or individual columns or versions. There is no facility to mark multiple columns for deletion by column prefix. It turns out that this can be achieved with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.
[jira] [Commented] (HBASE-5266) Add documentation for ColumnRangeFilter
[ https://issues.apache.org/jira/browse/HBASE-5266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197347#comment-13197347 ] stack commented on HBASE-5266: -- Nothing wrong w/ your English. The below needs a bit more English, namely a 'be': 'can used'. Can be fixed on commit. Otherwise doc is excellent. +1 Add documentation for ColumnRangeFilter --- Key: HBASE-5266 URL: https://issues.apache.org/jira/browse/HBASE-5266 Project: HBase Issue Type: Sub-task Components: documentation Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.0 Attachments: 5266-v2.txt, 5266-v3.txt, 5266.txt There are only a few lines of documentation for ColumnRangeFilter. Given the usefulness of this filter for efficient intra-row scanning (see HBASE-5229 and HBASE-4256), we should make this filter more prominent in the documentation.
[jira] [Commented] (HBASE-5281) Should a failure in creating an unassigned node abort the master?
[ https://issues.apache.org/jira/browse/HBASE-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197359#comment-13197359 ] stack commented on HBASE-5281: -- I'm not sure removing the abort only is enough, Harsh. If we can't write zk, we can't assign a region, so we get holes in the table. That's radical. We should be consistent regarding our policy when a zk write fails. We are not currently, as you've found here. Should a failure in creating an unassigned node abort the master? - Key: HBASE-5281 URL: https://issues.apache.org/jira/browse/HBASE-5281 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.5 Reporter: Harsh J Assignee: Harsh J Fix For: 0.94.0, 0.92.1 Attachments: HBASE-5281.patch In {{AssignmentManager}}'s {{CreateUnassignedAsyncCallback}}, we have the following condition:
{code}
if (rc != 0) {
  // This is resultcode. If non-zero, need to resubmit.
  LOG.warn("rc != 0 for " + path + " -- retryable connectionloss -- " +
    "FIX see http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A2");
  this.zkw.abort("Connectionloss writing unassigned at " + path + ", rc=" + rc, null);
  return;
}
{code}
A similar structure inside {{ExistsUnassignedAsyncCallback}} (which the above is linked to) does not have such a forced abort. Do we really require the abort statement here, or can we make do without?
[jira] [Commented] (HBASE-5299) CatalogTracker.getMetaServerConnection() checks for root server connection and makes waitForMeta to go into infinite loop in region assignment flow.
[ https://issues.apache.org/jira/browse/HBASE-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197362#comment-13197362 ] stack commented on HBASE-5299: -- I could take this one, Ram. I'm trying to hack up a testing harness for Master. Making a test for this condition would be a good exercise. CatalogTracker.getMetaServerConnection() checks for root server connection and makes waitForMeta to go into infinite loop in region assignment flow. Key: HBASE-5299 URL: https://issues.apache.org/jira/browse/HBASE-5299 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor RS A, RS B and RS C are 3 region servers: RS A hosts META, RS B hosts ROOT, and RS C hosts neither META nor ROOT. Kill RS B and wait for the server shutdown handler to start. Start RS B again before ROOT is assigned to RS C. Now the cluster will try to assign new regions to RS B. But as ROOT is not yet assigned, OpenRegionHandler.updateMeta will fail to update the regions just because ROOT is not online. 
{code} a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING 2012-01-30 16:23:25,126 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1352e27539c0009 Attempting to transition node a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING 2012-01-30 16:23:25,159 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1352e27539c0009 Successfully transitioned node a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING 2012-01-30 16:23:35,385 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1352e27539c0009 Attempting to transition node a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING 2012-01-30 16:23:35,449 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1352e27539c0009 Successfully transitioned node a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING 2012-01-30 16:24:16,666 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1352e27539c0009 Attempting to transition node a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING 2012-01-30 16:24:16,701 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1352e27539c0009 Successfully transitioned node a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING 2012-01-30 16:24:20,788 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Interrupting thread Thread[PostOpenDeployTasks:a87109263ed53e67158377a149c5a7be,5,main] 2012-01-30 16:24:30,699 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Exception running postOpenDeployTasks; region=a87109263ed53e67158377a149c5a7be org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Interrupted at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:439) at 
org.apache.hadoop.hbase.catalog.MetaEditor.updateRegionLocation(MetaEditor.java:142) at org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:1382) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$PostOpenDeployTasksThread.run(OpenRegionHandler.java:221) {code} So we need to wait for TM to assign the regions again. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5280) Remove AssignmentManager#clearRegionFromTransition and replace with assignmentManager#regionOffline
[ https://issues.apache.org/jira/browse/HBASE-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197373#comment-13197373 ] stack commented on HBASE-5280: -- +1 Remove AssignmentManager#clearRegionFromTransition and replace with assignmentManager#regionOffline --- Key: HBASE-5280 URL: https://issues.apache.org/jira/browse/HBASE-5280 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.94.0, 0.90.5, 0.92.0 Reporter: Jonathan Hsieh These two methods are essentially the same and both present in the code base. It was suggested in the review for HBASE-5128 to remove #clearRegionFromTransition in favor of #regionOffline (HBASE-5128 deprecates this method, but it is internal to the HMaster, so should be safely removable from 0.92 and 0.94). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5309) Hbase web-app /jmx throws an exception
[ https://issues.apache.org/jira/browse/HBASE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197393#comment-13197393 ] stack commented on HBASE-5309: -- If you go to /jmx, are we supposed to dump a view on jmx? Hbase web-app /jmx throws an exception -- Key: HBASE-5309 URL: https://issues.apache.org/jira/browse/HBASE-5309 Project: HBase Issue Type: Bug Reporter: Hitesh Shah hbasemaster:60010/jmx throws a NoSuchMethodError exception.
[jira] [Commented] (HBASE-3850) Log more details when a scanner lease expires
[ https://issues.apache.org/jira/browse/HBASE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197392#comment-13197392 ] stack commented on HBASE-3850: -- +1 on patch (fix spacing on the 'else' on commit). Won't regionname include name of table? Log more details when a scanner lease expires - Key: HBASE-3850 URL: https://issues.apache.org/jira/browse/HBASE-3850 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Benoit Sigoure Assignee: Darren Haas Priority: Critical Fix For: 0.94.0 Attachments: HBASE-3850.trunk.v1.patch The message logged by the RegionServer when a Scanner lease expires isn't as useful as it could be. {{Scanner 4765412385779771089 lease expired}} - most clients don't log their scanner ID, so it's really hard to figure out what was going on. I think it would be useful to at least log the name of the region on which the Scanner was open, and it would be great to have the ip:port of the client that had that lease too.
[jira] [Commented] (HBASE-3859) Increment a counter when a Scanner lease expires
[ https://issues.apache.org/jira/browse/HBASE-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197395#comment-13197395 ] stack commented on HBASE-3859: -- Patch looks good. Any evidence it works Mubarak? Increment a counter when a Scanner lease expires Key: HBASE-3859 URL: https://issues.apache.org/jira/browse/HBASE-3859 Project: HBase Issue Type: Improvement Components: metrics, regionserver Affects Versions: 0.90.2 Reporter: Benoit Sigoure Assignee: Mubarak Seyed Priority: Minor Attachments: HBASE-3859.trunk.v1.patch Whenever a Scanner lease expires, the RegionServer will close it automatically and log a message to complain. I would like the RegionServer to increment a counter whenever this happens and expose this counter through the metrics system, so we can plug this into our monitoring system (OpenTSDB) and keep track of how frequently this happens. It's not supposed to happen frequently so it's good to keep an eye on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
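A minimal sketch of the idea in this issue, combined with the richer log line asked for in HBASE-3850: bump a thread-safe counter each time a scanner lease expires and build a message that names the region and the client. The class, method and parameter names are hypothetical and are not the ones used in the attached patches; how the counter would be wired into the RegionServer metrics system is left out.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ScannerLeaseMetrics {

    // Running total of expired scanner leases; AtomicLong keeps the
    // increment safe when several lease-expiry callbacks fire concurrently.
    private final AtomicLong expiredLeases = new AtomicLong();

    // Hypothetical hook called when a lease expires. Returns the text a
    // caller could hand to its logger.
    public String onLeaseExpired(long scannerId, String regionName, String clientAddress) {
        long total = expiredLeases.incrementAndGet();
        return "Scanner " + scannerId + " lease expired on region " + regionName
            + " for client " + clientAddress + " (expired so far: " + total + ")";
    }

    // Value a metrics reporter could poll periodically.
    public long getExpiredLeases() {
        return expiredLeases.get();
    }

    public static void main(String[] args) {
        ScannerLeaseMetrics metrics = new ScannerLeaseMetrics();
        // Region name and client address below are made-up sample values.
        System.out.println(metrics.onLeaseExpired(4765412385779771089L, "t1,,1.abc.", "10.0.0.5:41234"));
        System.out.println(metrics.getExpiredLeases());
    }
}
```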
[jira] [Commented] (HBASE-5283) Request counters may become negative for heavily loaded regions
[ https://issues.apache.org/jira/browse/HBASE-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197398#comment-13197398 ] stack commented on HBASE-5283: -- +1 on patch. Request counters may become negative for heavily loaded regions --- Key: HBASE-5283 URL: https://issues.apache.org/jira/browse/HBASE-5283 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Zhihong Yu Assignee: Mubarak Seyed Fix For: 0.94.0, 0.92.1 Attachments: HBASE-5283.trunk.v1.patch Requests counter showing negative count, example under 'Requests' column: -645470239
{code}
Name | Region Server | Start Key | End Key | Requests
usertable,user2037516127892189021,1326756873774.16833e4566d1daef109b8fdcd1f4b5a6. | xxx.com:60030 | user2037516127892189021 | user2296868939942738705 | -645470239
{code}
RegionLoad.readRequestsCount and RegionLoad.writeRequestsCount are of int type. Our Ops team has been running lots of heavy load operations. RegionLoad.getRequestsCount() overflows Integer.MAX_VALUE. It is set to D986E7E1. In table.jsp, RegionLoad.getRequestsCount() is assigned to a long. The int D986E7E1 is sign-extended to the long FFFFFFFFD986E7E1, which is -645470239 in decimal. Suggested fix is to make readRequestsCount and writeRequestsCount long type.
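The arithmetic in this report can be reproduced with a few lines of Java. This is only an illustration of the overflow, not code from RegionLoad or table.jsp; the class and method names are made up. The true count is assumed to be 3649497057, which is D986E7E1 read as an unsigned 32-bit value.

```java
public class RequestCounterOverflow {

    // Narrowing a long to int keeps only the low 32 bits.
    static int storeAsInt(long count) {
        return (int) count;
    }

    // Widening an int to long sign-extends, so a negative int stays negative.
    static long widen(int stored) {
        return stored;
    }

    public static void main(String[] args) {
        long trueCount = 3649497057L;            // above Integer.MAX_VALUE (2147483647)
        int stored = storeAsInt(trueCount);      // bit pattern D986E7E1
        long shown = widen(stored);              // what a long variable would receive

        System.out.println(Integer.toHexString(stored)); // d986e7e1
        System.out.println(Long.toHexString(shown));     // ffffffffd986e7e1
        System.out.println(shown);                       // -645470239
    }
}
```

Declaring the counters as long from the start, as the issue suggests, avoids the narrowing step entirely.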
[jira] [Commented] (HBASE-5281) Should a failure in creating an unassigned node abort the master?
[ https://issues.apache.org/jira/browse/HBASE-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197445#comment-13197445 ] stack commented on HBASE-5281: -- At least in 0.92.0 (and 0.89fb) we are using recoverablezk. It will retry 'recoverable' zk errors. Are you suggesting something else, Jimmy? That we wrap all zk ops in a higher-level retry loop? Should a failure in creating an unassigned node abort the master? - Key: HBASE-5281 URL: https://issues.apache.org/jira/browse/HBASE-5281 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.5 Reporter: Harsh J Assignee: Harsh J Fix For: 0.94.0, 0.92.1 Attachments: HBASE-5281.patch In {{AssignmentManager}}'s {{CreateUnassignedAsyncCallback}}, we have the following condition:
{code}
if (rc != 0) {
  // This is result code. If non-zero, need to resubmit.
  LOG.warn("rc != 0 for " + path + " -- retryable connectionloss -- " +
    "FIX see http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A2");
  this.zkw.abort("Connectionloss writing unassigned at " + path +
    ", rc=" + rc, null);
  return;
}
{code}
A similar structure inside {{ExistsUnassignedAsyncCallback}} (which the above is linked to), however, does not have such a forced abort. Do we really require the abort statement here, or can we make do without?
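For discussion, here is a rough sketch of what a "higher-level retry loop" around an operation could look like. It is generic Java, not HBase or ZooKeeper code: RetryableException, callWithRetries and the backoff parameters are all hypothetical, and which ZooKeeper errors count as retryable is deliberately left open.

```java
import java.util.concurrent.Callable;

class RetryingCaller {
    /** Hypothetical marker for errors worth retrying (for example a connection loss). */
    static class RetryableException extends Exception {
        RetryableException(String message) {
            super(message);
        }
    }

    /**
     * Runs op, retrying on RetryableException with exponential backoff.
     * After maxAttempts failures the last exception is rethrown, and the
     * caller decides whether that warrants an abort.
     */
    static <T> T callWithRetries(Callable<T> op, int maxAttempts, long initialBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (RetryableException e) {
                if (attempt >= maxAttempts) {
                    throw e;  // out of retries: surface the failure
                }
                Thread.sleep(backoffMs);
                backoffMs *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new RetryableException("simulated connection loss");
            }
            return "created";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With a wrapper like this, an abort becomes the caller's last resort after retries are exhausted rather than the first reaction to a non-zero result code.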
[jira] [Commented] (HBASE-5256) Use WritableUtils.readVInt() in RegionLoad.readFields()
[ https://issues.apache.org/jira/browse/HBASE-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197475#comment-13197475 ] stack commented on HBASE-5256: -- Will we have to be careful with this patch? It can't go into a minor version, right? Else you could have a regionserver reporting metrics to a master in a format it can't interpret (or vice versa, the master will be expecting them as longs but they come over as vlongs). We up the version on HServerLoad but don't exploit the version change. We could have had HSL self-migrate, reading using the old code if it got a v1 HSL to deserialize. Then we could have this in a point version. Use WritableUtils.readVInt() in RegionLoad.readFields() --- Key: HBASE-5256 URL: https://issues.apache.org/jira/browse/HBASE-5256 Project: HBase Issue Type: Task Reporter: Zhihong Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-5256.trunk.v1.patch Currently in.readInt() is used in RegionLoad.readFields(). More metrics will be added to RegionLoad in the future, so we should use WritableUtils.readVInt() to reduce the amount of data exchanged between Master and region servers.
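A sketch of the self-migrating read suggested in the comment, under stated assumptions: the class is a hypothetical two-field stand-in for RegionLoad/HServerLoad, the version constants are invented, and the variable-length encoding is a simple hand-rolled 7-bits-per-byte scheme used only to keep the example self-contained. It is not the byte layout Hadoop's WritableUtils produces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class VersionedRegionLoad {
    static final byte VERSION_FIXED = 1;   // old wire format: 4-byte ints
    static final byte VERSION_VARINT = 2;  // new wire format: variable-length ints

    int readRequests;
    int writeRequests;

    /** New writers emit the current version. */
    void write(DataOutput out) throws IOException {
        out.writeByte(VERSION_VARINT);
        writeVarInt(out, readRequests);
        writeVarInt(out, writeRequests);
    }

    /** What a not-yet-upgraded peer would send; present only to exercise the old path. */
    void writeV1(DataOutput out) throws IOException {
        out.writeByte(VERSION_FIXED);
        out.writeInt(readRequests);
        out.writeInt(writeRequests);
    }

    /** Self-migrating read: branch on the version byte instead of assuming one layout. */
    void readFields(DataInput in) throws IOException {
        byte version = in.readByte();
        if (version == VERSION_FIXED) {
            readRequests = in.readInt();
            writeRequests = in.readInt();
        } else if (version == VERSION_VARINT) {
            readRequests = readVarInt(in);
            writeRequests = readVarInt(in);
        } else {
            throw new IOException("Unsupported version " + version);
        }
    }

    /** 7 payload bits per byte; a set high bit means another byte follows. */
    static void writeVarInt(DataOutput out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte(value);
    }

    static int readVarInt(DataInput in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            byte b = in.readByte();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("Malformed varint");
    }

    static byte[] toBytes(VersionedRegionLoad load, boolean oldFormat) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        if (oldFormat) {
            load.writeV1(out);
        } else {
            load.write(out);
        }
        out.flush();
        return bytes.toByteArray();
    }

    static VersionedRegionLoad fromBytes(byte[] data) throws IOException {
        VersionedRegionLoad load = new VersionedRegionLoad();
        load.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        return load;
    }

    public static void main(String[] args) throws IOException {
        VersionedRegionLoad load = new VersionedRegionLoad();
        load.readRequests = 5;
        load.writeRequests = 300;
        byte[] v1 = toBytes(load, true);
        byte[] v2 = toBytes(load, false);
        System.out.println("v1 bytes: " + v1.length + ", v2 bytes: " + v2.length);
        System.out.println("round trip: " + fromBytes(v1).writeRequests + " / " + fromBytes(v2).writeRequests);
    }
}
```

This only covers an upgraded reader accepting both layouts. An old reader still cannot parse the new layout, so the order in which masters and regionservers are upgraded would still matter.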
[jira] [Commented] (HBASE-5256) Use WritableUtils.readVInt() in RegionLoad.readFields()
[ https://issues.apache.org/jira/browse/HBASE-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197586#comment-13197586 ] stack commented on HBASE-5256: -- Yes. Upgrading. Seems like it'd be easy to make this self-migrating. Would suggest we do it. Make a new issue? Use WritableUtils.readVInt() in RegionLoad.readFields() --- Key: HBASE-5256 URL: https://issues.apache.org/jira/browse/HBASE-5256 Project: HBase Issue Type: Task Reporter: Zhihong Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-5256.trunk.v1.patch Currently in.readInt() is used in RegionLoad.readFields(). More metrics will be added to RegionLoad in the future, so we should use WritableUtils.readVInt() to reduce the amount of data exchanged between Master and region servers.
[jira] [Commented] (HBASE-5309) Hbase web-app /jmx throws an exception
[ https://issues.apache.org/jira/browse/HBASE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197587#comment-13197587 ] stack commented on HBASE-5309: -- That'd work for us. Hbase web-app /jmx throws an exception -- Key: HBASE-5309 URL: https://issues.apache.org/jira/browse/HBASE-5309 Project: HBase Issue Type: Bug Reporter: Hitesh Shah Assignee: Enis Soztutar hbasemaster:60010/jmx throws a NoSuchMethodError exception.
[jira] [Commented] (HBASE-1621) merge tool should work on online cluster, but disabled table
[ https://issues.apache.org/jira/browse/HBASE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197942#comment-13197942 ] stack commented on HBASE-1621: -- Would you mind uploading a version of online_merge.rb w/ the patch included, Alexey, to make it easier for others using the script? Thank you. merge tool should work on online cluster, but disabled table Key: HBASE-1621 URL: https://issues.apache.org/jira/browse/HBASE-1621 Project: HBase Issue Type: Bug Reporter: ryan rawson Assignee: stack Fix For: 0.94.0 Attachments: 1621-trunk.txt, HBASE-1621-v2.patch, HBASE-1621.patch, hbase-onlinemerge.patch, online_merge.rb Taking down the entire cluster to merge 2 regions is a pain; I don't see why the table or regions specifically couldn't be taken offline, merged, then brought back up. This might need a new API to the regionservers so they can take direction from not just the master.
[jira] [Commented] (HBASE-5314) Gracefully rolling restart region servers in rolling-restart.sh
[ https://issues.apache.org/jira/browse/HBASE-5314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198024#comment-13198024 ] stack commented on HBASE-5314: -- This is interesting. You are putting together rolling_restart.sh and graceful_stop.sh. Nice. What version of hbase are you working against? Should this new behavior be optional? Otherwise those who might be using rolling_restart.sh now might get a surprise (IMO, it's unlikely anyone is using rolling_restart.sh because it is so disruptive... which you are fixing... but still, I'd think a --graceful option that enables this new facility would be the way to go). Patch looks good but for this. Gracefully rolling restart region servers in rolling-restart.sh --- Key: HBASE-5314 URL: https://issues.apache.org/jira/browse/HBASE-5314 Project: HBase Issue Type: Improvement Components: scripts Reporter: YiFeng Jiang Priority: Minor Attachments: HBASE-5314.patch The rolling-restart.sh has a --rs-only option which simply restarts all region servers in the cluster. Consider improving it to restart region servers gracefully, to avoid offline time for the regions deployed on each server and to keep the region distribution the same as it was before the restart.
[jira] [Commented] (HBASE-5286) bin/hbase's logic of adding Hadoop jar files to the classpath is fragile when presented with split packaged Hadoop 0.23 installation
[ https://issues.apache.org/jira/browse/HBASE-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198037#comment-13198037 ] stack commented on HBASE-5286: -- Why do they have to be in a different dir? Can't we do some sed foo to purge them from the list of lib jars? If HADOOP_HOME is defined, can we not put the hadoop and zk jars at the front of the CLASSPATH? Or is it that we can't have hadoop jars in the CLASSPATH twice, both the new and old, because we'll find classes we shouldn't? Or, IIRC, we can't have hadoop jars in front of hbase jars because then we'll do things like pick up its webapps instead of ours. Thanks Roman. bin/hbase's logic of adding Hadoop jar files to the classpath is fragile when presented with split packaged Hadoop 0.23 installation Key: HBASE-5286 URL: https://issues.apache.org/jira/browse/HBASE-5286 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.0 Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Here's the bit from bin/hbase that might need TLC now that Hadoop can be spotted in the wild in split-package configuration:
{noformat}
#If avail, add Hadoop to the CLASSPATH and to the JAVA_LIBRARY_PATH
if [ ! -z $HADOOP_HOME ]; then
  HADOOPCPPATH=""
  if [ -z $HADOOP_CONF_DIR ]; then
    HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" "${HADOOP_HOME}/conf")
  else
    HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" "${HADOOP_CONF_DIR}")
  fi
  if [ "`echo ${HADOOP_HOME}/hadoop-core*.jar`" != "${HADOOP_HOME}/hadoop-core*.jar" ] ; then
    HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" `ls ${HADOOP_HOME}/hadoop-core*.jar | head -1`)
  else
    HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" `ls ${HADOOP_HOME}/hadoop-common*.jar | head -1`)
    HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" `ls ${HADOOP_HOME}/hadoop-hdfs*.jar | head -1`)
    HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" `ls ${HADOOP_HOME}/hadoop-mapred*.jar | head -1`)
  fi
{noformat}
There are a couple of issues with the above code:
0. HADOOP_HOME is now deprecated in Hadoop 0.23.
1. The list of jar files added to the classpath should be revised.
2. We need to figure out a more robust way to get the jar files that are needed onto the classpath (things like hadoop-mapred*.jar tend to match src/test jars as well).
Better yet, it would be useful to look into whether we can transition HBase's bin/hbase onto using bin/hadoop as a launcher script instead of direct JAVA invocations (Pig, Hive, Sqoop and Mahout already do that).
[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections
[ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199528#comment-13199528 ] stack commented on HBASE-3792: -- @Bryan Yes please. That's crazy acrobatics you are at just to close a connection. Thanks. TableInputFormat leaks ZK connections - Key: HBASE-3792 URL: https://issues.apache.org/jira/browse/HBASE-3792 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.1 Environment: Java 1.6.0_24, Mac OS X 10.6.7 Reporter: Bryan Keller Attachments: patch0.90.4, tableinput.patch The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader. The leak occurs when the JobClient is initializing and needs to retrieve the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate. I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.
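The shape of the workaround described above (hold only the table name, open the table where it is needed, always close it) might look roughly like this. TableHandle is a hypothetical stand-in for an HTable plus its ZK connection, and the split data is faked; nothing here is the actual HBase API or the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class SplitsWithCleanup {
    /** Counts simulated open connections so a leak would be visible. */
    static final AtomicInteger OPEN_CONNECTIONS = new AtomicInteger();

    /** Hypothetical stand-in for an HTable and the ZK connection behind it. */
    static class TableHandle {
        private final String tableName;
        private boolean closed;

        TableHandle(String tableName) {
            this.tableName = tableName;
            OPEN_CONNECTIONS.incrementAndGet();
        }

        /** Fake region start keys; a real implementation would ask the cluster. */
        List<String> regionStartKeys() {
            return Arrays.asList(tableName + ",", tableName + ",m");
        }

        void close() {
            if (!closed) {  // idempotent, so a double close cannot skew the count
                closed = true;
                OPEN_CONNECTIONS.decrementAndGet();
            }
        }
    }

    /** Opens the table per call and closes it in finally, even if listing the splits fails. */
    static List<String> getSplits(String tableName) {
        TableHandle table = new TableHandle(tableName);
        try {
            return new ArrayList<String>(table.regionStartKeys());
        } finally {
            table.close();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(getSplits("usertable"));
        }
        System.out.println("connections still open: " + OPEN_CONNECTIONS.get());
    }
}
```

A long-lived job client that calls getSplits() repeatedly then ends each call with no connection left behind, instead of accumulating one per call.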
[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans
[ https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199534#comment-13199534 ] stack commented on HBASE-5325: -- bq. Similar to the Namenode and Jobtracker, it would be good if the hbase master could expose some information through mbeans. Can we go further (smile)? Would be cool if, when a regionserver registered, master started listening to (or asking) regionservers for their metrics via jmx. Master could use the collected info when it displays its UI. (Perhaps it would make sense to do a cluster federated mbean that has the sum of all regionserver jmx metrics and use this when displaying the master ui (?), plus a master mbean with the master's vitals.) If we got this working, using jmx to query vitals, perhaps we could undo the rpc that the regionserver does to the master every second or so to tell it about its current load, since HServerLoad is essentially duplicating metrics. (Static or near-static properties that HServerLoad reports, such as loaded coprocessors, could be hoisted up as data in the regionserver's ephemeral znode as protobufs/json data -- we could add to what's in HServerLoad, reporting system vitals too like RAM, CPUs, Disks. Master could do the same, hoisting vitals up into its zk znode... and/or emit in an mbean.) Expose basic information about the master-status through jmx beans --- Key: HBASE-5325 URL: https://issues.apache.org/jira/browse/HBASE-5325 Project: HBase Issue Type: Bug Reporter: Hitesh Shah Priority: Minor Attachments: HBASE-5325.wip.patch Similar to the Namenode and Jobtracker, it would be good if the hbase master could expose some information through mbeans.
[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans
[ https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199542#comment-13199542 ] stack commented on HBASE-5325: -- Can the bean implementation be outside of the master? Just wondering if we can save having HMaster implement yet another Interface (as I'm sure you've noticed, this is a bloated class much in need of a refactor; adding more stuff to this class makes the DOR -- Day Of Refactor -- an even longer day). {code} +mxBean = MBeans.register("HBase", "HBaseMasterInfo", this); {code} Is this a good name for our bean? Should it be at org.apache.hbase -- I don't remember how you do that -- and the bean name be just Master? (I can imagine one day the ServerManager and the AssignmentManager registering mbeans...) On JSON'ing it, is that the way to go? Can we not make MBeans instead -- an MBean per regionserver or an MBean pointer to a list of all registered MBeans? (This would be a federated MBean? Pardon me if this is all nonsense, it's a while since I've done this stuff.) So, if CoProcessors implemented MBean, you could output a list of MBeans as the result of a query on Master getCoProcessors. getDeadRegionServers should return a List of Strings... or whatever the MBean-digestible format of a list of Strings is. Similarly getRegionsInTransition. What do you think about the MBean being actionable, i.e. setters so we could change certain values (not regionservers or coprocessors, but maybe future stuff)? Else patch looks great. Thanks for digging in here. Expose basic information about the master-status through jmx beans --- Key: HBASE-5325 URL: https://issues.apache.org/jira/browse/HBASE-5325 Project: HBase Issue Type: Bug Reporter: Hitesh Shah Priority: Minor Attachments: HBASE-5325.wip.patch Similar to the Namenode and Jobtracker, it would be good if the hbase master could expose some information through mbeans.
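To make the "bean outside of the master" question concrete, here is a small sketch using only the JDK's javax.management API rather than Hadoop's MBeans helper. The interface, the bean class, the attribute names and the org.apache.hbase:type=Master object name are all assumptions for illustration, not what the patch does.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class MasterStatusJmx {
    /** Read-only view; the MXBean suffix is what makes JMX treat it as an MXBean. */
    public interface MasterStatusMXBean {
        String[] getDeadRegionServers();
        int getRegionsInTransitionCount();
    }

    /** Separate bean class, so the (hypothetical) master need not implement the interface itself. */
    public static class MasterStatus implements MasterStatusMXBean {
        private final String[] deadServers;
        private final int regionsInTransition;

        public MasterStatus(String[] deadServers, int regionsInTransition) {
            this.deadServers = deadServers.clone();
            this.regionsInTransition = regionsInTransition;
        }

        @Override
        public String[] getDeadRegionServers() {
            return deadServers.clone();
        }

        @Override
        public int getRegionsInTransitionCount() {
            return regionsInTransition;
        }
    }

    /** Registers the bean with the platform MBean server under the given object name. */
    static ObjectName register(MasterStatusMXBean bean, String name) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName objectName = new ObjectName(name);
        server.registerMBean(bean, objectName);
        return objectName;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        MasterStatus status = new MasterStatus(new String[] {"rs1.example.com,60020"}, 2);
        ObjectName name = register(status, "org.apache.hbase:type=Master");
        // Attribute names are derived from the getters: getDeadRegionServers -> DeadRegionServers.
        String[] dead = (String[]) server.getAttribute(name, "DeadRegionServers");
        System.out.println(Arrays.toString(dead));
        System.out.println(server.getAttribute(name, "RegionsInTransitionCount"));
        server.unregisterMBean(name);
    }
}
```

Keeping the bean in its own class means the master only has to hand over a snapshot, or a narrow accessor, at registration time. String arrays and ints map directly onto JMX open types, which is one answer to the "MBean-digestible format" question.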
[jira] [Commented] (HBASE-5318) Support Eclipse Indigo
[ https://issues.apache.org/jira/browse/HBASE-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199545#comment-13199545 ] stack commented on HBASE-5318: -- Should this get a bit of doc in the reference guide? Somewhere around http://hbase.apache.org/book.html#eclipse perhaps. Good stuff. Support Eclipse Indigo --- Key: HBASE-5318 URL: https://issues.apache.org/jira/browse/HBASE-5318 Project: HBase Issue Type: Improvement Components: build Affects Versions: 0.94.0 Environment: Eclipse Indigo (1.4.1) which includes m2eclipse (1.0 SR1). Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Labels: maven Attachments: mvn_HBASE-5318_r0.patch The current 'standard' release of Eclipse (Indigo) comes with m2eclipse installed. However, as of m2e v1.0, interesting lifecycle phases are now handled via a 'connector', and several of the plugins we use don't support connectors. This means that Eclipse bails out and won't build the project or view it as 'working' even though it builds just fine from the command line. Since Eclipse is one of the major Java IDEs and Indigo has been around for a while, we should make it easy for new devs to pick up the code and for older devs to upgrade painlessly. The original build should not be modified in any significant way.
[jira] [Commented] (HBASE-4336) Convert source tree into maven modules
[ https://issues.apache.org/jira/browse/HBASE-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199551#comment-13199551 ] stack commented on HBASE-4336: -- @Jesse For this kind of work I'd suggest something like restructure101. They are a good crew and have given apache hbase a license; I could ask them for one for you. Using restructure101, you can see our tangled mess in a pretty diagram. Then you can try out various refactorings w/o actually changing code to see how viable a refactor is. If you want to check it out, come by and we'll sit around a screen for an hour. On your suggestions: {code} I'm proposing to follow two general ideas to resolve this: 1) Abstracting out the static methods into a Util class of the same name (so it would be something like HLogUtils for static method in HLog). {code} ... and then the Util class would be hoisted up into the packages above so it could be used from multiple packages? Master and Regionserver packages need to know about hlog? Should we refactor hlog to its own package? Is that any better? Lots of tools rate complexity by how much interpackage dependency we have going on (I suppose we could have an hlog module in an org.hbase.wal package that the regionserver and master modules depend on but yeah, we'd have a million modules then). {code} 2) Moving super general constants to HConstants (like HFile.DEFAULT_BLOCK_SIZE, which is scattered liberally throughout) and then more specific constants to subclasses within HConstants, meaning you might get something like HConstants.HLog.SPLIT_SKIP_ERRORS_DEFAULT. {code} I'm not a fan of building up HConstants to be this fat Interface referenced by all and sundry. I'd think that an HFile.DEFAULT_BLOCK_SIZE belongs in HFile. What's odd is that there are references to this constant outside of hfile. Why do they reference it? {code} My other idea for (2) was to have a constants package that just has the constants for each class. This second means smaller files, but less easy/immediate/natural access to the constants, but gives you nice separation between the constants. {code} An org.apache.hadoop.hbase.constants? Convert source tree into maven modules -- Key: HBASE-4336 URL: https://issues.apache.org/jira/browse/HBASE-4336 Project: HBase Issue Type: Task Components: build Reporter: Gary Helmling Priority: Critical Fix For: 0.94.0 When we originally converted the build to maven we had a single core module defined, but later reverted this to a module-less build for the sake of simplicity. It now looks like it's time to re-address this, as we have an actual need for modules to: * provide a trimmed down client library that applications can make use of * more cleanly support building against different versions of Hadoop, in place of some of the reflection machinations currently required * incorporate the secure RPC engine that depends on some secure Hadoop classes I propose we start simply by refactoring into two initial modules: * core - common classes and utilities, and client-side code and interfaces * server - master and region server implementations and supporting code This would also lay the groundwork for incorporating the HBase security features that have been developed. Once the module structure is in place, security-related features could then be incorporated into a third module -- security -- after normal review and approval. The security module could then depend on secure Hadoop, without modifying the dependencies of the rest of the HBase code.
[jira] [Commented] (HBASE-5328) Small changes to Master to make it more testable
[ https://issues.apache.org/jira/browse/HBASE-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199888#comment-13199888 ] stack commented on HBASE-5328: -- bq. I think MockRegionServer.java should be placed under src/test/java/org/apache/hadoop/hbase/regionserver/ Not yet. It's built for testing the Master (as it says in the class comment). Later, when it gets fleshed out more, it might make sense to move it elsewhere. Thanks for the review. Hold off though till it's more finished. Good on you Ted. Small changes to Master to make it more testable Key: HBASE-5328 URL: https://issues.apache.org/jira/browse/HBASE-5328 Project: HBase Issue Type: Task Reporter: stack Attachments: 5328.txt Here are some small changes in Master that make it more testable. Included tests stand up a Master and then fake it into thinking that three regionservers are registering, making master assign root and meta, etc.
[jira] [Commented] (HBASE-5318) Support Eclipse Indigo
[ https://issues.apache.org/jira/browse/HBASE-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200014#comment-13200014 ] stack commented on HBASE-5318: -- That looks like a CLASSPATH messup where the hadoop jar is now before the hbase jar (the classloader is finding a class that is in both the hadoop and hbase jars in the hadoop jar, and then doing subsequent lookups in the hadoop jar). Support Eclipse Indigo --- Key: HBASE-5318 URL: https://issues.apache.org/jira/browse/HBASE-5318 Project: HBase Issue Type: Improvement Components: build Affects Versions: 0.94.0 Environment: Eclipse Indigo (1.4.1) which includes m2eclipse (1.0 SR1). Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Labels: maven Attachments: mvn_HBASE-5318_r0.patch The current 'standard' release of Eclipse (Indigo) comes with m2eclipse installed. However, as of m2e v1.0, interesting lifecycle phases are now handled via a 'connector', and several of the plugins we use don't support connectors. This means that Eclipse bails out and won't build the project or view it as 'working' even though it builds just fine from the command line. Since Eclipse is one of the major Java IDEs and Indigo has been around for a while, we should make it easy for new devs to pick up the code and for older devs to upgrade painlessly. The original build should not be modified in any significant way.
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by defult
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202103#comment-13202103 ] stack commented on HBASE-5341: -- bq. This would break the ability to compile HBase 0.92+ against Hadoop releases without security. Perhaps we could entertain breaking this for 0.94.0? i.e. saying we only run on hadoops w/ security? (CDH3 has it? What doesn't have it that we want to run on by the time 0.94.0 is out?) On modularization, yes, if hbase-4336 is done soon, security is a natural. Otherwise, we should do as Enis suggests. HBase build artifact should include security code by defult Key: HBASE-5341 URL: https://issues.apache.org/jira/browse/HBASE-5341 Project: HBase Issue Type: Improvement Components: build, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar HBase 0.92.0 was released with two artifacts, plain and security. The security code is built with -Psecurity. There are two tarballs, but only the plain jar is in the maven repo at repository.a.o. I see no reason to do a separate artifact for the security-related code, since 0.92 already depends on secure Hadoop 1.0.0, and none of the security-related code is loaded by default. In this issue, I propose we merge the code under /security to src/ and remove the maven profile.
[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans
[ https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202108#comment-13202108 ] stack commented on HBASE-5325: -- @Enis metrics2 seems a bit out there for us (hadoop 0.23?). We want to run on 0.23 and 1.0 and 2.0, etc., so it'd be a while before we could lean on it. metrics2 has a facility that would help? (I've not studied it). @Hitesh Regarding bq. I am still digging into jmx internals but I could not find anything which mentions it as an option for pushing information. Even if there was a means (IIRC there is but am likely off), I think we'd have master pulling. bq. Having the master pull information from all region servers using jmx ( or any other point to point protocol ) would likely be a bad idea from a performance point of view. Currently every regionserver sends status every (configurable) second. It's a fat Writable serialization of each regionserver's counters and current state. IIRC, this mechanism runs mostly independent of and beside our metrics (so there'll be Writable serialization of regionstate and, if something like tsdb is running, there'll be a JMX serialization of server stating happening too). It would be an improvement if we did metrics reporting one way only, if possible. bq. Also, was your intention to have the HMaster be a metric aggregator for the RegionServers' metrics? It does this now for key stats. bq. I still need to look at nesting of mbeans from various components and also need to look at the hbase code in more detail to see what kind of management options could be exposed via jmx. I'd be interested in what you think. We need to figure out being able to config a running cluster; i.e. change Configuration values and have hbase notice. Having this go via jmx would likely be like taking the 'killarney road to dingle' as my grandma used to say (it's shorter if you take the tralee road) so maybe jmx is read-only rather than 'management'. Expose basic information about the master-status through jmx beans --- Key: HBASE-5325 URL: https://issues.apache.org/jira/browse/HBASE-5325 Project: HBase Issue Type: Improvement Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Fix For: 0.94.0 Attachments: HBASE-5325.1.patch, HBASE-5325.wip.patch Similar to the Namenode and Jobtracker, it would be good if the hbase master could expose some information through mbeans.
[jira] [Commented] (HBASE-5347) GC free memory management in Level-1 Block Cache
[ https://issues.apache.org/jira/browse/HBASE-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202898#comment-13202898 ] stack commented on HBASE-5347: -- Very nice idea (if the reference counting can be figured out). GC free memory management in Level-1 Block Cache Key: HBASE-5347 URL: https://issues.apache.org/jira/browse/HBASE-5347 Project: HBase Issue Type: Improvement Reporter: Prakash Khemani Assignee: Prakash Khemani On eviction of a block from the block-cache, instead of waiting for the garbage collector to reclaim its memory, reuse the block right away. This will require us to keep reference counts on the HFile blocks. Once we have the reference counts in place we can do our own simple blocks-out-of-slab allocation for the block-cache. This will help us with * reducing gc pressure, especially in the old generation * making it possible to have non-java-heap memory backing the HFile blocks
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202908#comment-13202908 ] stack commented on HBASE-5341: -- @Enis 0.92 won't be modularized. HBASE-5288 will be fixed in 0.92.1. Regarding getting security artifacts into maven, I'm not sure how I'd do that. I suppose I'd do everything w/ the security profile. Will try it (That's going to be fun w/ our build taking two hours and mvn rebuilding 4 times, I believe, per artifact before the artifact gets hoisted to apache staging). HBase build artifact should include security code by default Key: HBASE-5341 URL: https://issues.apache.org/jira/browse/HBASE-5341 Project: HBase Issue Type: Improvement Components: build, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar HBase 0.92.0 was released with two artifacts, plain and security. The security code is built with -Psecurity. There are two tarballs, but only the plain jar in maven repo at repository.a.o. I see no reason to do a separate artifact for the security related code, since 0.92 already depends on secure Hadoop 1.0.0, and none of the security related code is loaded by default. In this issue, I propose we merge the code under /security to src/ and remove the maven profile.
[jira] [Commented] (HBASE-5348) Constraint configuration loaded with bloat
[ https://issues.apache.org/jira/browse/HBASE-5348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202916#comment-13202916 ] stack commented on HBASE-5348: -- Why don't you want to load the 'default' config? What do you get if you don't load *.xml files? Constraint configuration loaded with bloat -- Key: HBASE-5348 URL: https://issues.apache.org/jira/browse/HBASE-5348 Project: HBase Issue Type: Bug Components: coprocessors, regionserver Affects Versions: 0.94.0 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Attachments: java_HBASE-5348.patch, java_HBASE-5348.patch Constraints load the configuration, but instead of loading the 'correct' configuration they instantiate the default configuration (via new Configuration). It should just be Configuration(false).
[jira] [Commented] (HBASE-5347) GC free memory management in Level-1 Block Cache
[ https://issues.apache.org/jira/browse/HBASE-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202919#comment-13202919 ] stack commented on HBASE-5347: -- Blocks are not all of the same size. Will this be an issue? Blocks of an awkward size -- say blocks that happen to have a massive KeyValue in them and so massively exceed the default block size -- would need to be treated differently? GC free memory management in Level-1 Block Cache Key: HBASE-5347 URL: https://issues.apache.org/jira/browse/HBASE-5347 Project: HBase Issue Type: Improvement Reporter: Prakash Khemani Assignee: Prakash Khemani On eviction of a block from the block-cache, instead of waiting for the garbage collector to reclaim its memory, reuse the block right away. This will require us to keep reference counts on the HFile blocks. Once we have the reference counts in place we can do our own simple blocks-out-of-slab allocation for the block-cache. This will help us with * reducing gc pressure, especially in the old generation * making it possible to have non-java-heap memory backing the HFile blocks
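A minimal sketch of the blocks-out-of-slab idea from HBASE-5347, including one possible answer to the awkward-size question above: blocks that do not fit a slot simply fall back to ordinary heap memory. Everything here (the class name SlabBlockCacheSketch, the slot layout, the retain/release methods) is an assumption made for illustration and is not taken from any patch on the issue.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

/** Hypothetical sketch: fixed-size slots carved from one off-heap slab, recycled by reference count. */
final class SlabBlockCacheSketch {
    /** A cached block; slot is -1 when the block did not fit a slot and lives on the heap. */
    static final class Block {
        final ByteBuffer data;
        final int slot;
        private int refCount = 1; // the cache itself holds the first reference

        Block(ByteBuffer data, int slot) {
            this.data = data;
            this.slot = slot;
        }
    }

    private final ByteBuffer slab;
    private final int slotSize;
    private final ArrayDeque<Integer> freeSlots = new ArrayDeque<>();

    SlabBlockCacheSketch(int slotSize, int slotCount) {
        this.slotSize = slotSize;
        this.slab = ByteBuffer.allocateDirect(slotSize * slotCount); // non-java-heap memory
        for (int i = 0; i < slotCount; i++) {
            freeSlots.add(i);
        }
    }

    /** Copies the payload into a free slot; oversize payloads (or a full slab) fall back to the heap. */
    synchronized Block allocate(byte[] payload) {
        if (payload.length > slotSize || freeSlots.isEmpty()) {
            return new Block(ByteBuffer.wrap(payload.clone()), -1);
        }
        int slot = freeSlots.poll();
        ByteBuffer view = slab.duplicate();
        view.position(slot * slotSize);
        view.limit(slot * slotSize + payload.length);
        ByteBuffer data = view.slice();
        data.put(payload);
        data.flip();
        return new Block(data, slot);
    }

    /** A reader takes a reference before using the block. */
    synchronized void retain(Block b) {
        b.refCount++;
    }

    /** Drops one reference; the slot is reusable as soon as the last holder lets go, with no GC involved. */
    synchronized void release(Block b) {
        if (--b.refCount == 0 && b.slot >= 0) {
            freeSlots.add(b.slot);
        }
    }

    synchronized int freeSlotCount() {
        return freeSlots.size();
    }
}
```

In this sketch eviction is just the cache calling release on its own reference; a reader that still holds the block keeps the slot alive until it releases too. A real implementation would also need to guard against double release and decide what to do when the slab is full.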
[jira] [Commented] (HBASE-5348) Constraint configuration loaded with bloat
[ https://issues.apache.org/jira/browse/HBASE-5348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202946#comment-13202946 ] stack commented on HBASE-5348: -- @Jesse Ok. Make a patch w/ the little paragraph as a comment on why you pass false and I'll +1 it (passing a boolean into a Configuration constructor is not done elsewhere in the hbase codebase that I know of, so it needs a little explanation). Thanks. Constraint configuration loaded with bloat -- Key: HBASE-5348 URL: https://issues.apache.org/jira/browse/HBASE-5348 Project: HBase Issue Type: Bug Components: coprocessors, regionserver Affects Versions: 0.94.0 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Attachments: java_HBASE-5348.patch, java_HBASE-5348.patch Constraints load the configuration, but instead of loading the 'correct' configuration they instantiate the default configuration (via new Configuration). It should just be Configuration(false).
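To make the 'bloat' concrete, here is a small stand-in that mimics the loadDefaults switch of a Hadoop-style Configuration constructor. ConfSketch is a hypothetical class written only so the example runs without Hadoop on the classpath; it is not the real org.apache.hadoop.conf.Configuration, and the two default keys and the constraint.limit key are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a Hadoop-style Configuration; NOT the real class. */
final class ConfSketch {
    // Pretend contents of the default resource files (illustrative keys and values only).
    private static final Map<String, String> DEFAULT_RESOURCES = new HashMap<>();
    static {
        DEFAULT_RESOURCES.put("io.file.buffer.size", "4096");
        DEFAULT_RESOURCES.put("fs.default.name", "file:///");
    }

    private final Map<String, String> props = new HashMap<>();

    /** With loadDefaults=false the instance starts empty, so it only carries keys set explicitly. */
    ConfSketch(boolean loadDefaults) {
        if (loadDefaults) {
            props.putAll(DEFAULT_RESOURCES);
        }
    }

    void set(String key, String value) {
        props.put(key, value);
    }

    String get(String key) {
        return props.get(key);
    }

    int size() {
        return props.size();
    }
}
```

A constraint that only needs its own handful of keys would, under this model, be built with new ConfSketch(false) and then populated by hand, rather than dragging every default along with it.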
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202951#comment-13202951 ] stack commented on HBASE-5341: -- Will do. Let me peg this issue against 0.92.1 so we don't forget it. HBase build artifact should include security code by default Key: HBASE-5341 URL: https://issues.apache.org/jira/browse/HBASE-5341 Project: HBase Issue Type: Improvement Components: build, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.92.1 HBase 0.92.0 was released with two artifacts, plain and security. The security code is built with -Psecurity. There are two tarballs, but only the plain jar in maven repo at repository.a.o. I see no reason to do a separate artifact for the security related code, since 0.92 already depends on secure Hadoop 1.0.0, and none of the security related code is loaded by default. In this issue, I propose we merge the code under /security to src/ and remove the maven profile.
[jira] [Commented] (HBASE-5350) Fix jamon generated package names
[ https://issues.apache.org/jira/browse/HBASE-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202966#comment-13202966 ] stack commented on HBASE-5350: -- If you run hbase, does the webui come up fine? Otherwise patch lgtm. Fix jamon generated package names - Key: HBASE-5350 URL: https://issues.apache.org/jira/browse/HBASE-5350 Project: HBase Issue Type: Bug Components: monitoring Affects Versions: 0.92.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.94.0 Attachments: jamon_HBASE-5350.patch Previously, jamon was creating the template files in org.apache.hbase, but it should be org.apache.hadoop.hbase, so it's in line with the rest of the source files.
[jira] [Commented] (HBASE-5348) Constraint configuration loaded with bloat
[ https://issues.apache.org/jira/browse/HBASE-5348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202971#comment-13202971 ] stack commented on HBASE-5348: -- I see that now. Agree it would be overkill. Let me commit. Constraint configuration loaded with bloat -- Key: HBASE-5348 URL: https://issues.apache.org/jira/browse/HBASE-5348 Project: HBase Issue Type: Bug Components: coprocessors, regionserver Affects Versions: 0.94.0 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Attachments: java_HBASE-5348.patch, java_HBASE-5348.patch Constraints load the configuration, but instead of loading the 'correct' configuration they instantiate the default configuration (via new Configuration). It should just be Configuration(false).
[jira] [Commented] (HBASE-5346) Fix testColumnFamilyCompression and test_TIMERANGE in TestHFileOutputFormat
[ https://issues.apache.org/jira/browse/HBASE-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202986#comment-13202986 ] stack commented on HBASE-5346: -- Patch looks good to me. Let me submit the patch to see if it breaks when we run on 1.0.0 hadoop. Fix testColumnFamilyCompression and test_TIMERANGE in TestHFileOutputFormat Key: HBASE-5346 URL: https://issues.apache.org/jira/browse/HBASE-5346 Project: HBase Issue Type: Sub-task Components: mapreduce, test Affects Versions: 0.90.4, 0.92.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5346-v0.patch Running mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat yields this on 0.92 (for testColumnFamilyCompression and test_TIMERANGE): Failed tests: testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found Tests in error: test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory) The problem is that these tests make incorrect assumptions about the output of mapreduce jobs. Prior to 0.23, temporary data was in, for example: ./_temporary/_attempt___r_00_0/b/1979617994050536795 Now that has changed. The correct way to get that path is based on getDefaultWorkFile. Also, the data is not moved into the outputPath until both the Task and Job are committed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5335) Dynamic Schema Configurations
[ https://issues.apache.org/jira/browse/HBASE-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203114#comment-13203114 ] stack commented on HBASE-5335: -- bq. Combined with online schema change, this will allow us to safely iterate on configuration settings. So, configuration change would come in via online schema change rather than, say, via zookeeper callback? The former would be heavyweight by comparison (closing and reopening regions, which seems overkill for, say, a change in flush size). Scoping to table and store also means this scheme won't work for configurations that are neither table- nor store-scoped. Dynamic Schema Configurations - Key: HBASE-5335 URL: https://issues.apache.org/jira/browse/HBASE-5335 Project: HBase Issue Type: New Feature Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Labels: configuration, schema Currently, the ability for a core developer to add per-table and per-CF configuration settings is very heavyweight. You need to add a reserved keyword all the way up the stack and you have to support this variable long-term if you're going to expose it explicitly to the user. This has ended up with using Configuration.get() a lot because it is lightweight and you can tweak settings while you're trying to understand system behavior [since there are many config params that may never need to be tuned]. We need to add the ability to put and read arbitrary KV settings in the HBase schema. Combined with online schema change, this will allow us to safely iterate on configuration settings.
[jira] [Commented] (HBASE-5221) bin/hbase script doesn't look for Hadoop jars in the right place in trunk layout
[ https://issues.apache.org/jira/browse/HBASE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203273#comment-13203273 ] stack commented on HBASE-5221: -- So the idea is to put hadoop jars under /share? Where does this notion come from? (Pardon my ignorance) bin/hbase script doesn't look for Hadoop jars in the right place in trunk layout Key: HBASE-5221 URL: https://issues.apache.org/jira/browse/HBASE-5221 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Jimmy Xiang Attachments: hbase-5221.txt Running against an 0.24.0-SNAPSHOT hadoop: ls: cannot access /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-common*.jar: No such file or directory ls: cannot access /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-hdfs*.jar: No such file or directory ls: cannot access /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-mapred*.jar: No such file or directory The jars are rooted deeper in the hierarchy.
[jira] [Commented] (HBASE-5335) Dynamic Schema Configurations
[ https://issues.apache.org/jira/browse/HBASE-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203269#comment-13203269 ] stack commented on HBASE-5335: -- bq. ...but you'd need to do a large refactor to handle truly online configuration transitions. What if we just did the 'important' ones? On the locking primitives, if the config is a long or something, we could just make it volatile? Would be more costly than a final long but could make a local copy per method invocation I suppose. bq. Note that, with your region server draining, you can change global config values in an online fashion on a per-server basis by issuing a 'regionserver restart --draining' or something like that. We can do that now, right? Make the change then do our rolling restart with graceful shedding of regions from the server, and then graceful replacement. The addition of the HBaseTableConfiguration and HBaseStoreConfiguration would make it all a little nicer. I'm interested in how the HBaseStoreConfiguration configs make it into the schema and then get undone on the other side (You could ask me how a config. change in a zk callback makes it up into a regionserver Configuration instance... not sure) bq. ...have you done a lot of testing with killing ZKQuorumPeers on clusters with a lot of regions Haven't done any. Sounds like we should? Does the ensemble have to have a bunch of znodes afloat for you to see the spikes or is it just catching up the restarted peer's state regardless of the particular znode loading at the time? Dynamic Schema Configurations - Key: HBASE-5335 URL: https://issues.apache.org/jira/browse/HBASE-5335 Project: HBase Issue Type: New Feature Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Labels: configuration, schema Currently, the ability for a core developer to add per-table and per-CF configuration settings is very heavyweight. 
You need to add a reserved keyword all the way up the stack and you have to support this variable long-term if you're going to expose it explicitly to the user. This has ended up with using Configuration.get() a lot because it is lightweight and you can tweak settings while you're trying to understand system behavior [since there are many config params that may never need to be tuned]. We need to add the ability to put and read arbitrary KV settings in the HBase schema. Combined with online schema change, this will allow us to safely iterate on configuration settings.
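The 'volatile with a local copy per method invocation' suggestion in the comment above could look roughly like the sketch below. FlushPolicySketch, onConfigChange and shouldFlush are hypothetical names chosen for illustration, and the flush-size check is a made-up rule; the point is only the single volatile read per call.

```java
/** Hypothetical sketch: a setting that can be changed while the server keeps running. */
final class FlushPolicySketch {
    // volatile so that a write from a config-change callback becomes visible to request threads.
    private volatile long flushSizeBytes;

    FlushPolicySketch(long initialFlushSizeBytes) {
        this.flushSizeBytes = initialFlushSizeBytes;
    }

    /** Would be invoked by whatever delivers the change (schema update, zk callback, ...). */
    void onConfigChange(long newFlushSizeBytes) {
        this.flushSizeBytes = newFlushSizeBytes;
    }

    /** Reads the volatile once so the whole invocation works against one consistent value. */
    boolean shouldFlush(long memstoreBytes, long pendingBytes) {
        final long limit = flushSizeBytes; // local copy: one volatile read per call
        return memstoreBytes >= limit || memstoreBytes + pendingBytes >= limit;
    }
}
```

Without the local copy, the two comparisons in shouldFlush could observe different limits if a change landed between them; with it, the cost over a final long is one volatile read per call.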
[jira] [Commented] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well
[ https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203277#comment-13203277 ] stack commented on HBASE-5292: -- This patch is for trunk? You going to commit, Mikhail? Seems like only TestReplication failed, which is probably unrelated? getsize per-CF metric incorrectly counts compaction related reads as well -- Key: HBASE-5292 URL: https://issues.apache.org/jira/browse/HBASE-5292 Project: HBase Issue Type: Bug Affects Versions: 0.89.20100924 Reporter: Kannan Muthukkaruppan Attachments: 0001-jira-HBASE-5292-Prevent-counting-getSize-on-compacti.patch, D1527.1.patch, D1527.2.patch, D1527.3.patch, D1527.4.patch, D1617.1.patch The per-CF getsize metric's intent was to track bytes returned (to HBase clients) per-CF. [Note: We already have metrics to track # of HFileBlocks read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt vs. fsblockreadcnt.] Currently, the getsize metric gets updated both for client initiated Get/Scan operations and for compaction related reads. The metric is updated in StoreScanner.java:next() when the Scan query matcher returns an INCLUDE* code via a: HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength()); We should not do the above in case of compactions.
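The fix described in HBASE-5292 amounts to guarding the metric update with a 'this is a compaction' flag. The sketch below shows that shape in isolation; GetSizeMetricSketch and onInclude are hypothetical names, and it does not reproduce the real StoreScanner or HRegion.incrNumericMetric code.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: count bytes returned to clients, but not bytes read by compactions. */
final class GetSizeMetricSketch {
    private final AtomicLong getsizeBytes = new AtomicLong();

    /** Called for each KeyValue the scanner decides to include in its result. */
    void onInclude(int kvLength, boolean isCompaction) {
        if (!isCompaction) {
            // Only client initiated Get/Scan reads contribute to the per-CF getsize metric.
            getsizeBytes.addAndGet(kvLength);
        }
    }

    long getsize() {
        return getsizeBytes.get();
    }
}
```

In a real scanner the flag would presumably be fixed when the scanner is created rather than passed on every call; it is a parameter here only to keep the sketch short.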
[jira] [Commented] (HBASE-5336) Spurious exceptions in HConnectionImplementation
[ https://issues.apache.org/jira/browse/HBASE-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203314#comment-13203314 ] stack commented on HBASE-5336: -- Are we calling a sync before a new WAL is initialized? Spurious exceptions in HConnectionImplementation Key: HBASE-5336 URL: https://issues.apache.org/jira/browse/HBASE-5336 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl I have seen this on the client a few times during heavy write testing: java.util.concurrent.ExecutionException: java.io.IOException: java.io.IOException: java.lang.NullPointerException at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1524) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1376) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:891) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:743) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:730) at NewsFeedCreate.insert(NewsFeedCreate.java:91) at NewsFeedCreate$1.run(NewsFeedCreate.java:38) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.IOException: java.io.IOException: java.lang.NullPointerException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79) at 
org.apache.hadoop.hbase.client.ServerCallable.translateException(ServerCallable.java:228) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:212) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1360) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1348) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) ... 1 more Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:243) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1289) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1386) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2161) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1954) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3363) at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy1.multi(Unknown Source) at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1353) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1351) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210) ... 7 more
[jira] [Commented] (HBASE-5221) bin/hbase script doesn't look for Hadoop jars in the right place in trunk layout
[ https://issues.apache.org/jira/browse/HBASE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203687#comment-13203687 ] stack commented on HBASE-5221: -- No worries Jimmy. That'll do. Thanks. Let me commit. bin/hbase script doesn't look for Hadoop jars in the right place in trunk layout Key: HBASE-5221 URL: https://issues.apache.org/jira/browse/HBASE-5221 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Jimmy Xiang Attachments: hbase-5221.txt Running against an 0.24.0-SNAPSHOT hadoop: ls: cannot access /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-common*.jar: No such file or directory ls: cannot access /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-hdfs*.jar: No such file or directory ls: cannot access /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-mapred*.jar: No such file or directory The jars are rooted deeper in the hierarchy.
[jira] [Commented] (HBASE-3537) [site] Make it so each page of manual allows users comment like mysql's manual does
[ https://issues.apache.org/jira/browse/HBASE-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203785#comment-13203785 ] stack commented on HBASE-3537: -- I don't seem to have filled in here that I couldn't figure out how to make facebook comments include the page the comment was made on. Here is another option, http://disqus.com/, that we might embed. [site] Make it so each page of manual allows users comment like mysql's manual does --- Key: HBASE-3537 URL: https://issues.apache.org/jira/browse/HBASE-3537 Project: HBase Issue Type: Improvement Reporter: stack I like the way the mysql manuals allow users to comment on, improve or correct mysql manual pages. We should have the same.
[jira] [Commented] (HBASE-5221) bin/hbase script doesn't look for Hadoop jars in the right place in trunk layout
[ https://issues.apache.org/jira/browse/HBASE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203801#comment-13203801 ] stack commented on HBASE-5221: -- @Roman Thanks for catching this. bin/hbase script doesn't look for Hadoop jars in the right place in trunk layout Key: HBASE-5221 URL: https://issues.apache.org/jira/browse/HBASE-5221 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Jimmy Xiang Fix For: 0.94.0 Attachments: hbase-5221.txt Running against an 0.24.0-SNAPSHOT hadoop: ls: cannot access /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-common*.jar: No such file or directory ls: cannot access /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-hdfs*.jar: No such file or directory ls: cannot access /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-mapred*.jar: No such file or directory The jars are rooted deeper in the hierarchy.
[jira] [Commented] (HBASE-4762) ROOT and META region never be assigned if IOE throws in verifyRootRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203811#comment-13203811 ] stack commented on HBASE-4762: -- This should be less likely in TRUNK, right RAM?, since we retry in TRUNK but not in 0.90.x ROOT and META region never be assigned if IOE throws in verifyRootRegionLocation Key: HBASE-4762 URL: https://issues.apache.org/jira/browse/HBASE-4762 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: mingjian Assignee: mingjian Fix For: 0.90.7 Patch in HBASE-3914 fixed root assigned in two regionservers. But it seemed like root region will never be assigned if verifyRootRegionLocation throws IOE. Like following master logs: {noformat} 2011-10-19 19:13:34,873 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_META_SERVER_S HUTDOWN org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1090) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771) at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:256) at $Proxy7.getRegionInfo(Unknown Source) at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:424) at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:471) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:90) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:126) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 
{noformat} After this, -ROOT-'s region won't be assigned, like this: {noformat} 2011-10-19 19:18:40,000 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: locateRegionInMeta parent Table=-ROOT-, metaLocation=address: dw79.kgb.sqa.cm4:60020, regioninfo: -ROOT-,,0.70236052, attempt=0 of 10 failed; retrying after s leep of 1000 because: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: -ROOT-,,0 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2771) at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1802) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:569) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1091) {noformat} So we should rewrite the verifyRootRegionLocation method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5353) HA/Distributed HMaster via RegionServers
[ https://issues.apache.org/jira/browse/HBASE-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203872#comment-13203872 ] stack commented on HBASE-5353: -- I'd say just run the master in-process w/ the regionserver. Master doesn't do much (it used to be heavily loaded when we did log splitting, but that's distributed now or on startup... but even then, should be fine). Client already tracks master location, as you say, though we need to undo this... and just have the client do a read of zk to find master location when it needs it. Regarding the UI, we'd collapse it so that there'd be a single webapp rather than the two we have now. There'd be a 'master' link. If the current regionserver were not the master, the master link would redirect you to the current master. HA/Distributed HMaster via RegionServers Key: HBASE-5353 URL: https://issues.apache.org/jira/browse/HBASE-5353 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.94.0 Reporter: Jesse Yates Priority: Minor Currently, the HMaster node must be considered a 'special' node (single point of failure), meaning that the node must be protected more than the other commodity machines. It should be possible to instead have the HMaster be much more available, either in a distributed sense (meaning a bit rewrite) or with multiple instances and automatic failover.
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203921#comment-13203921 ] stack commented on HBASE-5270: -- I was taking a look through HBASE-5179 and HBASE-4748 again, the two issues that spawned this one (both are, in synopsis, about master failover with a concurrent servershutdown handler running). I have also been looking at HBASE-5344 [89-fb] Scan unassigned region directory on master failover. HBASE-5179 starts out as: we can miss edits if a server is discovered to be dead AFTER master failover has started splitting logs, because we'll notice it dead and so assign out its regions before we've had a chance to split its logs. The way fb deals with this in HBASE-5344 is not to process zookeeper events that come in during master failover. They queue them instead and only start in on the processing after master is up. Chunhui does something like this in his original patch by adding any server currently being processed by server shutdown to the list of regionservers whose logs we should not split. The fb way of temporarily halting the callback processing seems more airtight. HBASE-5179 is then extended to include in scope the processing of servers carrying root and meta (HBASE-4748) that crash during master failover. We need to consider the cases where a server crashes AFTER master failover distributed log splitting has started but before we run the verifications of meta and root locations. Currently we'll expire the server that is unresponsive when we go to verify root and meta locations. The notion is that the meta regions will be assigned by the server shutdown handler. 
The fb technique of turning off processing zk events would mess with our existing handling code here -- but I'm not too confident the code is going to do the right thing since it has no tests of this predicament and the scenarios look like they could be pretty varied (root is offline only, meta server has crashed only, a server with both root and meta has crashed, etc). In HBASE-5344, fb will go query each regionserver for the regions it's currently hosting (and look in zk to see what rs are up). Maybe we need some of this from 89-fb in trunk but I'm not clear on it just yet; would need more study of the current state of trunk and then of what is happening over in 89-fb. One thing I think we should do to lessen the number of code paths we can take on failover is to do the long-talked-of purge of the root region. This should cut down on the number of states we need to deal with and make failure states on failover easier to reason about. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Fix For: 0.94.0, 0.92.1 This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if its carrying root and meta? What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... 
either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. Its not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different to the patch for 0.90. Should go
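The idea described in the comment above, of queueing zookeeper events that arrive during master failover and only processing them once the master is up, can be sketched roughly as follows. This is a minimal illustration only, not the 89-fb code: the class `DeferredEventGate` and its method names are hypothetical, and a real master would hand events to its executor service rather than run them inline under a lock.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: holds back events while failover is in progress,
 * then replays them in arrival order once the gate is opened.
 */
final class DeferredEventGate<E> {
    private final Queue<E> pending = new ArrayDeque<>();
    private final Consumer<E> handler;
    private boolean open = false;

    DeferredEventGate(Consumer<E> handler) {
        this.handler = handler;
    }

    /** Called from the (assumed) zk watcher callback for every incoming event. */
    synchronized void onEvent(E event) {
        if (open) {
            handler.accept(event);   // master is up: process immediately
        } else {
            pending.add(event);      // failover still running: queue for later
        }
    }

    /** Called once failover has finished; drains the backlog in FIFO order. */
    synchronized void open() {
        E event;
        while ((event = pending.poll()) != null) {
            handler.accept(event);
        }
        open = true;
    }

    public static void main(String[] args) {
        List<String> processed = new ArrayList<>();
        DeferredEventGate<String> gate = new DeferredEventGate<>(processed::add);
        gate.onEvent("rs1 expired");   // queued, not processed yet
        gate.onEvent("rs2 expired");   // queued, not processed yet
        gate.open();                   // failover done: both are replayed
        gate.onEvent("rs3 expired");   // processed right away
        System.out.println(processed);
    }
}
```

Holding the lock while draining keeps late arrivals from overtaking queued events, at the cost of blocking the callback thread during the replay.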
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203923#comment-13203923 ] stack commented on HBASE-5270: -- HBASE-3171 is the issue to purge root. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Fix For: 0.94.0, 0.92.1 This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if its carrying root and meta? What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. Its not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. Should we only go this route if distributed splitting is not going on. 
Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch? This patch needs to have the v18 differences applied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5353) HA/Distributed HMaster via RegionServers
[ https://issues.apache.org/jira/browse/HBASE-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204113#comment-13204113 ] stack commented on HBASE-5353: -- bq. Except it opens a new can of worms: where do you find the master UI? how do you monitor your master if it moves around? how do you easily find the master logs when it could be anywhere in the cluster? It's not a new can of worms, right? We have the above (mostly unsolved) problems now if you run with more than one master. bq. And any cron jobs or nagios alerts you write need to first call some HBase utility to find the active master's IP via ZK in order to get to it? They should be doing this now, if multiple masters? If the master function were lightweight enough, it'd be kinda sweet having one daemon type only I'd think; there'd no longer be a need for special treatment of master. Might be tricky having them running in the same JVM what w/ all the executors afloat and RPCs (I'd rather do all in the one JVM than have RS start/stop separate Master processes if we were going to go this route). HA/Distributed HMaster via RegionServers Key: HBASE-5353 URL: https://issues.apache.org/jira/browse/HBASE-5353 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.94.0 Reporter: Jesse Yates Priority: Minor Currently, the HMaster node must be considered a 'special' node (single point of failure), meaning that the node must be protected more than the other commodity machines. It should be possible to instead have the HMaster be much more available, either in a distributed sense (meaning a bit rewrite) or with multiple instances and automatic failover.
[jira] [Commented] (HBASE-5353) HA/Distributed HMaster via RegionServers
[ https://issues.apache.org/jira/browse/HBASE-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204268#comment-13204268 ] stack commented on HBASE-5353: -- bq. ...you only need to look two places if you have an issue. If you have no idea where the master is, you have to hunt around the cluster to find it. I'd imagine it'd be hard getting this patch in if no idea where the master is (And, again, don't we have this problem now if you start up three masters and one fails? You have to hunt around. We need to build the redirect piece regardless, such as a link to master on each server page which redirects to current master, and a history in zk of who was master when). You could even make the combined master+regionserver daemon work like our current multimaster system by having there be affinity for a certain set of servers. What kind of nagios alerts would be master-particular? We need to add indirection to these now anyways -- ask zk who the master is -- if more than one master is running. Metrics could be a little complicated, especially if master moved servers over the period of interest, but generally aren't master metrics of less interest since they are generally just aggregates, and ganglia or opentsdb do a better job of this anyways? Logs don't have to be interleaved. That's just a bit of log4j config? Yes, could be an issue if the daemon is bogged down. The master would be less responsive, which should be fine for short periods, but if sustained it could be an issue. I'm not going to work on this. I do see it as something that could simplify our deploy story. 
HA/Distributed HMaster via RegionServers Key: HBASE-5353 URL: https://issues.apache.org/jira/browse/HBASE-5353 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.94.0 Reporter: Jesse Yates Priority: Minor Currently, the HMaster node(s) must be considered a 'special' node (though not a single point of failover), meaning that the node must be protected more than the other cluster machines or at least specially monitored. Minimally, we always need to ensure that the master is running, rather than letting the system handle that internally. It should be possible to instead have the HMaster be much more available, either in a distributed sense (meaning a bit rewrite) or multiple, dynamically created instances combined with the hot fail-over of masters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5355) Compressed RPC's for HBase
[ https://issues.apache.org/jira/browse/HBASE-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204280#comment-13204280 ] stack commented on HBASE-5355: -- Could we do the same prefix compression on the wire? Todd, in your proposal above, we'd convert KVs to your new form before putting payload on wire and then undo it on the other side? Do we care about latency here? A long time back Ryan tried a custom compression before putting stuff on the wire and then undoing it on the other end, hoping it would help some w/ latency, but he found it upped latency. I should dig up a pointer (could be how he went about it). Compressed RPC's for HBase -- Key: HBASE-5355 URL: https://issues.apache.org/jira/browse/HBASE-5355 Project: HBase Issue Type: Improvement Components: ipc Affects Versions: 0.89.20100924 Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Some applications need the ability to do large batched writes and reads from a remote MR cluster. These eventually get bottlenecked on the network. These results are also pretty compressible sometimes. The aim here is to add the ability to do compressed calls to the server on both the send and receive paths.
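For readers unfamiliar with the term, prefix compression of sorted keys can be sketched as below. This is a generic illustration, not Todd's proposed format (which is not shown in this thread) and not an HBase API: the `PrefixCodec` class is hypothetical, and each key is simply stored as the number of leading bytes shared with the previous key plus the remaining suffix.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of prefix (front) coding for a sorted run of keys. */
final class PrefixCodec {
    /** One encoded key: bytes shared with the previous key, then the rest. */
    static final class Entry {
        final int shared;
        final byte[] suffix;

        Entry(int shared, byte[] suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    /** Sender side: replace each key with (shared prefix length, suffix). */
    static List<Entry> encode(List<byte[]> keys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : keys) {
            int max = Math.min(prev.length, key.length);
            int shared = 0;
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    /** Receiver side: rebuild each key from the previous one plus the suffix. */
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.shared + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.shared);
            System.arraycopy(e.suffix, 0, key, e.shared, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (String s : new String[] {"row1/cf:a", "row1/cf:b", "row2/cf:a"}) {
            keys.add(s.getBytes(StandardCharsets.UTF_8));
        }
        for (Entry e : encode(keys)) {
            System.out.println(e.shared + " + " + new String(e.suffix, StandardCharsets.UTF_8));
        }
    }
}
```

The encode and decode passes are both linear in the payload size, which is the extra work per call that the latency question above is about.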
[jira] [Commented] (HBASE-5327) Print a message when an invalid hbase.rootdir is passed
[ https://issues.apache.org/jira/browse/HBASE-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204287#comment-13204287 ] stack commented on HBASE-5327: -- The difference is that we get a nicer message on master crash out? Its about the bad rootdir passed rather than some complaint about a URI parse? If so, good enough for me. +1. Print a message when an invalid hbase.rootdir is passed --- Key: HBASE-5327 URL: https://issues.apache.org/jira/browse/HBASE-5327 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: Jean-Daniel Cryans Assignee: Jimmy Xiang Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: hbase-5327.txt As seen on the mailing list: http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/24124 If hbase.rootdir doesn't specify a folder on hdfs we crash while opening a path to .oldlogs: {noformat} 2012-02-02 23:07:26,292 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://sv4r11s38:9100.oldlogs at org.apache.hadoop.fs.Path.initialize(Path.java:148) at org.apache.hadoop.fs.Path.init(Path.java:71) at org.apache.hadoop.fs.Path.init(Path.java:50) at org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:112) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:448) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326) at java.lang.Thread.run(Thread.java:662) Caused by: java.net.URISyntaxException: Relative path in absolute URI: hdfs://sv4r11s38:9100.oldlogs at java.net.URI.checkPath(URI.java:1787) at java.net.URI.init(URI.java:735) at org.apache.hadoop.fs.Path.initialize(Path.java:145) ... 6 more {noformat} It could also crash anywhere else, this just happens to be the first place we use hbase.rootdir. We need to verify that it's an actual folder. -- This message is automatically generated by JIRA. 
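The kind of up-front check asked for in this issue could look something like the sketch below. It is an assumption-laden illustration, not the attached patch: the `RootDirCheck` class and its messages are hypothetical, and it only checks the URI syntax with `java.net.URI`. Verifying that the directory actually exists would need the Hadoop FileSystem API, which is left out here.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical sketch: reject an hbase.rootdir that names no directory. */
final class RootDirCheck {
    /** Returns null when rootDir looks usable, else a readable complaint. */
    static String validate(String rootDir) {
        if (rootDir == null || rootDir.trim().isEmpty()) {
            return "hbase.rootdir is not set";
        }
        URI uri;
        try {
            uri = new URI(rootDir.trim());
        } catch (URISyntaxException e) {
            return "hbase.rootdir is not a valid URI: " + e.getMessage();
        }
        String path = uri.getPath();
        // "hdfs://host:9100" parses fine but has an empty path, which is the
        // case that later blows up as "Relative path in absolute URI".
        if (uri.getScheme() != null && (path == null || path.isEmpty())) {
            return "hbase.rootdir '" + rootDir
                + "' names a filesystem but no directory, e.g. expected hdfs://host:port/hbase";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(validate("hdfs://sv4r11s38:9100"));        // complaint
        System.out.println(validate("hdfs://sv4r11s38:9100/hbase"));  // null
    }
}
```

Running a check like this before the first use of hbase.rootdir would let the master print a message about the bad setting instead of the URI parse failure.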
[jira] [Commented] (HBASE-3171) Drop ROOT and instead store META location(s) directly in ZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204307#comment-13204307 ] stack commented on HBASE-3171: -- I took a look at what is involved here. I first started out just hacking -ROOT- out of hbase completely. It was fun while it lasted (the root location in zk would instead become the meta location w/ a watcher, etc.). On my bike ride home I figured it'd be better if I could make it so old clients looking for a -ROOT- no longer there would still sort of work. I dug around but was unable to figure a way of making this work, not w/o leaving a -ROOT- placeholder in place and then adding interception code that would take a different route if we were scanning the now non-existent -ROOT- to find the meta location now up in zk rather than in a root table; in other words, adding more, cryptic code. It looked to get ugly fast too. So, I see no way of doing this w/o breaking backward compatibility. Old code would go looking for a non-existent -ROOT- region. Do we want to do this? Drop ROOT and instead store META location(s) directly in ZooKeeper -- Key: HBASE-3171 URL: https://issues.apache.org/jira/browse/HBASE-3171 Project: HBase Issue Type: Improvement Components: client, master, regionserver, zookeeper Reporter: Jonathan Gray Rather than storing the ROOT region location in ZooKeeper, going to ROOT, and reading the META location, we should just store the META location directly in ZooKeeper. The purpose of the root region from the bigtable paper was to support multiple meta regions. Currently, we explicitly only support a single meta region, so the translation from our current code of a single root location to a single meta location will be very simple. Long-term, it seems reasonable that we could store several meta region locations in ZK. 
There's been some discussion in HBASE-1755 about actually moving META into ZK, but I think this jira is a good step towards taking some of the complexity out of how we have to deal with catalog tables everywhere. As-is, a new client already requires ZK to get the root location, so this would not change those requirements in any way. The primary motivation for this is to simplify things like CatalogTracker. The way we can handle root in that class is really simple but the tracking of meta is difficult and a bit hacky. This hack on tracking of the meta location is what caused one of the bugs over in HBASE-3159.
[jira] [Commented] (HBASE-5363) Add rat check to run automatically on mvn build.
[ https://issues.apache.org/jira/browse/HBASE-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204671#comment-13204671 ] stack commented on HBASE-5363: -- We already have this. See our pom:
{code}
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <version>0.7</version>
  <configuration>
    <excludes>
      <exclude>**/.*</exclude>
      <exclude>**/target/**</exclude>
      <exclude>**/CHANGES.txt</exclude>
      <exclude>**/CHANGES.txt</exclude>
      <exclude>**/generated/**</exclude>
      <exclude>**/conf/*</exclude>
      <exclude>**/*.avpr</exclude>
      <exclude>**/control</exclude>
      <exclude>**/conffile</exclude>
      <exclude>**/8e8ab58dcf39412da19833fcd8f687ac</exclude>
      <!--It don't like freebsd license-->
      <exclude>src/site/resources/css/freebsd_docbook.css</exclude>
    </excludes>
  </configuration>
</plugin>
{code}
And see the generated report: http://hbase.apache.org/rat-report.html Here is the issue where I messed w/ this stuff on trunk: HBASE-4647 Add rat check to run automatically on mvn build. Key: HBASE-5363 URL: https://issues.apache.org/jira/browse/HBASE-5363 Project: HBase Issue Type: Improvement Components: build Affects Versions: 0.90.5, 0.92.0 Reporter: Jonathan Hsieh Some of the recent hbase releases failed rat checks (mvn rat:check). We should add checks, likely in the mvn package phase, so that this becomes a non-issue in the future. Here's an example from Whirr: https://github.com/apache/whirr/blob/trunk/pom.xml line 388 for an example.
[jira] [Commented] (HBASE-5327) Print a message when an invalid hbase.rootdir is passed
[ https://issues.apache.org/jira/browse/HBASE-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204722#comment-13204722 ] stack commented on HBASE-5327: -- +1 Print a message when an invalid hbase.rootdir is passed --- Key: HBASE-5327 URL: https://issues.apache.org/jira/browse/HBASE-5327 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: Jean-Daniel Cryans Assignee: Jimmy Xiang Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: hbase-5327.txt, hbase-5327_v2.txt As seen on the mailing list: http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/24124 If hbase.rootdir doesn't specify a folder on hdfs we crash while opening a path to .oldlogs: {noformat} 2012-02-02 23:07:26,292 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://sv4r11s38:9100.oldlogs at org.apache.hadoop.fs.Path.initialize(Path.java:148) at org.apache.hadoop.fs.Path.init(Path.java:71) at org.apache.hadoop.fs.Path.init(Path.java:50) at org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:112) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:448) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326) at java.lang.Thread.run(Thread.java:662) Caused by: java.net.URISyntaxException: Relative path in absolute URI: hdfs://sv4r11s38:9100.oldlogs at java.net.URI.checkPath(URI.java:1787) at java.net.URI.init(URI.java:735) at org.apache.hadoop.fs.Path.initialize(Path.java:145) ... 6 more {noformat} It could also crash anywhere else, this just happens to be the first place we use hbase.rootdir. We need to verify that it's an actual folder. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-5363) Add rat check to run automatically on mvn build.
[ https://issues.apache.org/jira/browse/HBASE-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204723#comment-13204723 ] stack commented on HBASE-5363: -- The rat plugin is pretty crappy in my experience too. It's hard to get it to behave. Add rat check to run automatically on mvn build. Key: HBASE-5363 URL: https://issues.apache.org/jira/browse/HBASE-5363 Project: HBase Issue Type: Improvement Components: build Affects Versions: 0.90.5, 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Some of the recent hbase releases failed rat checks (mvn rat:check). We should add checks, likely in the mvn package phase, so that this becomes a non-issue in the future. Here's an example from Whirr: https://github.com/apache/whirr/blob/trunk/pom.xml line 388 for an example.
[jira] [Commented] (HBASE-5365) [book] adding description of compaction file selection to refGuide in Arch/Regions/Store
[ https://issues.apache.org/jira/browse/HBASE-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204744#comment-13204744 ] stack commented on HBASE-5365: -- This doc patch is excellent. I thought the default flush size in trunk was 128M, not (134 mb)? It's done twice. What is '(e.g., 10F)'? Is that float? [book] adding description of compaction file selection to refGuide in Arch/Regions/Store Key: HBASE-5365 URL: https://issues.apache.org/jira/browse/HBASE-5365 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Attachments: docbkx_hbase_5365.patch book.xml * adding description of compaction selection algorithm with examples (based on existing unit tests) * also added a few links to the compaction section from other places in the book that already mention compaction. configuration.xml * added link to compaction section from the entry that discusses configuring major compaction interval.
[jira] [Commented] (HBASE-2375) Make decision to split based on aggregate size of all StoreFiles and revisit related config params
[ https://issues.apache.org/jira/browse/HBASE-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204779#comment-13204779 ] stack commented on HBASE-2375: -- Doing the first bullet point only sounds good. Let's file issues for the other split suggestions. What about the other recommendations made up in the issue regarding compactionThreshold? Upping compactionThreshold from 3 to 5, where 5 is more than the number of flushes it would take to make us splittable; i.e. the intent is no compaction before first split. Should we do this too as part of this issue? We could make our flush size 256M and compactionThreshold 5. Or perhaps that's too rad (that's a big Map to be carrying around)? Instead up the compactionThreshold and down the default regionsize from 1G to 512M and keep flush at 128M? I took a look at the patch and it's pretty stale now given changes that have gone in since. Make decision to split based on aggregate size of all StoreFiles and revisit related config params -- Key: HBASE-2375 URL: https://issues.apache.org/jira/browse/HBASE-2375 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.20.3 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Labels: moved_from_0_20_5 Attachments: HBASE-2375-v8.patch Currently we will make the decision to split a region when a single StoreFile in a single family exceeds the maximum region size. This issue is about changing the decision to split to be based on the aggregate size of all StoreFiles in a single family (but still not aggregating across families). This would move a check to split after flushes rather than after compactions. This issue should also deal with revisiting our default values for some related configuration parameters. The motivating factor for this change comes from watching the behavior of RegionServers during heavy write scenarios. 
Today the default behavior goes like this: - We fill up regions, and as long as you are not under global RS heap pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) StoreFiles. - After we get 3 StoreFiles (hbase.hstore.compactionThreshold) we trigger a compaction on this region. - Compaction queues notwithstanding, this will create a 192MB file, not triggering a split based on max region size (hbase.hregion.max.filesize). - You'll then flush two more 64MB MemStores and hit the compactionThreshold and trigger a compaction. - You end up with 192 + 64 + 64 in a single compaction. This will create a single 320MB and will trigger a split. - While you are performing the compaction (which now writes out 64MB more than the split size, so is about 5X slower than the time it takes to do a single flush), you are still taking on additional writes into MemStore. - Compaction finishes, decision to split is made, region is closed. The region now has to flush whichever edits made it to MemStore while the compaction ran. This flushing, in our tests, is by far the dominating factor in how long data is unavailable during a split. We measured about 1 second to do the region closing, master assignment, reopening. Flushing could take 5-6 seconds, during which time the region is unavailable. - The daughter regions re-open on the same RS. Immediately when the StoreFiles are opened, a compaction is triggered across all of their StoreFiles because they contain references. Since we cannot currently split a split, we need to not hang on to these references for long. This described behavior is really bad because of how often we have to rewrite data onto HDFS. Imports are usually just IO bound as the RS waits to flush and compact. In the above example, the first cell to be inserted into this region ends up being written to HDFS 4 times (initial flush, first compaction w/ no split decision, second compaction w/ split decision, third compaction on daughter region). 
In addition, we leave a large window where we take on edits (during the second compaction of 320MB) and then must make the region unavailable as we flush it. If we increased the compactionThreshold to be 5 and determined splits based on aggregate size, the behavior becomes: - We fill up regions, and as long as you are not under global RS heap pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) StoreFiles. - After each MemStore flush, we calculate the aggregate size of all StoreFiles. We can also check the compactionThreshold. For the first three flushes, both would not hit the limit. On the fourth flush, we would see total aggregate size = 256MB and determine to make a split.
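The arithmetic in the two scenarios above can be replayed with a small simulation. This is a simplified sketch under stated assumptions: the `SplitSimulation` class is hypothetical, sizes are in MB, flushes are always exactly the flush size, compactions merge every StoreFile and finish instantly, and the split size is taken as 256MB, as the figures in the description imply. It ignores the writes taken on during compaction and the daughter-region compaction.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch replaying the flush/compact/split arithmetic. */
final class SplitSimulation {
    static final class Result {
        final int flushes;       // flushes before the split decision
        final int compactions;   // compactions before the split decision
        final long rewrittenMb;  // MB rewritten to HDFS by those compactions

        Result(int flushes, int compactions, long rewrittenMb) {
            this.flushes = flushes;
            this.compactions = compactions;
            this.rewrittenMb = rewrittenMb;
        }
    }

    private static long total(List<Long> files) {
        long sum = 0;
        for (long f : files) {
            sum += f;
        }
        return sum;
    }

    /**
     * @param splitOnAggregate false: split only when one compacted file reaches
     *        maxFileMb (behavior described as current); true: check the sum of
     *        all StoreFiles after every flush (proposed behavior)
     */
    static Result run(long flushMb, int compactionThreshold, long maxFileMb,
                      boolean splitOnAggregate) {
        if (flushMb <= 0 || compactionThreshold < 2) {
            throw new IllegalArgumentException("need flushMb > 0 and threshold >= 2");
        }
        List<Long> storeFiles = new ArrayList<>();
        int flushes = 0;
        int compactions = 0;
        long rewritten = 0;
        while (true) {
            storeFiles.add(flushMb);             // one MemStore flush
            flushes++;
            if (splitOnAggregate && total(storeFiles) >= maxFileMb) {
                return new Result(flushes, compactions, rewritten);
            }
            if (storeFiles.size() >= compactionThreshold) {
                long merged = total(storeFiles); // compact everything into one file
                storeFiles.clear();
                storeFiles.add(merged);
                compactions++;
                rewritten += merged;
                if (!splitOnAggregate && merged >= maxFileMb) {
                    return new Result(flushes, compactions, rewritten);
                }
            }
        }
    }

    public static void main(String[] args) {
        Result current = run(64, 3, 256, false);
        Result proposed = run(64, 5, 256, true);
        System.out.println("current:  flushes=" + current.flushes
            + " compactions=" + current.compactions + " rewrittenMb=" + current.rewrittenMb);
        System.out.println("proposed: flushes=" + proposed.flushes
            + " compactions=" + proposed.compactions + " rewrittenMb=" + proposed.rewrittenMb);
    }
}
```

Under these assumptions the current settings take five flushes and two compactions (192MB, then 320MB) to reach a split, while a threshold of 5 with an aggregate check splits after the fourth flush with no compaction at all.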
[jira] [Commented] (HBASE-5367) [book] small formatting changes to compaction description in Arch/Regions/Store
[ https://issues.apache.org/jira/browse/HBASE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204782#comment-13204782 ]

stack commented on HBASE-5367:
--

In trunk this is 128M:

{code}
<code>hbase.hregion.memstore.flush.size</code> (64 mb). </listitem>
{code}

{code}
<name>hbase.hregion.memstore.flush.size</name>
<value>134217728</value>
<description>
{code}

[book] small formatting changes to compaction description in Arch/Regions/Store
---
Key: HBASE-5367
URL: https://issues.apache.org/jira/browse/HBASE-5367
Project: HBase
Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
Attachments: book_hbase_5367.xml.patch

Fixing a few small-but-important things that came out of a post-commit comment in HBASE-5365

book.xml
* corrected default region flush size (it's actually 64mb)
* removed trailing 'F' in a ratio discussion.
[jira] [Commented] (HBASE-5367) [book] small formatting changes to compaction description in Arch/Regions/Store
[ https://issues.apache.org/jira/browse/HBASE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204800#comment-13204800 ]

stack commented on HBASE-5367:
--

{code}
long flushSize = this.htableDescriptor.getMemStoreFlushSize();
if (flushSize == HTableDescriptor.DEFAULT_MEMSTORE_FLUSH_SIZE) {
  flushSize = conf.getLong(HConstants.HREGION_MEMSTORE_FLUSH_SIZE,
      HTableDescriptor.DEFAULT_MEMSTORE_FLUSH_SIZE);
}
{code}

So, looks like DEFAULT_MEMSTORE_FLUSH_SIZE is 64M, which is confusing, and we'll use what's in HTD IFF it's different from this default. Yeah, easy to get confused.

[book] small formatting changes to compaction description in Arch/Regions/Store
---
Key: HBASE-5367
URL: https://issues.apache.org/jira/browse/HBASE-5367
Project: HBase
Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
Attachments: book_hbase_5367.xml.patch, book_hbase_5367_2.xml.patch

Fixing a few small-but-important things that came out of a post-commit comment in HBASE-5365

book.xml
* corrected default region flush size (it's actually 64mb)
* removed trailing 'F' in a ratio discussion.
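The fallback in the snippet quoted in this comment can be modelled in isolation to show where the confusion comes from. This is only a sketch with stand-in names: FlushSizeSketch, TABLE_DEFAULT and resolveFlushSize are hypothetical, the site configuration is reduced to a nullable Long, and no HBase classes are used.

```java
/** Hypothetical stand-alone model of the flush size fallback; not HBase code. */
final class FlushSizeSketch {
    // Stand-in for HTableDescriptor.DEFAULT_MEMSTORE_FLUSH_SIZE (64M per the comment).
    static final long TABLE_DEFAULT = 64L * 1024 * 1024;

    /**
     * tableValue: flush size stored on the table descriptor.
     * siteValue:  hbase.hregion.memstore.flush.size from the configuration, or null if unset.
     */
    static long resolveFlushSize(long tableValue, Long siteValue) {
        if (tableValue != TABLE_DEFAULT) {
            return tableValue;          // table-level value wins only when it differs from the default
        }
        return siteValue != null ? siteValue : TABLE_DEFAULT;
    }

    public static void main(String[] args) {
        long site = 134217728L;         // the 128M value quoted from trunk
        // A table left at (or explicitly set to) 64M picks up the site value instead.
        System.out.println(resolveFlushSize(TABLE_DEFAULT, site));
        // A table set to anything else keeps its own value.
        System.out.println(resolveFlushSize(32L * 1024 * 1024, site));
    }
}
```

One consequence of this shape is that a table explicitly set to 64M cannot be told apart from a table that was never configured, so the configuration value applies in both cases.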