[jira] [Commented] (HBASE-5367) [book] small formatting changes to compaction description in Arch/Regions/Store
[ https://issues.apache.org/jira/browse/HBASE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204802#comment-13204802 ]

stack commented on HBASE-5367:
--

{code}
long flushSize = this.htableDescriptor.getMemStoreFlushSize();
if (flushSize == HTableDescriptor.DEFAULT_MEMSTORE_FLUSH_SIZE) {
  flushSize = conf.getLong(HConstants.HREGION_MEMSTORE_FLUSH_SIZE,
      HTableDescriptor.DEFAULT_MEMSTORE_FLUSH_SIZE);
}
{code}

So, looks like DEFAULT_MEMSTORE_FLUSH_SIZE is 64M, which is confusing, and we'll use what's in the HTD IFF it's different from this default. Yeah, easy to get confused.

[book] small formatting changes to compaction description in Arch/Regions/Store
---

Key: HBASE-5367
URL: https://issues.apache.org/jira/browse/HBASE-5367
Project: HBase
Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
Attachments: book_hbase_5367.xml.patch, book_hbase_5367_2.xml.patch

Fixing a few small-but-important things that came out of a post-commit comment in HBASE-5365.

book.xml
* corrected default region flush size (it's actually 64mb)
* removed trailing 'F' in a ratio discussion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
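The lookup quoted in stack's HBASE-5367 comment has a subtle consequence: a table whose descriptor happens to carry exactly the 64MB default is treated as "not set" and picks up the site configuration instead. The following is a minimal standalone sketch of that rule. It does not use the real HBase classes; `FlushSizeSketch`, `resolveFlushSize`, and the `Map`-based configuration are hypothetical stand-ins, and the key string is assumed to be the `hbase.hregion.memstore.flush.size` property.

```java
import java.util.HashMap;
import java.util.Map;

/** Standalone sketch of the flush-size lookup; hypothetical, not HBase code. */
final class FlushSizeSketch {
    // 64MB, the table-descriptor default discussed in the comment.
    static final long DEFAULT_MEMSTORE_FLUSH_SIZE = 64L * 1024 * 1024;
    // Assumed configuration key for the site-wide flush size.
    static final String FLUSH_SIZE_KEY = "hbase.hregion.memstore.flush.size";

    /**
     * Returns the table's own flush size unless it equals the default,
     * in which case the configuration (or the default) wins.
     */
    static long resolveFlushSize(long tableFlushSize, Map<String, Long> conf) {
        if (tableFlushSize == DEFAULT_MEMSTORE_FLUSH_SIZE) {
            return conf.getOrDefault(FLUSH_SIZE_KEY, DEFAULT_MEMSTORE_FLUSH_SIZE);
        }
        return tableFlushSize;
    }

    /** Convenience helper for building a one-entry configuration. */
    static Map<String, Long> confWithFlushSize(long bytes) {
        Map<String, Long> conf = new HashMap<>();
        conf.put(FLUSH_SIZE_KEY, bytes);
        return conf;
    }
}
```

With a site setting of 128MB, a table left at the default resolves to 128MB, while a table explicitly set to 32MB keeps 32MB. A table explicitly set to 64MB cannot be told apart from one left at the default, which is the confusing part.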
[jira] [Commented] (HBASE-2375) Make decision to split based on aggregate size of all StoreFiles and revisit related config params
[ https://issues.apache.org/jira/browse/HBASE-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204820#comment-13204820 ]

stack commented on HBASE-2375:
--

bq. Upping compactionThreshold from 3 to 5... Sounds like a change that can have a bigger impact but that mostly helps this specific use case...

Dunno. 3 strikes me as one of those decisions that made sense long time ago but a bunch has changed since... We should test it I suppose.

On changing flush/regionsize, you'd rather have us split faster then slow as count of regions goes up. Ok.

Make decision to split based on aggregate size of all StoreFiles and revisit related config params
--

Key: HBASE-2375
URL: https://issues.apache.org/jira/browse/HBASE-2375
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.20.3
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
Labels: moved_from_0_20_5
Attachments: HBASE-2375-v8.patch

Currently we will make the decision to split a region when a single StoreFile in a single family exceeds the maximum region size. This issue is about changing the decision to split to be based on the aggregate size of all StoreFiles in a single family (but still not aggregating across families). This would move a check to split after flushes rather than after compactions. This issue should also deal with revisiting our default values for some related configuration parameters.

The motivating factor for this change comes from watching the behavior of RegionServers during heavy write scenarios. Today the default behavior goes like this:

- We fill up regions, and as long as you are not under global RS heap pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) StoreFiles.
- After we get 3 StoreFiles (hbase.hstore.compactionThreshold) we trigger a compaction on this region.
- Compaction queues notwithstanding, this will create a 192MB file, not triggering a split based on max region size (hbase.hregion.max.filesize).
- You'll then flush two more 64MB MemStores and hit the compactionThreshold and trigger a compaction.
- You end up with 192 + 64 + 64 in a single compaction. This will create a single 320MB file and will trigger a split.
- While you are performing the compaction (which now writes out 64MB more than the split size, so is about 5X slower than the time it takes to do a single flush), you are still taking on additional writes into MemStore.
- Compaction finishes, decision to split is made, region is closed. The region now has to flush whichever edits made it to MemStore while the compaction ran. This flushing, in our tests, is by far the dominating factor in how long data is unavailable during a split. We measured about 1 second to do the region closing, master assignment, reopening. Flushing could take 5-6 seconds, during which time the region is unavailable.
- The daughter regions re-open on the same RS. Immediately when the StoreFiles are opened, a compaction is triggered across all of their StoreFiles because they contain references. Since we cannot currently split a split, we need to not hang on to these references for long.

This described behavior is really bad because of how often we have to rewrite data onto HDFS. Imports are usually just IO bound as the RS waits to flush and compact. In the above example, the first cell to be inserted into this region ends up being written to HDFS 4 times (initial flush, first compaction w/ no split decision, second compaction w/ split decision, third compaction on daughter region). In addition, we leave a large window where we take on edits (during the second compaction of 320MB) and then must make the region unavailable as we flush it.
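The arithmetic in the walkthrough above can be checked with a small simulation. This is only a sketch under the description's own simplifications: every flush writes a fixed-size StoreFile, a compaction merges all StoreFiles into one, and the split check looks at that single merged file. The 256MB split size is inferred from the statement that the 320MB file is "64MB more than the split size"; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simulates the flush/compact/split sequence described in HBASE-2375. */
final class SplitTimelineSketch {
    /** Counts up to the moment the split decision is made. */
    static final class Result {
        final int flushes;
        final int compactions;
        final long splitFileMb;

        Result(int flushes, int compactions, long splitFileMb) {
            this.flushes = flushes;
            this.compactions = compactions;
            this.splitFileMb = splitFileMb;
        }
    }

    static Result simulate(long flushMb, int compactionThreshold, long maxFileMb) {
        if (flushMb <= 0 || compactionThreshold < 1) {
            throw new IllegalArgumentException("flush size and threshold must be positive");
        }
        List<Long> storeFiles = new ArrayList<>();
        int flushes = 0;
        int compactions = 0;
        while (true) {
            // A MemStore flush adds one StoreFile of the flush size.
            storeFiles.add(flushMb);
            flushes++;
            if (storeFiles.size() >= compactionThreshold) {
                // Compaction rewrites every StoreFile into a single file.
                long merged = 0;
                for (long size : storeFiles) {
                    merged += size;
                }
                storeFiles.clear();
                storeFiles.add(merged);
                compactions++;
                // Split is only considered after a compaction, on one file.
                if (merged > maxFileMb) {
                    return new Result(flushes, compactions, merged);
                }
            }
        }
    }
}
```

Calling `simulate(64, 3, 256)` reproduces the sequence in the description: five flushes, two compactions, and a 320MB file at the point the split is decided. Adding the compaction on the daughter region gives the four HDFS writes of the first cell.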
If we increased the compactionThreshold to be 5 and determined splits based on aggregate size, the behavior becomes:

- We fill up regions, and as long as you are not under global RS heap pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) StoreFiles.
- After each MemStore flush, we calculate the aggregate size of all StoreFiles. We can also check the compactionThreshold. For the first three flushes, neither would hit the limit. On the fourth flush, we would see total aggregate size = 256MB and determine to make a split.
- Decision to split is made, region is closed. This time, the region just has to flush out whichever edits made it to the MemStore during the snapshot/flush of the previous MemStore. So this time window has shrunk by more than 75% as it was the time to write 64MB from memory not 320MB from aggregating 5 hdfs files. This
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204852#comment-13204852 ]

stack commented on HBASE-5270:
--

bq. So when assigning regions, some regionplans whose destination is the dead server will be failed.

True. They'll fail and get retried elsewhere which shouldn't be too bad. But need to keep it in mind. This all doesn't work (I think) if we need to scan meta and it's on a server that has just gone down.

Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
-

Key: HBASE-5270
URL: https://issues.apache.org/jira/browse/HBASE-5270
Project: HBase
Issue Type: Sub-task
Components: master
Reporter: Zhihong Yu
Fix For: 0.94.0, 0.92.1

This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK:

Reviewing 0.92v17

isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere.

Does isDeadRootServerInProgress need to be public? Ditto for meta version.

This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier?

Is there anything in place to stop us expiring a server twice if it's carrying root and meta?

What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta.

I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all.

It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here?

Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. It's not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works?

Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired?

This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch?

This patch needs to have the v18 differences applied.
[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans
[ https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205139#comment-13205139 ]

stack commented on HBASE-5325:
--

Ok. I can go along w/ cutting the scope of this issue back to its original intent (smile -- sorry for the hairbraining).

I was going to make a comment that we have the jmx 'model' not be a completely new one -- that it instead pick the existing model (though admittedly it's a bit messy) -- but it looks like your hand has been forced some already.

I was going to suggest that we name the mbean for metrics MasterMetrics rather than MasterStatistics but I see this is an existing MBean (Who named our mbean 'Master' and 'MasterStatistics' -- our history doesn't say... they don't seem like good names... why not org.apache.hbase prefix... etc.)

Why is HBaseMasterMXBean in hbase.metrics and not in hbase.master.metrics? Ditto HbaseRegionServerMXBean

You don't need these in your new files:

+ * Copyright 2012 The Apache Software Foundation

Why not name the bean for the master servername especially as there can be multiple masters running in a cluster -- getServerName.

Why would you have this info in regionserver jmx attributes:

+ public String getHBaseMaster();

This is pretty useless:

+ public long getStartCode();

Better return the regionserver ServerName. Or name the mbean for the ServerName.

More to follow...

Expose basic information about the master-status through jmx beans
---

Key: HBASE-5325
URL: https://issues.apache.org/jira/browse/HBASE-5325
Project: HBase
Issue Type: Improvement
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
Fix For: 0.94.0
Attachments: HBASE-5325.1.patch, HBASE-5325.wip.patch

Similar to the Namenode and Jobtracker, it would be good if the hbase master could expose some information through mbeans.
[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans
[ https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205215#comment-13205215 ]

stack commented on HBASE-5325:
--

I suppose you can't name the bean for the ServerName because you need to be able to locate the bean -- i.e. there'll be N instances of RegionServer beans and you'd then figure which belonged to which by looking at the ServerName attribute (I remember this is how it worked now).

Is this in branch-0.20-append branch:

+import org.apache.hadoop.metrics2.util.MBeans;

If not, this breaks our ability to run on that branch (Up to this presumed we could run there). It's in 1.0.0 hadoop? (would need to check it's in CDH..)

Do classes have to have an HBase (or Hbase) prefix? Seems redundant (and we should be consistent).

Can we have a better name than HBaseRegionServerInfo. It says nothing. If it was RegionServerMBean or MXBean, it'd say more about what this class is about.

You can call it 'instance'. 'theInstance' is too much?

And again, publishing master, ensemble, and startcode seems like a bunch of info you'd never act on. Master maybe -- you'd know which master it was talking to -- and perhaps ensemble because then you know who it's registered with (though having both seems unnecessary... the ensemble with its rootdir will tell you which cluster we belong to... perhaps you should get the cluster id out here?) but startcode is no good to anyone really. Should be ServerName coming out here. That's how we uniquely identify regionservers in fs, when they report into the master and up in ensemble. Might as well continue the identifier here.

Does this class need to take a RegionServer implementation? Can it take a o.a.h.h.Server and/or a o.a.h.h.regionserver.RegionServerServices publishing jmx attributes? These are Interfaces. Might make this all easier to write tests on.

HbaseRegionServerMXBean should be in the regionserver/metrics package then you could call it MXBean or ServerMXBean.
HBaseMasterMXBean should be in master/metrics, should be HBaseMasterMXBean. The RegionServerInfo in it should be showing more than this small set of metrics if you are going to the bother of putting this stuff out on jmx (we do requests per region now -- should there be one of these classes per region?)

I think that you don't need startcode if getRegionServer is returning the ServerName as a String.

Don't RegionState up in RegionsInTransiation have a ServerName associated too? You're not publishing this?

Again, ain't these bad names for beans up in jmx?

+mxBean = MBeans.register(HBase, MasterInfo,
+mxBeanInfo);

shouldn't it be org.apache.hbase... or hbase if that's the parent for all of the hbase beans (Ain't there convention on bean naming -- IIRC). MasterInfo says nothing. Could it be just Master. Then you do getServerName or what ever it is that returns ServerName to distinguish this master from the backup Master?

Sorry for so many comments for such a small patch. I just feel that this stuff can be really useful if it's done right. Else it's just more stuff for us to maintain. Thanks for doing this stuff lads.

Expose basic information about the master-status through jmx beans
---

Key: HBASE-5325
URL: https://issues.apache.org/jira/browse/HBASE-5325
Project: HBase
Issue Type: Improvement
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
Fix For: 0.94.0
Attachments: HBASE-5325.1.patch, HBASE-5325.wip.patch

Similar to the Namenode and Jobtracker, it would be good if the hbase master could expose some information through mbeans.
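The naming suggestions in the HBASE-5325 review (an org.apache.hbase domain, a bean called simply Master, and the ServerName as the distinguishing attribute) can be illustrated with plain JDK JMX. This is a sketch only: it uses `javax.management` directly rather than Hadoop's `MBeans` helper, the `MasterMXBean` interface and `Master` class are hypothetical, and the ServerName value is an invented example in the usual host,port,startcode form. Because that form contains commas, which separate key properties in an ObjectName, the value is quoted.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Sketch of registering a master bean named per the review comments. */
final class MasterBeanSketch {
    /** Hypothetical MXBean interface exposing only the ServerName. */
    public interface MasterMXBean {
        String getServerName();
    }

    /** Hypothetical bean implementation backed by a fixed ServerName. */
    static final class Master implements MasterMXBean {
        private final String serverName;

        Master(String serverName) {
            this.serverName = serverName;
        }

        @Override
        public String getServerName() {
            return serverName;
        }
    }

    /** Builds org.apache.hbase:type=Master,name="<serverName>". */
    static ObjectName nameFor(String serverName) throws Exception {
        // Quote the value: commas would otherwise start a new key property.
        return new ObjectName(
            "org.apache.hbase:type=Master,name=" + ObjectName.quote(serverName));
    }

    /** Registers the bean, reads the attribute back over JMX, then unregisters. */
    static String registerAndRead(String serverName) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = nameFor(serverName);
        server.registerMBean(new Master(serverName), name);
        try {
            return (String) server.getAttribute(name, "ServerName");
        } finally {
            server.unregisterMBean(name);
        }
    }
}
```

Putting the ServerName in the ObjectName keeps an active master and a backup master distinguishable in the same MBean server, while a client that does not know the ServerName can still find the beans with a pattern query on `type=Master`.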
[jira] [Commented] (HBASE-5364) Fix source files missing licenses in 0.92 and trunk
[ https://issues.apache.org/jira/browse/HBASE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205685#comment-13205685 ]

stack commented on HBASE-5364:
--

You fellas going to commit?

Fix source files missing licenses in 0.92 and trunk
---

Key: HBASE-5364
URL: https://issues.apache.org/jira/browse/HBASE-5364
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0, 0.92.0
Reporter: Jonathan Hsieh
Assignee: Elliott Clark
Priority: Blocker
Attachments: HBASE-5364-1.patch, hbase-5364-0.92.patch

Running 'mvn rat:check' shows that a few files have snuck in that do not have proper apache licenses. Ideally we should fix these before we cut another release/release candidate.

This is a blocker for 0.94, and probably should be for the other branches as well.
[jira] [Commented] (HBASE-5368) Move PrefixSplitKeyPolicy out of the src/test into src, so it is accessible in HBase installs
[ https://issues.apache.org/jira/browse/HBASE-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205686#comment-13205686 ]

stack commented on HBASE-5368:
--

+1

Move PrefixSplitKeyPolicy out of the src/test into src, so it is accessible in HBase installs
-

Key: HBASE-5368
URL: https://issues.apache.org/jira/browse/HBASE-5368
Project: HBase
Issue Type: Sub-task
Components: regionserver
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
Fix For: 0.94.0
Attachments: 5368.txt

Very simple change to make PrefixSplitKeyPolicy accessible in HBase installs (user still needs to set up the table(s) accordingly).

Right now it is in src/test/org.apache.hadoop.hbase.regionserver, I propose moving it to src/org.apache.hadoop.hbase.regionserver (alongside ConstantSizeRegionSplitPolicy), and maybe renaming it too.
[jira] [Commented] (HBASE-5377) Fix licenses on the 0.90 branch.
[ https://issues.apache.org/jira/browse/HBASE-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205784#comment-13205784 ]

stack commented on HBASE-5377:
--

Why this:

{code}
 </plugin>
+<plugin>
+  <artifactId>maven-surefire-report-plugin</artifactId>
+  <version>2.9</version>
+</plugin>
+<plugin>
+  <groupId>org.apache.avro</groupId>
+  <artifactId>avro-maven-plugin</artifactId>
+  <version>${avro.version}</version>
+</plugin>
+<plugin>
+  <groupId>org.codehaus.mojo</groupId>
+  <artifactId>build-helper-maven-plugin</artifactId>
+  <version>1.5</version>
+</plugin>
{code}

Else patch looks good to me. If you can build site and the webapps work, commit I'd say.

Fix licenses on the 0.90 branch.

Key: HBASE-5377
URL: https://issues.apache.org/jira/browse/HBASE-5377
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Attachments: hbase-5377.patch

There are a handful of empty files and several files missing apache licenses on the 0.90 branch. This patch fixes all of them and in conjunction with HBASE-5363 will allow it to pass RAT tests.
[jira] [Commented] (HBASE-5363) Automatically run rat check on mvn release builds
[ https://issues.apache.org/jira/browse/HBASE-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205887#comment-13205887 ]

stack commented on HBASE-5363:
--

What's your hadoop wikiid Jon?

Automatically run rat check on mvn release builds
-

Key: HBASE-5363
URL: https://issues.apache.org/jira/browse/HBASE-5363
Project: HBase
Issue Type: Improvement
Components: build
Affects Versions: 0.90.5, 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Attachments: hbase-5363-0.90.patch, hbase-5363.2.patch, hbase-5363.patch

Some of the recent hbase releases failed rat checks (mvn rat:check). We should add checks likely in the mvn package phase so that this becomes a non-issue in the future.

Here's an example from Whirr: https://github.com/apache/whirr/blob/trunk/pom.xml line 388.
[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression
[ https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205895#comment-13205895 ]

stack commented on HBASE-5313:
--

How do I read the above? It's the same amount of kvs in each of the files?

Restructure hfiles layout for better compression

Key: HBASE-5313
URL: https://issues.apache.org/jira/browse/HBASE-5313
Project: HBase
Issue Type: Improvement
Components: io
Reporter: dhruba borthakur
Assignee: dhruba borthakur

An HFile block contains a stream of key-values. Can we organize these kvs on the disk in a better way so that we get much greater compression ratios? One option (thanks Prakash) is to store all the keys in the beginning of the block (let's call this the key-section) and then store all their corresponding values towards the end of the block. This will allow us to not even decompress the values when we are scanning and skipping over rows in the block.

Any other ideas?
[jira] [Commented] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup
[ https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205936#comment-13205936 ]

stack commented on HBASE-5209:
--

bq. I have a multi-master HBase set up, and I'm trying to programmatically determine which of the masters is currently active. But the API does not allow me to do this. There is a getMaster() method in the HConnection class, but it returns an HMasterInterface, whose methods do not allow me to find out which master won the last race. The API should have a getActiveMasterHostname() or something to that effect.

If you do a getMaster, I'd think that you should get the active master, only, in HConnection. Are you saying that it'll give you an Interface on the non-active Master? That's broke I'd say.

For the name of the Master, yeah, getServerName should be part of HMasterInterface.

On the patch:

{code}
+ private boolean isMasterRunning, isActiveMaster;
{code}

The above are the names of methods, not data members. Should be masterRunning and activeMaster.

What's going on here:

{code}
+this.master = master;
+this.isMasterRunning = isMasterRunning;
+this.isActiveMaster = isActiveMaster;
{code}

So, we could be reporting a master that is not running and not the active master? Why would we even care about it in that case?

getMasterInfo as method name returning master ServerName seems off. Is this the 'active' master or non-running master?

I think we need to be clear that ClusterStatus reports on the active master only (unless you want to add list of all running master which I don't think yet possible since they do not register until they assume mastership --- hmmm... looking further down in your patch, it looks like you are adding this facility to zk).

Is this of any use?

+ public boolean isMasterRunning() {

I mean, if master is not running, can you even get a ClusterStatus from the cluster?
Ditto for

+ public boolean isActiveMaster() {

Won't this just be true anytime you get a ClusterStatus?

You up the ClusterStatus version number but you don't act on it (what if you are asked to deserialize an earlier version of ClusterStatus?)

On MasterInterface, I'd suggest don't bother upping the version number -- just add the new method on the end. That's usually ok.

Also, isActiveMaster of any use even? (You could ask zk directly? Have hbaseadmin go ask zk rather than go via the master at all? Isn't the master znode name its ServerName? Isn't that what you need?)

I like your registering backup masters... and adding the list to the zk report.

HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup

Key: HBASE-5209
URL: https://issues.apache.org/jira/browse/HBASE-5209
Project: HBase
Issue Type: Improvement
Components: master
Affects Versions: 0.94.0, 0.90.5, 0.92.0
Reporter: Aditya Acharya
Assignee: David S. Wang
Fix For: 0.94.0, 0.90.7, 0.92.1
Attachments: HBASE-5209-v0.diff, HBASE-5209-v1.diff

I have a multi-master HBase set up, and I'm trying to programmatically determine which of the masters is currently active. But the API does not allow me to do this. There is a getMaster() method in the HConnection class, but it returns an HMasterInterface, whose methods do not allow me to find out which master won the last race. The API should have a getActiveMasterHostname() or something to that effect.
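The point in the HBASE-5209 review about bumping a version number without acting on it can be made concrete. The sketch below is hypothetical throughout: `StatusSketch` is not the real ClusterStatus, and its wire format (a version byte, a UTF string, an int) is invented for illustration. It shows a reader that branches on the version it finds, so a payload written before the new field existed still deserializes. It also follows the naming advice from the same review: data members are named for what they hold, with `get`/`is` reserved for accessor methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical versioned status object; not the real ClusterStatus. */
final class StatusSketch {
    // Assumption for this sketch: version 2 added the backup-master count.
    static final byte VERSION = 2;

    private final String master;      // field named for the thing it holds
    private final int backupMasters;

    StatusSketch(String master, int backupMasters) {
        this.master = master;
        this.backupMasters = backupMasters;
    }

    String getMaster() {
        return master;
    }

    int getBackupMasters() {
        return backupMasters;
    }

    /** Always writes the current version. */
    byte[] write() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(VERSION);
            out.writeUTF(master);
            out.writeInt(backupMasters);
        }
        return bytes.toByteArray();
    }

    /** Reads any supported version, defaulting fields older versions lack. */
    static StatusSketch read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte version = in.readByte();
            if (version < 1 || version > VERSION) {
                throw new IOException("Unsupported version: " + version);
            }
            String master = in.readUTF();
            // Version 1 payloads end here; default the field they never carried.
            int backupMasters = version >= 2 ? in.readInt() : 0;
            return new StatusSketch(master, backupMasters);
        }
    }
}
```

The reader rejects versions it does not know rather than guessing at their layout, which keeps a newer writer from being silently misread by an older reader.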
[jira] [Commented] (HBASE-5363) Automatically run rat check on mvn release builds
[ https://issues.apache.org/jira/browse/HBASE-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205937#comment-13205937 ]

stack commented on HBASE-5363:
--

Try it now boss

Automatically run rat check on mvn release builds
-

Key: HBASE-5363
URL: https://issues.apache.org/jira/browse/HBASE-5363
Project: HBase
Issue Type: Improvement
Components: build
Affects Versions: 0.90.5, 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Attachments: hbase-5363-0.90.patch, hbase-5363.2.patch, hbase-5363.patch

Some of the recent hbase releases failed rat checks (mvn rat:check). We should add checks likely in the mvn package phase so that this becomes a non-issue in the future.

Here's an example from Whirr: https://github.com/apache/whirr/blob/trunk/pom.xml line 388.
[jira] [Commented] (HBASE-5387) Reuse compression streams in HFileBlock.Writer
[ https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13206026#comment-13206026 ]

stack commented on HBASE-5387:
--

Any reason for hardcoding of 32K for buffer size:

+ ((Configurable)codec).getConf().setInt(io.file.buffer.size, 32 * 1024);

Give this an initial reasonable size?

+compressedByteStream = new ByteArrayOutputStream();

So, we'll keep around the largest thing we ever wrote into this ByteArrayOutputStream? Should we resize it or something from time to time? Or I suppose we can just wait till it's a prob?

Is the gzip stuff brittle? The header can be bigger than 10 bytes I suppose (spec allows extensions IIRC) but I suppose it's safe because we presume java or underlying native compression.

Good stuff Mikhail. +1 on patch.

Reuse compression streams in HFileBlock.Writer
--

Key: HBASE-5387
URL: https://issues.apache.org/jira/browse/HBASE-5387
Project: HBase
Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch

We need to reuse compression streams in HFileBlock.Writer instead of allocating them every time. The motivation is that when using Java's built-in implementation of Gzip, we allocate a new GZIPOutputStream object and an associated native data structure every time we create a compression stream. The native data structure is only deallocated in the finalizer. This is one suspected cause of recent TestHFileBlock failures on Hadoop QA: https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
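The idea behind HBASE-5387, reusing one compressor across blocks instead of allocating a new one (and its native state) per block, can be sketched with the JDK's `Deflater`. This is not the patch's code and it does not touch HFileBlock.Writer or the Hadoop codec classes. It also produces zlib-format output rather than gzip, purely to keep the example short. `ReusableCompressorSketch` and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** One Deflater and one output buffer shared by every block written. */
final class ReusableCompressorSketch {
    private final Deflater deflater = new Deflater();
    private final ByteArrayOutputStream compressed = new ByteArrayOutputStream();
    private final byte[] chunk = new byte[4096];

    /** Compresses one block, resetting the shared state first. */
    byte[] compress(byte[] block) {
        deflater.reset();      // ready for new input, native state is kept
        compressed.reset();    // discard the previous block's bytes
        deflater.setInput(block);
        deflater.finish();
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            compressed.write(chunk, 0, n);
        }
        return compressed.toByteArray();
    }

    /** Releases the native memory explicitly instead of waiting for GC. */
    void close() {
        deflater.end();
    }

    /** Helper for checking round trips; the caller supplies the original length. */
    static byte[] decompress(byte[] data, int originalLength) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            byte[] out = new byte[originalLength];
            int off = 0;
            while (off < out.length && !inflater.finished()) {
                int n = inflater.inflate(out, off, out.length - off);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input, stop rather than spin
                }
                off += n;
            }
            return out;
        } finally {
            inflater.end();
        }
    }
}
```

The design choice is the explicit lifecycle: `reset()` between blocks and a single `end()` when the writer is done, so native memory use stays flat no matter how many blocks are written.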
[jira] [Commented] (HBASE-5387) Reuse compression streams in HFileBlock.Writer
[ https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206156#comment-13206156 ] stack commented on HBASE-5387: -- IIRC, the BAOS will keep the outline of the largest allocation that went through it -- reset doesn't put the BAOS backing buffer back to its original size... I haven't looked at the src... maybe it's better now (I wrote that crawler gzipping thing you cite above). Reuse compression streams in HFileBlock.Writer -- Key: HBASE-5387 URL: https://issues.apache.org/jira/browse/HBASE-5387 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Critical Fix For: 0.94.0 Attachments: D1719.1.patch, Fix-deflater-leak-2012-02-10_18_48_45.patch
[jira] [Commented] (HBASE-3584) Allow atomic put/delete in one call
[ https://issues.apache.org/jira/browse/HBASE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206269#comment-13206269 ] stack commented on HBASE-3584: -- Looks like the HBASE-3967 RowMutation is a common denominator of Put and Delete, whereas this RowMutation is a container that holds a swathe of Puts and Deletes to apply to a single row; they are different. I'm up for a rename. This RowMutation becomes RowMutations? Allow atomic put/delete in one call --- Key: HBASE-3584 URL: https://issues.apache.org/jira/browse/HBASE-3584 Project: HBase Issue Type: New Feature Components: client, coprocessors, regionserver Reporter: ryan rawson Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 3584-final.txt, 3584-v1.txt, 3584-v3.txt Right now we have the following calls: put(Put) delete(Delete) increment(Increments) But we cannot combine all of the above in a single call, complete with a single row lock. It would be nice to do that. It would also allow us to do a CAS where we could do a put/increment if the check succeeded. - Amendment: Since Increment does not currently support MVCC it cannot be included in an atomic operation. So this is for Put and Delete only.
[jira] [Commented] (HBASE-5387) Reuse compression streams in HFileBlock.Writer
[ https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206271#comment-13206271 ] stack commented on HBASE-5387: -- @Ted Yeah, I took a look at BAOS. Reset just sets count to zero, as you see. If a BAOS goes big and stays big, maybe it's not so bad. If there are a bunch of these though, it could become a problem. We could do as you suggest as long as we do it before we call resize? So we'd have a conditional where, if the size is more than N times the buffer, then instead of reusing, we go get a new BAOS? Reuse compression streams in HFileBlock.Writer -- Key: HBASE-5387 URL: https://issues.apache.org/jira/browse/HBASE-5387 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Critical Fix For: 0.94.0 Attachments: D1719.1.patch, Fix-deflater-leak-2012-02-10_18_48_45.patch
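The point about ByteArrayOutputStream.reset() can be checked with a small sketch. It assumes nothing from the patch itself: the class names, INITIAL_SIZE, and MAX_GROWTH_FACTOR (the "N" in the comment above) are hypothetical. The only JDK facts relied on are that reset() sets the count to zero and that the backing array is the protected field buf.

```java
import java.io.ByteArrayOutputStream;

class ReusableBuffer {
    // Hypothetical tuning knobs, not taken from the HBASE-5387 patch.
    static final int INITIAL_SIZE = 4 * 1024;
    static final int MAX_GROWTH_FACTOR = 8;

    /** ByteArrayOutputStream that exposes the length of its backing array. */
    static class CapacityAwareStream extends ByteArrayOutputStream {
        CapacityAwareStream(int initialSize) {
            super(initialSize);
        }

        /** Length of the backing array; reset() leaves this unchanged. */
        int capacity() {
            return buf.length;
        }
    }

    private CapacityAwareStream stream = new CapacityAwareStream(INITIAL_SIZE);

    /** Returns an empty stream, swapping in a fresh one if the old one grew too large. */
    CapacityAwareStream acquire() {
        if (stream.capacity() > MAX_GROWTH_FACTOR * INITIAL_SIZE) {
            stream = new CapacityAwareStream(INITIAL_SIZE);
        } else {
            stream.reset();
        }
        return stream;
    }

    public static void main(String[] args) {
        ReusableBuffer buffer = new ReusableBuffer();
        CapacityAwareStream s = buffer.acquire();
        s.write(new byte[1024 * 1024], 0, 1024 * 1024);
        System.out.println("capacity after large write: " + s.capacity());
        s.reset();
        System.out.println("capacity after reset:       " + s.capacity());
        System.out.println("capacity after acquire:     " + buffer.acquire().capacity());
    }
}
```

Whether replacing the stream is worth the extra allocation depends on how many writers are alive at once, which is the trade-off raised in the comment.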
[jira] [Commented] (HBASE-4920) We need a mascot, a totem
[ https://issues.apache.org/jira/browse/HBASE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206548#comment-13206548 ] stack commented on HBASE-4920: -- The red one is not bad (purple too cartoony and the other too complicated). On Marcy's latest, I think the first set, while graphically clean, has lost too much detail and is now hard to identify as an orca in particular. Option 5 is not bad but I'm not sure this is an orca either w/ the blunt nose. I'd say option 6 is the best of the lot. Looking around more it looks like we could 'buy' these as is: http://www.shutterstock.com/pic-58643317/stock-vector-vector-killer-whale-represented-in-the-form-of-a-tattoo.html http://depositphotos.com/2900573/stock-illustration-Killer-whale-tattoo.html http://www.canstockphoto.com/whale-tattoo-3058609.html http://www.canstockphoto.com/orca-killer-whale-2297275.html http://4.bp.blogspot.com/_HBjr0PcdZW4/TIge5Uok9eI/A8Q/0AdMNtuhRaY/s400/pm006-orca7.png http://www.bigstockphoto.com/image-27015218/stock-vector-orca-killer-whale-cartoon-vector-illustratio http://www.graphicsfactory.com/Clip_Art/Animals/Fish/funny_water_animals_053_377466.html This one, the second, inverted might work http://depositphotos.com/2900573/stock-illustration-Killer-whale-tattoo.html, so it's rising over the hbase logo from the right. We need a mascot, a totem - Key: HBASE-4920 URL: https://issues.apache.org/jira/browse/HBASE-4920 Project: HBase Issue Type: Task Reporter: stack Attachments: HBase Orca Logo.jpg, Orca_479990801.jpg, Screen shot 2011-11-30 at 4.06.17 PM.png, apache hbase orca logo_Proof 3.pdf, apache logo_Proof 8.pdf, krake.zip, photo (2).JPG We need a totem for our t-shirt that is yet to be printed. O'Reilly owns the Clydesdale. We need something else. We could have a fluffy little duck that quacks 'hbase!' when you squeeze it and we could order boxes of them from some off-shore sweatshop that subcontracts to a contractor who employs child labor only. 
Or we could have an Orca (Big!, Fast!, Killer!, and in a poem that Marcy from Salesforce showed me, that was a bit too spiritual for me to be seen quoting here, it had the Orca as the 'Guardian of the Cosmic Memory': i.e. in translation, bigdata).
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207017#comment-13207017 ] stack commented on HBASE-5200: -- Can we do a test for this before it goes into 0.92 and trunk? AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent - Key: HBASE-5200 URL: https://issues.apache.org/jira/browse/HBASE-5200 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5200-v2.txt, HBASE-5200.patch, HBASE-5200_1.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch This is the scenario: consider a case where the balancer is running and is thus trying to close regions in a RS. Before the close completes, a master switch happens. On master switch, the set of nodes that are in RIT is collected; we first get the data and start watching the node, and after that the node data is added into RIT. If, by this time (before adding to RIT), the RS on which close was called does a transition, AM.handleRegion() misses the handling, saying the RIT state was null. 
{code}
2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region a66d281d231dfcaea97c270698b26b6f from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region c12e53bfd48ddc5eec507d66821c4d23 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 59ae13de8c1eb325a0dd51f4902d2052 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region f45bc9614d7575f35244849af85aa078 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region cc3ecd7054fe6cd4a1159ed92fd62641 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 3af40478a17fee96b4a192b22c90d5a2 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region e6096a8466e730463e10d3d61f809b92 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 4806781a1a23066f7baed22b4d237e24 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region d69e104131accaefe21dcc01fddc7629 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
{code}
In branch the CLOSING node is created by RS thus leading to more inconsistency.
[jira] [Commented] (HBASE-5388) Tuning HConnectionManager#getCachedLocation method
[ https://issues.apache.org/jira/browse/HBASE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207031#comment-13207031 ] stack commented on HBASE-5388: -- I agree w/ Zhihong that the '+if (!(this.internalMap instanceof NavigableMap)) {' block seems unnecessary. Is the 'greatest' in this right? {code} + * retrieves the value associated with the greatest key strictly less than + * the given key, or null if there is no such key {code} It's unfortunate that SoftValueSortedMap sneaks back into HCM, but I can live with it if there is no alternative (Lars' suggestion of having SVSM implement NM didn't seem too bad, but I understand if it adds a bunch of boilerplate). Tuning HConnectionManager#getCachedLocation method -- Key: HBASE-5388 URL: https://issues.apache.org/jira/browse/HBASE-5388 Project: HBase Issue Type: Improvement Affects Versions: 0.90.5 Reporter: ronghai.ma Assignee: ronghai.ma Labels: patch Fix For: 0.94.0 Attachments: 5388-v2.txt, HConnectionManager.java, SoftValueSortedMap.java, SoftValueSortedMap.java About 75% improvement in execution time. 1. Add the following method in SoftValueSortedMap: {code} public synchronized <K, V> Entry<K, V> lowerEntry(K key) { return ((TreeMap) this.internalMap).lowerEntry(key); } {code} 2. Modify getCachedLocation: {code} Map.Entry<byte[], HRegionLocation> tEntry = tableLocations.lowerEntry(row); if (tEntry != null) { HRegionLocation possibleRegion = tEntry.getValue(); //other code } {code}
[jira] [Commented] (HBASE-3584) Allow atomic put/delete in one call
[ https://issues.apache.org/jira/browse/HBASE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207045#comment-13207045 ] stack commented on HBASE-3584: -- This is an unfortunate situation -- e.g. Mutation is in the 0.92 release and it's what RowMutation is in 0.89fb, IIUC -- but let's patch up the model as best we can, with deprecations of the ugly, so we're clear on what we want folks to use going forward (I believe I reviewed both patches -- I should have caught the name clash). Allow atomic put/delete in one call --- Key: HBASE-3584 URL: https://issues.apache.org/jira/browse/HBASE-3584 Project: HBase Issue Type: New Feature Components: client, coprocessors, regionserver Reporter: ryan rawson Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 3584-final.txt, 3584-v1.txt, 3584-v3.txt
[jira] [Commented] (HBASE-5388) Tuning HConnectionManager#getCachedLocation method
[ https://issues.apache.org/jira/browse/HBASE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207051#comment-13207051 ] stack commented on HBASE-5388: -- bq. The javadoc involving 'greatest' comes from javadoc for lowerEntry(). Ok. v3 is good by me except for the HCM pollution w/ SoftValueSortedMap. I looked at making it implement NavigableMap and it's 18 extra methods. They could all throw unsupported, as per Lars. I'm fine w/ this patch though. Tuning HConnectionManager#getCachedLocation method -- Key: HBASE-5388 URL: https://issues.apache.org/jira/browse/HBASE-5388 Project: HBase Issue Type: Improvement Affects Versions: 0.90.5 Reporter: ronghai.ma Assignee: ronghai.ma Labels: patch Fix For: 0.94.0 Attachments: 5388-v2.txt, 5388-v3.txt, HConnectionManager.java, SoftValueSortedMap.java, SoftValueSortedMap.java
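The 'greatest key strictly less than' wording questioned in this thread is the documented contract of java.util.TreeMap.lowerEntry(). A minimal sketch of that contract follows; the String keys and region names are made up for illustration (the real cache is keyed on byte[] rows), and floorEntry() is shown only for contrast.

```java
import java.util.Map;
import java.util.TreeMap;

class LowerEntryDemo {
    /** Value stored under the greatest key strictly less than row, or null if none. */
    static String lowerValue(TreeMap<String, String> cache, String row) {
        Map.Entry<String, String> entry = cache.lowerEntry(row);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        // Hypothetical cache: start key -> region name.
        TreeMap<String, String> cache = new TreeMap<>();
        cache.put("b", "region-b");
        cache.put("m", "region-m");

        System.out.println(lowerValue(cache, "k"));            // region-b
        System.out.println(lowerValue(cache, "z"));            // region-m
        System.out.println(lowerValue(cache, "m"));            // region-b ("m" itself is excluded)
        System.out.println(lowerValue(cache, "b"));            // null (no key below "b")
        System.out.println(cache.floorEntry("m").getValue());  // region-m (floorEntry includes equality)
    }
}
```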
[jira] [Commented] (HBASE-2375) Make decision to split based on aggregate size of all StoreFiles and revisit related config params
[ https://issues.apache.org/jira/browse/HBASE-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207063#comment-13207063 ] stack commented on HBASE-2375: -- +1 on patch. Make a new issue to change compactionThreshold to at least 4, I'd say, and then you can close out this one? Make decision to split based on aggregate size of all StoreFiles and revisit related config params -- Key: HBASE-2375 URL: https://issues.apache.org/jira/browse/HBASE-2375 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.20.3 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Labels: moved_from_0_20_5 Attachments: HBASE-2375-flush-split.patch, HBASE-2375-v8.patch

Currently we will make the decision to split a region when a single StoreFile in a single family exceeds the maximum region size. This issue is about changing the decision to split to be based on the aggregate size of all StoreFiles in a single family (but still not aggregating across families). This would move a check to split after flushes rather than after compactions. This issue should also deal with revisiting our default values for some related configuration parameters.

The motivating factor for this change comes from watching the behavior of RegionServers during heavy write scenarios. Today the default behavior goes like this:
- We fill up regions, and as long as you are not under global RS heap pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) StoreFiles.
- After we get 3 StoreFiles (hbase.hstore.compactionThreshold) we trigger a compaction on this region.
- Compaction queues notwithstanding, this will create a 192MB file, not triggering a split based on max region size (hbase.hregion.max.filesize).
- You'll then flush two more 64MB MemStores and hit the compactionThreshold and trigger a compaction.
- You end up with 192 + 64 + 64 in a single compaction. This will create a single 320MB file and will trigger a split.
- While you are performing the compaction (which now writes out 64MB more than the split size, so is about 5X slower than the time it takes to do a single flush), you are still taking on additional writes into MemStore.
- Compaction finishes, decision to split is made, region is closed. The region now has to flush whichever edits made it to MemStore while the compaction ran. This flushing, in our tests, is by far the dominating factor in how long data is unavailable during a split. We measured about 1 second to do the region closing, master assignment, reopening. Flushing could take 5-6 seconds, during which time the region is unavailable.
- The daughter regions re-open on the same RS. Immediately when the StoreFiles are opened, a compaction is triggered across all of their StoreFiles because they contain references. Since we cannot currently split a split, we need to not hang on to these references for long.

This described behavior is really bad because of how often we have to rewrite data onto HDFS. Imports are usually just IO bound as the RS waits to flush and compact. In the above example, the first cell to be inserted into this region ends up being written to HDFS 4 times (initial flush, first compaction w/ no split decision, second compaction w/ split decision, third compaction on daughter region). In addition, we leave a large window where we take on edits (during the second compaction of 320MB) and then must make the region unavailable as we flush it.

If we increased the compactionThreshold to be 5 and determined splits based on aggregate size, the behavior becomes:
- We fill up regions, and as long as you are not under global RS heap pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) StoreFiles.
- After each MemStore flush, we calculate the aggregate size of all StoreFiles. We can also check the compactionThreshold. For the first three flushes, both would not hit the limit. On the fourth flush, we would see total aggregate size = 256MB and determine to make a split.
- Decision to split is made, region is closed. This time, the region just has to flush out whichever edits made it to the MemStore during the snapshot/flush of the previous MemStore. So this time window has shrunk by more than 75%, as it was the time to write 64MB from memory, not 320MB from aggregating 5 HDFS files. This will greatly reduce the time data is unavailable during splits.
- The daughter regions re-open on the same RS. Immediately when the StoreFiles are opened, a compaction is triggered across all of their StoreFiles because they contain references.
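The arithmetic in the two scenarios above can be replayed with a small simulation. This is a simplified sketch, not RegionServer code: it assumes 64MB flushes and a 256MB split size as in the description, compactions that run immediately over every file, and no writes arriving during a compaction. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SplitTimingSketch {
    static final int FLUSH_MB = 64;          // hbase.hregion.memstore.flush.size in the description
    static final int MAX_FILESIZE_MB = 256;  // split size implied by the description

    /**
     * Described current behaviour: compact once the file count reaches the
     * threshold, split only when a compacted file exceeds the max size.
     * Returns {flushes before the split, MB rewritten by compactions}.
     */
    static int[] splitAfterCompaction(int compactionThreshold) {
        List<Integer> files = new ArrayList<>();
        int flushes = 0;
        int compactedMb = 0;
        while (true) {
            files.add(FLUSH_MB);
            flushes++;
            if (files.size() >= compactionThreshold) {
                int merged = 0;
                for (int size : files) {
                    merged += size;
                }
                compactedMb += merged;
                files.clear();
                files.add(merged);
                if (merged > MAX_FILESIZE_MB) {
                    return new int[] {flushes, compactedMb};
                }
            }
        }
    }

    /**
     * Proposed behaviour: check the aggregate StoreFile size after every flush
     * and split as soon as it reaches the max size.
     */
    static int[] splitAfterFlush(int compactionThreshold) {
        int fileCount = 0;
        int aggregateMb = 0;
        int flushes = 0;
        int compactedMb = 0;
        while (true) {
            fileCount++;
            aggregateMb += FLUSH_MB;
            flushes++;
            if (aggregateMb >= MAX_FILESIZE_MB) {
                return new int[] {flushes, compactedMb};
            }
            if (fileCount >= compactionThreshold) {
                compactedMb += aggregateMb;  // everything is rewritten into one file
                fileCount = 1;
            }
        }
    }

    public static void main(String[] args) {
        int[] current = splitAfterCompaction(3);
        int[] proposed = splitAfterFlush(5);
        // prints: split after compaction: 5 flushes, 512 MB compacted
        System.out.println("split after compaction: " + current[0] + " flushes, " + current[1] + " MB compacted");
        // prints: split after flush: 4 flushes, 0 MB compacted
        System.out.println("split after flush: " + proposed[0] + " flushes, " + proposed[1] + " MB compacted");
    }
}
```

The 512MB figure is the 192MB and 320MB compactions from the first list added together; in the second scenario nothing is compacted before the split, so the first cell has been written only once at that point.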
[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans
[ https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207080#comment-13207080 ] stack commented on HBASE-5325: -- Does this need to be public? Can it be protected in AM? {code} +public ServerName getServerName() { {code} Is the above even used? Please file an issue to move '+import org.apache.hadoop.metrics.util.MBeanUtil;' to o.a.h.h.u after this patch goes in (seems like a silly place for this class -- not your fault, I know). I think this is better for a name: {code} +MBeanUtil.registerMBean("org.apache.hadoop.hbase", "Master", mxBeanInfo); {code} ... maybe o.a.hbase instead of o.a.h.h (the o.a.hadoop prefix is legacy we'll undo one day). This bean is new, right? So it's ok giving it a name like this? Needs a class comment: '+public class MasterMXBeanImpl implements MasterMXBean {' Do you think we need to add an isActiveMaster attribute? What if the master is not active? Will it show in jmx? If not, then no need of such an attribute. Do you think this Interface belongs in Metrics? src/main/java/org/apache/hadoop/hbase/master/metrics/MasterMXBean.java It's more than just metrics, right? Maybe it should be in the master package (sorry, should have said that in the previous review). Why not call it RegionServerMXBeanImpl.java or just MXBeanImpl (here and in the master package)? I see now why there is no class comment on the Impl; it's because you have a class comment on the Interface. That's fine. Do you have to do this getHBaseMaster with all of the messing to construct a master name (and it's not right anyways, since master has its service port, not its UI port, when its name is made)? Leave it out I'd say... users can get the info in zk. Good stuff. Expose basic information about the master-status through jmx beans --- Key: HBASE-5325 URL: https://issues.apache.org/jira/browse/HBASE-5325 Project: HBase Issue Type: Improvement Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Fix For: 0.94.0 Attachments: HBASE-5325.1.patch, HBASE-5325.2.patch, HBASE-5325.wip.patch Similar to the Namenode and Jobtracker, it would be good if the hbase master could expose some information through mbeans.
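For readers unfamiliar with the naming question in this review, here is a minimal sketch of registering an MXBean with the plain javax.management API rather than Hadoop's MBeanUtil. Everything specific in it is an assumption for illustration: the interface, its two attributes, the fixed values, and the object name are hypothetical and are not what the HBASE-5325 patch exposes.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class MasterBeanSketch {
    /** Hypothetical management interface; the MXBean suffix marks it as an MXBean. */
    public interface MasterInfoMXBean {
        String getZookeeperQuorum();
        int getRegionsInTransition();
    }

    /** Hypothetical implementation returning fixed values. */
    public static class MasterInfo implements MasterInfoMXBean {
        @Override
        public String getZookeeperQuorum() {
            return "zk1:2181,zk2:2181";
        }

        @Override
        public int getRegionsInTransition() {
            return 0;
        }
    }

    /** Registers the bean under an illustrative name unless it is already there. */
    static ObjectName register() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("org.apache.hbase:service=Master,name=MasterInfo");
            if (!server.isRegistered(name)) {
                server.registerMBean(new MasterInfo(), name);
            }
            return name;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads one attribute back through the MBean server, as a JMX client would. */
    static Object read(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ObjectName name = register();
        System.out.println(name + " ZookeeperQuorum=" + read(name, "ZookeeperQuorum"));
        System.out.println(name + " RegionsInTransition=" + read(name, "RegionsInTransition"));
    }
}
```

The domain part of the object name (before the colon) is where the o.a.hadoop versus o.a.hbase choice discussed above would show up to JMX clients.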
[jira] [Commented] (HBASE-5393) Consider splitting after flushing
[ https://issues.apache.org/jira/browse/HBASE-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207081#comment-13207081 ] stack commented on HBASE-5393: -- +1 on putting in 0.92 too... Consider splitting after flushing - Key: HBASE-5393 URL: https://issues.apache.org/jira/browse/HBASE-5393 Project: HBase Issue Type: Improvement Affects Versions: 0.90.5 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.94.0 Attachments: HBASE-2375-flush-split.patch Spawning this from HBASE-2375, I saw that it was much more efficient compaction-wise to check if we can split right after flushing. Much like the ideas that Jon spelled out in the description of that jira, the window is smaller because you don't have to compact and then split right away to only compact again when the daughters open. Another thing it improves is while we're normally waiting for the compaction to happen, data that's still coming in will make us go way past the MAX_FILESIZE to a point where for the first region I was seeing a store size 3-4x bigger before it was able to split. I targeted this for 0.94, but I'd like to get this into 0.92.1 or .2 too.
[jira] [Commented] (HBASE-3584) Allow atomic put/delete in one call
[ https://issues.apache.org/jira/browse/HBASE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207086#comment-13207086 ] stack commented on HBASE-3584: -- bq. Let's just rename trunk's RowMutation to RowMutations. Sure, but don't we want to also deprecate RowMutation when it goes in and point people to instead use Mutation? Allow atomic put/delete in one call --- Key: HBASE-3584 URL: https://issues.apache.org/jira/browse/HBASE-3584 Project: HBase Issue Type: New Feature Components: client, coprocessors, regionserver Reporter: ryan rawson Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 3584-final.txt, 3584-v1.txt, 3584-v3.txt
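The distinction drawn in this thread, a container of puts and deletes for one row applied as a unit, can be illustrated with a toy in-memory model. This is not the HBase client API: AtomicRowSketch, RowChanges, and apply are hypothetical names, and a single store-wide lock stands in for the per-row lock the issue describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AtomicRowSketch {
    /** One column-level change: a put when value is non-null, a delete otherwise. */
    static final class Change {
        final String column;
        final String value;

        Change(String column, String value) {
            this.column = column;
            this.value = value;
        }
    }

    /** Hypothetical container holding an ordered list of puts and deletes for one row. */
    static final class RowChanges {
        final String row;
        final List<Change> changes = new ArrayList<>();

        RowChanges(String row) {
            this.row = row;
        }

        RowChanges put(String column, String value) {
            changes.add(new Change(column, value));
            return this;
        }

        RowChanges delete(String column) {
            changes.add(new Change(column, null));
            return this;
        }
    }

    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    /** Applies every change under one lock, so get() never observes a partial result. */
    synchronized void apply(RowChanges rowChanges) {
        Map<String, String> cells = rows.computeIfAbsent(rowChanges.row, r -> new TreeMap<>());
        for (Change change : rowChanges.changes) {
            if (change.value == null) {
                cells.remove(change.column);
            } else {
                cells.put(change.column, change.value);
            }
        }
    }

    /** Returns a copy of the row's cells, taken under the same lock. */
    synchronized Map<String, String> get(String row) {
        Map<String, String> cells = rows.get(row);
        return cells == null ? new TreeMap<>() : new TreeMap<>(cells);
    }

    public static void main(String[] args) {
        AtomicRowSketch store = new AtomicRowSketch();
        store.apply(new RowChanges("r1").put("a", "1").put("b", "2"));
        store.apply(new RowChanges("r1").delete("a").put("c", "3"));
        System.out.println(store.get("r1")); // prints {b=2, c=3}
    }
}
```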
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207340#comment-13207340 ] stack commented on HBASE-5200: -- Attached unit test stands up an AssignmentManager and then manufactures the condition that Ram describes. The test gets stuck and times out after five seconds because the znode is not cleared on master failover (as per Ram's description). Ram, your patch seemingly no longer applies to TRUNK. Why do you make a hash w/ a preset size of 1? {code} + private Set<String> regionsProcessed = new HashSet<String>(1); {code} Is this the right name for this hash? Should it be regionsProcessedJoiningCluster or some such? The regionsProcessed hash is of Strings. I see in handleRegionWhileFailOverInProgress that we always get the regioninfo from meta. Isn't it possible that in processRegionInTransition we may have done this already? That it may be non-null? If so, shouldn't we keep it around so we don't have to go to .META. every time, but only for those cases where regioninfo is indeed null? Would that mean changing regionsProcessed to be a Map of String to HRI? Isn't getHRegionInfo repeating code from earlier up in processRegionInTransition? If so, change it so that there is only one place where we go to meta... have both places call your new getRegionInfo method. Why do this: {code} + hri = p.getFirst(); + return hri; {code} Why not just do 'return p.getFirst();'? Is everything shifted right because of this test? {code} + if (regionState == null + && !regionsProcessed.contains(encodedRegionName)) { {code} If so, shouldn't we just take the opposite of the above and return immediately if regionState is non-null and in regionsProcessed, as in: {code} if (regionState != null && regionsProcessed.contains(encodedRegionName)) return; {code} This would make your change less substantial. 
It seems wrong that we are putting stuff into RIT in two places: in processRegionsInTransition, and in handleRegion if we happen to be fielding a callback before failover has had a chance to run. Would the fb trick of NOT processing callbacks during master failover help here? At least for the scope of the AM.joinCluster? Is handleRegionWhileFailOverInProgress a good name for this method? Should it be checkFailover or some such? The test I attached only checks the CLOSING state. We should extend it to do the other states, OPENING, etc.? I can help with this. Also, how did you figure out this bug? It must have taken a bunch of head banging to figure out that this was indeed what was going on. Good stuff Ram. AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent - Key: HBASE-5200 URL: https://issues.apache.org/jira/browse/HBASE-5200 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5200-test.txt, 5200-v2.txt, HBASE-5200.patch, HBASE-5200_1.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch
[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans
[ https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207386#comment-13207386 ] stack commented on HBASE-5325: -- bq. This was added to RegionState to expose the private field ServerName for display with regions in transition. Ok. I thought it was on AM. My bad. bq. Will replace with the zk quorum info if that seems appropriate. Yeah, that might be more appropriate Good stuff. Expose basic information about the master-status through jmx beans --- Key: HBASE-5325 URL: https://issues.apache.org/jira/browse/HBASE-5325 Project: HBase Issue Type: Improvement Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Fix For: 0.94.0 Attachments: HBASE-5325.1.patch, HBASE-5325.2.patch, HBASE-5325.wip.patch Similar to the Namenode and Jobtracker, it would be good if the hbase master could expose some information through mbeans. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans
[ https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207402#comment-13207402 ] stack commented on HBASE-5325: -- +1 Will let it stew some in case others want to take a looksee. Expose basic information about the master-status through jmx beans --- Key: HBASE-5325 URL: https://issues.apache.org/jira/browse/HBASE-5325 Project: HBase Issue Type: Improvement Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Fix For: 0.94.0 Attachments: HBASE-5325.1.patch, HBASE-5325.2.patch, HBASE-5325.3.patch, HBASE-5325.wip.patch
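A minimal sketch of what "exposing master information through MBeans" can look like with the standard JMX API. The bean name, the two attributes, and the object name `example.hbase:type=MasterStatus` are all hypothetical and are not taken from the HBASE-5325 patches; only the `java.lang.management` and `javax.management` calls are real.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class MasterStatusJmxSketch {
    // Hypothetical management interface. JMX requires it to be public and,
    // for an MXBean, to have a name ending in "MXBean".
    public interface MasterStatusMXBean {
        String getServerName();
        int getRegionsInTransition();
    }

    // Hypothetical holder of the values a master might want to publish.
    static class MasterStatus implements MasterStatusMXBean {
        private final String serverName;
        private volatile int regionsInTransition;

        MasterStatus(String serverName) {
            this.serverName = serverName;
        }

        void setRegionsInTransition(int count) {
            this.regionsInTransition = count;
        }

        @Override
        public String getServerName() {
            return serverName;
        }

        @Override
        public int getRegionsInTransition() {
            return regionsInTransition;
        }
    }

    // Registers the bean, reads one attribute back through the MBean server,
    // and unregisters it again so the call can be repeated.
    static Object registerAndRead(String attribute) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            ObjectName name = new ObjectName("example.hbase:type=MasterStatus");
            MasterStatus status = new MasterStatus("master-host,60000,1");
            status.setRegionsInTransition(3);
            server.registerMBean(status, name);
            try {
                return server.getAttribute(name, attribute);
            } finally {
                server.unregisterMBean(name);
            }
        } catch (Exception e) {
            throw new IllegalStateException("JMX sketch failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(registerAndRead("ServerName"));
        System.out.println(registerAndRead("RegionsInTransition"));
    }
}
```

Any JMX client (jconsole, for instance) attached to the same JVM would see the same attributes while the bean is registered.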
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207403#comment-13207403 ] stack commented on HBASE-5200:

bq. However, TestAssignmentManager#testBalanceOnMasterFailover fails with or without the patch.

Then the patch doesn't fix the issue?

AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
Key: HBASE-5200 URL: https://issues.apache.org/jira/browse/HBASE-5200 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5200-test.txt, 5200-v2.txt, HBASE-5200.patch, HBASE-5200_1.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch

This is the scenario: Consider a case where the balancer is running and is trying to close regions on a RS. Before the close completes, a master switch happens. On master switch, the set of nodes that are in RIT is collected; we first get the data and start watching the node. After that, the node data is added into RIT. If by this time (before adding to RIT) the RS on which close was called does a transition, AM.handleRegion() misses the handling, saying the RIT state was null.
{code}
2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region a66d281d231dfcaea97c270698b26b6f from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region c12e53bfd48ddc5eec507d66821c4d23 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 59ae13de8c1eb325a0dd51f4902d2052 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region f45bc9614d7575f35244849af85aa078 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region cc3ecd7054fe6cd4a1159ed92fd62641 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 3af40478a17fee96b4a192b22c90d5a2 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region e6096a8466e730463e10d3d61f809b92 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 4806781a1a23066f7baed22b4d237e24 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region d69e104131accaefe21dcc01fddc7629 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states
{code}
In branch the CLOSING node is created by RS thus leading to more inconsistency.
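The idea raised in this thread of not processing callbacks while master failover is still running can be sketched roughly as follows. This is only an illustration under stated assumptions: `FailoverCallbackGate` and its methods are hypothetical names, events are plain strings, and nothing here is AssignmentManager code or one of the attached patches.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical gate, not an HBase class: buffers callbacks until failover is done.
class FailoverCallbackGate {
    private final Queue<String> pending = new ArrayDeque<>();
    private final Consumer<String> handler;
    private boolean failoverInProgress = true;

    FailoverCallbackGate(Consumer<String> handler) {
        this.handler = handler;
    }

    // Called for every region transition event.
    synchronized void onCallback(String event) {
        if (failoverInProgress) {
            pending.add(event);      // defer: in-memory state is not rebuilt yet
        } else {
            handler.accept(event);
        }
    }

    // Called once the new master has rebuilt its regions-in-transition view.
    synchronized void failoverComplete() {
        failoverInProgress = false;
        String event;
        while ((event = pending.poll()) != null) {
            handler.accept(event);   // replay in arrival order
        }
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    // Small demonstration: two events arrive during failover, one after it.
    static List<String> demo() {
        List<String> handled = new ArrayList<>();
        FailoverCallbackGate gate = new FailoverCallbackGate(handled::add);
        gate.onCallback("CLOSED region-a");
        gate.onCallback("CLOSED region-b");
        gate.failoverComplete();
        gate.onCallback("OPENED region-c");
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With a gate like this, the handler never sees an event before the regions-in-transition map has been populated, which is the window in which the "state null" warnings above were logged. The cost is that events are handled later than they arrive.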
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207842#comment-13207842 ] stack commented on HBASE-5200:

bq. - The region was not transitioned after the CLOSED transition got a call back for assigning it. So there was no RS to process the assign.

I did not think this was needed since I'd reproduced your scenario. Let me look at your changes. Thanks.

AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
Key: HBASE-5200 URL: https://issues.apache.org/jira/browse/HBASE-5200 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5200-test.txt, 5200-v2.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch
[jira] [Commented] (HBASE-5398) HBase shell disable_all/enable_all/drop_all promp wrong tables for confirmation
[ https://issues.apache.org/jira/browse/HBASE-5398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207899#comment-13207899 ] stack commented on HBASE-5398:

Patch looks good. How is disable_all supposed to work? It's supposed to take a regex pattern?

HBase shell disable_all/enable_all/drop_all promp wrong tables for confirmation
Key: HBASE-5398 URL: https://issues.apache.org/jira/browse/HBASE-5398 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.94.0, 0.92.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.94.0, 0.92.0 Attachments: hbase-5398.patch

When using hbase shell to disable_all/enable_all/drop_all tables, the tables prompted for confirmation are wrong. For example, disable_all 'test*' will ask for confirmation to disable tables like: mytest1, test123. Fortunately, these tables will not actually be disabled, since the Java pattern doesn't match this way.
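The mismatch described in HBASE-5398 is consistent with the difference between a substring search and a full-string match in `java.util.regex`. The sketch below only demonstrates that difference for the pattern `test*`. That the shell prompt used a substring-style match is an assumption here, since the issue text does not say how the prompt list was built. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

class TablePatternSketch {
    // Full-string match: the whole table name must match the regex.
    static boolean fullMatch(String regex, String table) {
        return Pattern.compile(regex).matcher(table).matches();
    }

    // Substring search: the regex may match anywhere inside the table name.
    static boolean partialMatch(String regex, String table) {
        return Pattern.compile(regex).matcher(table).find();
    }

    public static void main(String[] args) {
        // As a regex, "test*" means "tes" followed by zero or more 't',
        // not "anything starting with test" as a shell glob would.
        for (String table : new String[] {"mytest1", "test123", "test"}) {
            System.out.println(table
                + " find=" + partialMatch("test*", table)
                + " matches=" + fullMatch("test*", table)
                + " matches(test.*)=" + fullMatch("test.*", table));
        }
    }
}
```

A substring search with `test*` hits both `mytest1` and `test123`, while a full match hits neither. To select every table whose name starts with `test`, the regex would be `test.*`.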
[jira] [Commented] (HBASE-3584) Allow atomic put/delete in one call
[ https://issues.apache.org/jira/browse/HBASE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207905#comment-13207905 ] stack commented on HBASE-3584:

@Kannan Yes. Sounds good. Was saying that the forward-ported RowMutation should be deprecated on forward-port since it does what Mutation does in TRUNK -- but we can do that in another, subsequent patch.

Allow atomic put/delete in one call
Key: HBASE-3584 URL: https://issues.apache.org/jira/browse/HBASE-3584 Project: HBase Issue Type: New Feature Components: client, coprocessors, regionserver Reporter: ryan rawson Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 3584-final.txt, 3584-v1.txt, 3584-v3.txt

Right now we have the following calls: put(Put) delete(Delete) increment(Increments). But we cannot combine all of the above in a single call, complete with a single row lock. It would be nice to do that. It would also allow us to do a CAS where we could do a put/increment if the check succeeded.

Amendment: Since Increment does not currently support MVCC it cannot be included in an atomic operation. So this is for Put and Delete only.
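To make "a put and a delete in one call, under a single row lock" concrete, here is a small in-memory model. It is not the HBase client API and not the code from the attached patches. `AtomicRowSketch`, `Mutation`, `mutateRow` and `readRow` are hypothetical names, and a plain `ReentrantLock` per row stands in for the region server's row lock.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class AtomicRowSketch {
    // One change to one column; a null value means "delete this column".
    static final class Mutation {
        final String column;
        final String value;

        private Mutation(String column, String value) {
            this.column = column;
            this.value = value;
        }

        static Mutation put(String column, String value) {
            return new Mutation(column, value);
        }

        static Mutation delete(String column) {
            return new Mutation(column, null);
        }
    }

    private final Map<String, Map<String, String>> rows = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String row) {
        return locks.computeIfAbsent(row, r -> new ReentrantLock());
    }

    // Applies all mutations while holding the row lock once, so a reader
    // sees either none of them or all of them.
    void mutateRow(String row, List<Mutation> mutations) {
        ReentrantLock lock = lockFor(row);
        lock.lock();
        try {
            Map<String, String> columns = rows.computeIfAbsent(row, r -> new TreeMap<>());
            for (Mutation m : mutations) {
                if (m.value == null) {
                    columns.remove(m.column);
                } else {
                    columns.put(m.column, m.value);
                }
            }
        } finally {
            lock.unlock();
        }
    }

    // Returns a copy of the row taken under the same lock.
    Map<String, String> readRow(String row) {
        ReentrantLock lock = lockFor(row);
        lock.lock();
        try {
            Map<String, String> columns = rows.get(row);
            return columns == null
                ? new TreeMap<String, String>()
                : new TreeMap<String, String>(columns);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        AtomicRowSketch store = new AtomicRowSketch();
        store.mutateRow("row1", Arrays.asList(Mutation.put("a", "1"), Mutation.put("b", "2")));
        store.mutateRow("row1", Arrays.asList(Mutation.put("c", "3"), Mutation.delete("a")));
        System.out.println(store.readRow("row1"));
    }
}
```

Issuing the put and the delete as two separate calls would take the lock twice and allow a reader to observe the intermediate state. The single call removes that window.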
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208032#comment-13208032 ] stack commented on HBASE-5399:

Hurray on below:
{code}
+ @Deprecated public ZooKeeperWatcher getZooKeeperWatcher() throws IOException;
...
+ * @deprecated Removed because it was a mistake exposing zookeeper in this
+ * interface (ZooKeeper is an implementation detail). */
+ @Deprecated public HMasterInterface getMaster()
{code}
Why are these deprecated? (Add why to the deprecated note -- add pointer to where user can get functionality elsewise):
{code}
+ @Deprecated public HRegionInterface getHRegionConnection(HServerAddress regionServer)
{code}
Fix your comments in HCM. Missing 'e' on 'not' and 'd' on 'use'.

Do we get the clusterid on connection setup? Do we have to? Can we just get that when someone asks for it?

Fix...
{code}
// We will ope/close a ZooKeeper
{code}
What about isTableEnabled, etc., should they be deprecated, moved out of HConnection? Or is that for a different issue?

Should we be using straight ZooKeeper instead of ZooKeeperWatcher? We don't need the watch facility?

So far so good...

Cut the link between the client and the zookeeper ensemble
Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch

The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connections that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken.

ZooKeeper is used for:
- public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection => we have to deprecate this but keep it.
- read: get master address to create a master => now done with a temporary zookeeper connection
- read root location => now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked.
- read cluster id => now done once with a temporary zookeeper connection.
- check if base node is available => now done once with a zookeeper connection given as a parameter
- isTableDisabled/isTableAvailable => public functions, now done with a temporary zookeeper connection.
  - Called internally from HBaseAdmin and HTable
- getCurrentNrHRS(): public function to get the number of region servers and create a pool of threads => now done with a temporary zookeeper connection

Master is used for:
- getMaster public getter, as for ZooKeeper => we have to deprecate this but keep it.
- isMasterRunning(): public function, used internally by HMerge & HBaseAdmin
- getHTableDescriptor*: public functions offering access to the master. => we could make them use a temporary master connection as well.

Main points are:
- hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, a non-connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow.
- having a link between ZK and all the clients seems to make sense for some use cases. However, it won't scale if a TCP connection is required for every client
- if we move the table descriptor part away from the client, we need to find a new place for it.
- we will have the same issue if HBaseAdmin (for both ZK & Master), maybe we can put a timeout on the connection. That would make the whole system less deterministic however.
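The "temporary zookeeper connection" approach from the description amounts to opening a connection for a single read and closing it right away, instead of holding one open per client. A minimal sketch follows, with `FakeConnection` as a stand-in for a real coordination-service client. None of these names come from the patch, and the znode path is only an illustrative string.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class TemporaryConnectionSketch {
    // Number of currently open stand-in connections, to show that none lingers.
    static final AtomicInteger OPEN = new AtomicInteger();

    // Stand-in for a coordination-service client; not a real ZooKeeper class.
    static final class FakeConnection implements AutoCloseable {
        FakeConnection() {
            OPEN.incrementAndGet();   // a real client would pay a TCP handshake here
        }

        String read(String path) {
            return "value-of:" + path;
        }

        @Override
        public void close() {
            OPEN.decrementAndGet();
        }
    }

    // Opens a connection, runs one piece of work, and always closes the connection.
    static <T> T withTemporaryConnection(Function<FakeConnection, T> work) {
        try (FakeConnection connection = new FakeConnection()) {
            return work.apply(connection);
        }
    }

    public static void main(String[] args) {
        String master = withTemporaryConnection(c -> c.read("/hbase/master"));
        System.out.println(master + " open=" + OPEN.get());
    }
}
```

The trade-off is the one named in the description: no long-lived link per client, at the price of a connection setup on every call.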
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208040#comment-13208040 ] stack commented on HBASE-5200: -- bq. How should HBASE-5270 be solved using the above approach ? There would be no concurrent servershutdownhandler running. For the specialization, hbase-4748, in the current version hbase-5344 there'd be no need to get .META. on line to complete failover (but it looks like Mikhail is revisiting this aspect in his last comments up on hbase-5344). AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent - Key: HBASE-5200 URL: https://issues.apache.org/jira/browse/HBASE-5200 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208061#comment-13208061 ] stack commented on HBASE-5399:

bq. We can do that, it requires some work to do it well (ZooKeeperWatcher has spread too much, and has a lot of responsibilities (for example, it's the owner of the znode names, created from the config parameters)). It would be much cleaner, and a little bit faster. We would still pay for the tcp connection however.

Ok. For another patch then I'd say (Agree ZKW is like a dumping ground for zk ops)

bq. isTableEnabled, etc., should they be deprecated, moved out of HConnection I was thinking putting them in HBaseAdmin, does it make sense?

I think it makes sense... deprecate them in HCM and move to HBaseAdmin. Sweet.

Cut the link between the client and the zookeeper ensemble
Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch
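The review points in this thread (say why something is deprecated, point to the replacement, and move `isTableEnabled`-style calls to the admin class) can be sketched as below. `AdminApi` and `ConnectionApi` are hypothetical stand-ins for HBaseAdmin and the connection class, and the table-state rule is made up purely so the example runs.

```java
class DeprecationSketch {
    // Hypothetical new home for table-state queries.
    static class AdminApi {
        boolean isTableEnabled(String table) {
            // Made-up rule for the demo: names starting with "disabled_" are disabled.
            return !table.startsWith("disabled_");
        }
    }

    // Hypothetical connection class that keeps the old method for compatibility.
    static class ConnectionApi {
        private final AdminApi admin = new AdminApi();

        /**
         * @deprecated Table-state queries are administrative operations and
         *     do not belong on the connection; use
         *     {@link AdminApi#isTableEnabled(String)} instead.
         */
        @Deprecated
        boolean isTableEnabled(String table) {
            // Delegate so that old callers keep working unchanged.
            return admin.isTableEnabled(table);
        }
    }

    // Simulates an existing caller that still uses the deprecated method.
    static boolean legacyCall(String table) {
        return new ConnectionApi().isTableEnabled(table);
    }

    public static void main(String[] args) {
        System.out.println(legacyCall("users"));
        System.out.println(legacyCall("disabled_users"));
    }
}
```

The Javadoc `@deprecated` tag carries the reason and the pointer to the replacement, while the `@Deprecated` annotation is what makes the compiler warn callers.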
[jira] [Commented] (HBASE-5400) Some tests does not have annotations for (Small|Medium|Large)Tests
[ https://issues.apache.org/jira/browse/HBASE-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208140#comment-13208140 ] stack commented on HBASE-5400:

+1

Some tests does not have annotations for (Small|Medium|Large)Tests
Key: HBASE-5400 URL: https://issues.apache.org/jira/browse/HBASE-5400 Project: HBase Issue Type: Bug Components: security, test Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-5400_v1.patch

These tests do not have annotations, and are not picked up by -PrunAllTests
{code}
security/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessControlFilter.java
security/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
security/src/test/java/org/apache/hadoop/hbase/security/access/TestTablePermissions.java
security/src/test/java/org/apache/hadoop/hbase/security/access/TestZKPermissionsWatcher.java
security/src/test/java/org/apache/hadoop/hbase/security/token/TestTokenAuthentication.java
security/src/test/java/org/apache/hadoop/hbase/security/token/TestZKSecretWatcher.java
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
{code}
We can also backport this to 0.92.1, since development will continue on 0.92 branch.
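A rough sketch of how unannotated test classes like the ones listed above could be detected by reflection. The `SizeCategory` annotation and the two sample classes are hypothetical stand-ins defined only for this example. The real tests presumably use a test-framework category annotation, which the issue text does not spell out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

class TestCategorySketch {
    enum Size { SMALL, MEDIUM, LARGE }

    // Hypothetical stand-in for a size-category annotation on test classes.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface SizeCategory {
        Size value();
    }

    @SizeCategory(Size.SMALL)
    static class TestWithCategory { }

    // No annotation: a category-driven runner would silently skip this class.
    static class TestWithoutCategory { }

    // Returns the simple names of the classes that carry no size category.
    static List<String> missingCategory(Class<?>... testClasses) {
        List<String> missing = new ArrayList<>();
        for (Class<?> testClass : testClasses) {
            if (!testClass.isAnnotationPresent(SizeCategory.class)) {
                missing.add(testClass.getSimpleName());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(missingCategory(TestWithCategory.class, TestWithoutCategory.class));
    }
}
```

A check of this kind, run over every test class in the build, would flag a missing annotation before a test quietly drops out of the full run.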
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208173#comment-13208173 ] stack commented on HBASE-5200: -- You are right Ted. AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent - Key: HBASE-5200 URL: https://issues.apache.org/jira/browse/HBASE-5200 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208183#comment-13208183 ]

stack commented on HBASE-5200:
------------------------------

bq. I think the solution provided by Ramkrishna should be integrated.

Into TRUNK? It's an improvement, but I feel that there are loads of holes in here: we are trying to process callbacks at the same time as we are trying to bring the new master online w/ a coherent picture of cluster state. It strikes me as a task w/o end, and hard to test too (witness the test added here). We need a refactor of master failover. Holding up all callback processing strikes me as a basic simplification that we should take on.

AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
----------------------------------------------------------------------------------------------------------
Key: HBASE-5200
URL: https://issues.apache.org/jira/browse/HBASE-5200
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.94.0, 0.90.7, 0.92.1
Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch

This is the scenario. Consider a case where the balancer is running and is trying to close regions on a RS. Before the close completes, a master switch happens. On master switch, the set of nodes that are in RIT is collected: we first get the data and start watching the node, and only after that is the node data added into RIT. If, in this window (before the node is added to RIT), the RS on which close was called makes a transition, AM.handleRegion() misses the handling because the RIT state was null. The master then logs warnings of the form "Received CLOSED for region ... but region was in the state null and not in expected PENDING_CLOSE or CLOSING states".

In branch the CLOSING node is created by RS thus leading to more inconsistency.
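The "hold up all callback processing" idea from the comment above can be pictured roughly as follows. This is only a sketch under stated assumptions, not HBase code: `CallbackGate`, `onEvent` and `open` are hypothetical names, and the events stand in for ZooKeeper region-transition callbacks. Events that arrive while the new master is still rebuilding its regions-in-transition state are queued, then replayed in arrival order once the state is coherent.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical gate that defers callbacks until failover processing is done; not HBase code. */
class CallbackGate<E> {
  private final Queue<E> pending = new ArrayDeque<>();
  private final Consumer<E> handler;
  private boolean open = false;

  CallbackGate(Consumer<E> handler) {
    this.handler = handler;
  }

  /** Called from the (assumed) event thread: handle now if open, otherwise hold. */
  synchronized void onEvent(E event) {
    if (open) {
      handler.accept(event);
    } else {
      pending.add(event);
    }
  }

  /** Called once the in-transition state has been rebuilt; replays held events in order. */
  synchronized void open() {
    open = true;
    E event;
    while ((event = pending.poll()) != null) {
      handler.accept(event);
    }
  }
}
```

With a gate like this, a CLOSED transition that arrives before the node has been added to RIT would be handled after the add rather than dropped with a "state null" warning. The handler runs while the gate's lock is held, which keeps ordering simple at the cost of blocking new events during replay.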
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208230#comment-13208230 ]

stack commented on HBASE-5200:
------------------------------

Make a new issue, Ram, since we've already committed a patch to 0.90 on this issue?

AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
----------------------------------------------------------------------------------------------------------
Key: HBASE-5200
URL: https://issues.apache.org/jira/browse/HBASE-5200
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.94.0, 0.90.7, 0.92.1
Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208238#comment-13208238 ]

stack commented on HBASE-5200:
------------------------------

Sorry. I misread the hadoopqa output.

AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
----------------------------------------------------------------------------------------------------------
Key: HBASE-5200
URL: https://issues.apache.org/jira/browse/HBASE-5200
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.94.0, 0.90.7, 0.92.1
Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208239#comment-13208239 ]

stack commented on HBASE-5396:
------------------------------

On the below:

{code}
+  public boolean isRegionOnline(HRegionInfo hri) {
+    HServerInfo hsi = this.regions.get(hri);
+    if (hsi != null && this.isServerOnline(hsi.getServerName())) {
+      return true;
+    }
+    return false;
+  }
{code}

Don't you have to take out a lock on this.regions before you access it? See the comment in this.servers. Also, could write the end of the method so:

{code}
return hsi != null && this.isServerOnline(hsi.getServerName());
{code}

... and save a few lines?

What's a RegionsWithDeadServer? Is it RegionsOnDeadServers?

The below is called regionplan but it's storing HRegionInfos?

{code}
+    Set<HRegionInfo> regionPlanOnThisServer = new HashSet<HRegionInfo>();
{code}

And then here, we are storing a Set of HRIs but the method name talks of RegionPlans. It's a little hard to follow? Ditto here:

{code}
+    private Set<HRegionInfo> regionPlanOnThisServer = null;
{code}

and this...

{code}
+    public Set<HRegionInfo> getRegionPlanOnThisServer() {
{code}

This comment doesn't seem right?

{code}
+ * Process result used by processServerShutdown.
{code}

There is no processing done in this data structure.

Handle the regions in regionPlans while processing ServerShutdownHandler
------------------------------------------------------------------------
Key: HBASE-5396
URL: https://issues.apache.org/jira/browse/HBASE-5396
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
Fix For: 0.90.7
Attachments: HBASE-5396-90.patch

Regions that are planned to open on this server while ServerShutdownHandler is handling it are just removed from AM.regionPlans and left for the TimeoutMonitor to handle. This needs to be optimized.
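The two review points on isRegionOnline (lock the map before reading it, and collapse the if/else into a single return) could look roughly like this. It is a minimal sketch with stand-in types: `RegionDirectory` and its String-keyed maps are hypothetical and only mirror the shape of the patch, not the real AssignmentManager or ServerManager.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the bookkeeping discussed in the review; not HBase code. */
class RegionDirectory {
  // region name -> hosting server name; every access is guarded by synchronized (regions)
  private final Map<String, String> regions = new HashMap<>();
  // guarded by synchronized (onlineServers)
  private final Set<String> onlineServers = new HashSet<>();

  void serverOnline(String server) {
    synchronized (onlineServers) {
      onlineServers.add(server);
    }
  }

  void serverOffline(String server) {
    synchronized (onlineServers) {
      onlineServers.remove(server);
    }
  }

  void assign(String region, String server) {
    synchronized (regions) {
      regions.put(region, server);
    }
  }

  boolean isServerOnline(String server) {
    synchronized (onlineServers) {
      return onlineServers.contains(server);
    }
  }

  boolean isRegionOnline(String region) {
    String server;
    synchronized (regions) {
      // Read under the same lock that writers take.
      server = regions.get(region);
    }
    // Single-expression return, as suggested in the review.
    return server != null && isServerOnline(server);
  }
}
```

The lookup is done under the map's lock and the server check happens after the lock is released, so the two locks are never held together.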
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208655#comment-13208655 ]

stack commented on HBASE-5200:
------------------------------

It would make sense for it to be committed to 0.92 also.

AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
----------------------------------------------------------------------------------------------------------
Key: HBASE-5200
URL: https://issues.apache.org/jira/browse/HBASE-5200
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.94.0, 0.90.7, 0.92.1
Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208803#comment-13208803 ]

stack commented on HBASE-5270:
------------------------------

@Chunhui Excellent. I am all good w/ modifying core to make it more testable. I would suggest instead, though, that you make more generic changes up in Master, etc. For example, the attached patch makes it so a subclass of HMaster can observe log splitting and insert pauses, and can override the RegionServerTracker to pause on deletes until a gate is cleared. This might make more sense than the custom changes made to RegionServerTracker and HMaster in this patch.

On the patch, I think the HMaster and RegionServerTracker changes are too specific to this test case. I would suggest making more generic changes to these core classes, because other tests will then be able to make use of them (I think it's good to mod core classes to make them more testable).

On the test, can we have a better name than TestHRegionserverKilled? It doesn't say what this test does (testDataCorrectnessWhenMasterFailOver might be better as a class name). Why do this:

+Configuration conf = HBaseConfiguration.create();

Why not use what's in your HBaseTestingUtility (do a getConfiguration; see the attached test)? Also, you might use the junit primitives for test setup and teardown, as per the attached test.

Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
-----------------------------------------------------------------------------------------------------
Key: HBASE-5270
URL: https://issues.apache.org/jira/browse/HBASE-5270
Project: HBase
Issue Type: Sub-task
Components: master
Reporter: Zhihong Yu
Fix For: 0.94.0, 0.92.1
Attachments: 5270-90-testcase.patch, 5270-90.patch, 5270-testcase.patch, hbase-5270.patch

This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK:

Reviewing 0.92v17

- isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere.
- Does isDeadRootServerInProgress need to be public? Ditto for the meta version.
- The method param names are not right: 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier?
- Is there anything in place to stop us expiring a server twice if it's carrying root and meta?
- What is the difference between asking the assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta.
- I think I've asked for this a few times: onlineServers needs to be explained, either in javadoc or in a comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all.
- It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here?
- Though distributed split log is configured, we will do in-master single-process splitting under some conditions with this patch. It's not explained in code why we would do this. Why do we think master log splitting is 'high priority' when it could very well be slower? Should we only go this route if distributed splitting is not going on? Do we know if concurrent distributed log splitting and master splitting works?
- Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired?
- This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and a new issue for more work on this trunk patch?
- This patch needs to have the v18 differences applied.
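The "generic hook plus gate" approach suggested in the comment above might be sketched like this. The attached patch is not reproduced here, so every name in the sketch (`NodeTracker`, `beforeNodeDeleted`, `GatedNodeTracker`, `GateDemo`) is hypothetical, and the classes are plain stand-ins rather than HMaster or RegionServerTracker. The core class exposes a no-op hook, and a test subclass overrides it to block on a latch until the test clears the gate.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a core class such as a regionserver tracker; not HBase code. */
class NodeTracker {
  private final AtomicInteger processed = new AtomicInteger();

  /** Generic hook: a no-op in production, overridable by any test that needs a pause. */
  protected void beforeNodeDeleted(String node) throws InterruptedException {
  }

  final void nodeDeleted(String node) throws InterruptedException {
    beforeNodeDeleted(node);
    processed.incrementAndGet();
  }

  int processedCount() {
    return processed.get();
  }
}

/** Test-only subclass that holds every delete until the gate is cleared. */
class GatedNodeTracker extends NodeTracker {
  final CountDownLatch gate = new CountDownLatch(1);

  @Override
  protected void beforeNodeDeleted(String node) throws InterruptedException {
    gate.await();
  }
}

class GateDemo {
  /** Returns true if the delete was held at the gate and completed after it was cleared. */
  static boolean run() {
    GatedNodeTracker tracker = new GatedNodeTracker();
    Thread worker = new Thread(() -> {
      try {
        tracker.nodeDeleted("rs1");
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    worker.start();
    try {
      worker.join(100); // the worker is parked at the gate, so it is still alive here
      boolean held = worker.isAlive() && tracker.processedCount() == 0;
      tracker.gate.countDown(); // clear the gate
      worker.join(5000);
      return held && tracker.processedCount() == 1;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Because the hook carries no test-specific logic, other tests could reuse it with different gates or pauses, which is the point made about keeping the core changes generic.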
[jira] [Commented] (HBASE-4365) Add a decent heuristic for region size
[ https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209478#comment-13209478 ]

stack commented on HBASE-4365:
------------------------------

I can add the above changes (will fix the superclass from where I copied this stuff too) but I'm more interested in feedback along the lines of whether folks think we should put this in as default split policy. If so, will then spend time on it trying it on cluster, otherwise not.

Add a decent heuristic for region size
--------------------------------------
Key: HBASE-4365
URL: https://issues.apache.org/jira/browse/HBASE-4365
Project: HBase
Issue Type: Improvement
Affects Versions: 0.94.0
Reporter: Todd Lipcon
Attachments: 4365.txt

A few of us were brainstorming this morning about what the default region size should be. There were a few general points made:
- in some ways it's better to be too-large than too-small, since you can always split a table further, but you can't merge regions currently
- with HFile v2 and multithreaded compactions there are fewer reasons to avoid very-large regions (10GB+)
- for small tables you may want a small region size just so you can distribute load better across a cluster
- for big tables, multi-GB is probably best
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209561#comment-13209561 ]

stack commented on HBASE-5200:
------------------------------

Ram, you should add it to trunk too, going by Ted's reasoning above. You reviewed my last version? It's ok w/ you?

AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
----------------------------------------------------------------------------------------------------------
Key: HBASE-5200
URL: https://issues.apache.org/jira/browse/HBASE-5200
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.94.0, 0.90.7, 0.92.1
Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch
[jira] [Commented] (HBASE-4365) Add a decent heuristic for region size
[ https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209621#comment-13209621 ]

stack commented on HBASE-4365:
------------------------------

Chatting w/ J-D, it's probably less disruptive if we use the square of the count of regions on a regionserver, so we get to max size faster (then there'll be fewer regions created overall by this phenomenon).

Add a decent heuristic for region size
--------------------------------------
Key: HBASE-4365
URL: https://issues.apache.org/jira/browse/HBASE-4365
Project: HBase
Issue Type: Improvement
Affects Versions: 0.94.0
Reporter: Todd Lipcon
Attachments: 4365.txt
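To make the "square of the count of regions" idea concrete, here is a small arithmetic sketch. The comment only says to square the region count; multiplying that square by a base size and capping at a maximum file size is an assumption made for illustration, as are the names `SplitSizeHeuristic` and `sizeToCheck` and the 64 MB / 10 GB figures used below.

```java
/** Hypothetical sketch of a split-size heuristic; not the committed HBase policy. */
class SplitSizeHeuristic {
  /**
   * Size at which a region would be split, given how many regions of the table
   * this regionserver already hosts. Grows with the square of the region count
   * and is capped at maxFileSize. baseSize must be positive (assumed parameter).
   */
  static long sizeToCheck(int regionCount, long baseSize, long maxFileSize) {
    long count = Math.max(1, regionCount);
    long squared = count * count;
    // Compare before multiplying so the product cannot overflow.
    if (squared > maxFileSize / baseSize) {
      return maxFileSize;
    }
    return Math.min(maxFileSize, baseSize * squared);
  }
}
```

With an assumed base of 64 MB and a cap of 10 GB, the threshold is 64 MB at one region, 256 MB at two, 9216 MB at twelve, and hits the cap at thirteen. A linear rule with the same base would need 160 regions to reach the cap, which is why squaring creates fewer small regions along the way.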
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209650#comment-13209650 ]

stack commented on HBASE-5075:
------------------------------

@Ronhai.ma Put up a patch so we can see what you are thinking; are you talking about a supervisor-like process that will remove the regionserver ephemeral node if the pid goes missing? Thanks.

regionserver crashed and failover
---------------------------------
Key: HBASE-5075
URL: https://issues.apache.org/jira/browse/HBASE-5075
Project: HBase
Issue Type: Improvement
Components: monitoring, regionserver, replication, zookeeper
Affects Versions: 0.92.1
Reporter: 代åæčæ
Fix For: 0.90.5

When a regionserver crashes, it takes too long to notify the HMaster. Once the HMaster knows about the regionserver's shutdown, it takes a long time to fetch the HLog's lease. HBase is an online DB, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid. If the pid does not exist, I consider the RS down, delete the znode, and force-close the HLog file. The check period could be around 100ms.
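A supervisor-like check of the kind asked about above could be sketched as follows. No patch had been posted at this point, so this is purely illustrative: `PidWatchdog` and `checkOnce` are hypothetical names, and the cleanup is just a callback standing in for "delete the znode and force-close the HLog file". The only real API used is the JDK's `ProcessHandle` (Java 9 and later).

```java
import java.util.function.LongPredicate;

/** Hypothetical pid watchdog; the cleanup action is supplied by the caller. Not HBase code. */
class PidWatchdog {
  private final long pid;
  private final LongPredicate isAlive;
  private final Runnable onDeath;
  private boolean fired = false;

  PidWatchdog(long pid, LongPredicate isAlive, Runnable onDeath) {
    this.pid = pid;
    this.isAlive = isAlive;
    this.onDeath = onDeath;
  }

  /** Liveness probe for a local process, using the JDK 9+ ProcessHandle API. */
  static boolean processExists(long pid) {
    return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
  }

  /** One polling step; returns true only when the cleanup callback ran in this call. */
  boolean checkOnce() {
    if (fired || isAlive.test(pid)) {
      return false;
    }
    fired = true; // run the cleanup at most once
    onDeath.run();
    return true;
  }
}
```

A caller would invoke `checkOnce()` on a schedule (the description suggests around every 100ms) with `PidWatchdog::processExists` as the probe. A pid probe of this kind only sees processes on the same host as the watchdog.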
[jira] [Commented] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop
[ https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209679#comment-13209679 ] stack commented on HBASE-4403: -- What kind of feedback do you need Jimmy? These are deprecated: org/apache/hadoop/hbase/HServerAddress.java org/apache/hadoop/hbase/HServerInfo.java I suppose you want to mark them anyways. It's ugly that the below is Audience=public and Stability=Evolving but I suppose it's the truth: org/apache/hadoop/hbase/KeyValue.java I see this is still evolving: org/apache/hadoop/hbase/ClusterStatus.java (It's having backup masters added as we speak) Is this public? org/apache/hadoop/hbase/ClockOutOfSyncException.java I mean, a client can get it? Maybe this should be too... org/apache/hadoop/hbase/LocalHBaseCluster.java It's a useful tool I'd think. Some of these audience=private classes are used in tests... does that make them audience=public? Or test building blocks are for devs and so audience=public. Do we have definitions for these classifications? If so, they should go into the dev section of the manual... or into the index or something. Is org/apache/hadoop/hbase/client/HConnectionManager.java public? Should it be private? It would be good if the connection implementation was 'hidden'. org/apache/hadoop/hbase/io/TimeRange.java is in io? Doesn't that come through in the client api? I'd think it stable too. Seems like it's in the wrong package. Master and Regionserver have stuff that is used in tests too... Some stuff in Util could be public... the Keying utility class, the Strings class... Otherwise +1. Good stuff.
Adopt interface stability/audience classifications from Hadoop -- Key: HBASE-4403 URL: https://issues.apache.org/jira/browse/HBASE-4403 Project: HBase Issue Type: Task Affects Versions: 0.90.5, 0.92.0 Reporter: Todd Lipcon Assignee: Jimmy Xiang Attachments: hbase-4403-interface.txt, hbase-4403-nowhere-near-done.txt As HBase gets more widely used, we need to be more explicit about which APIs are stable and not expected to break between versions, which APIs are still evolving, etc. We also have many public classes that are really internal to the RS or Master and not meant to be used by users. Hadoop has adopted a classification scheme for audience (public, private, or limited-private) as well as stability (stable, evolving, unstable). I think we should copy these annotations to HBase and start to classify our public classes.
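As an illustration of what such a classification looks like in code, here is a self-contained sketch. The annotation types below are local stand-ins modeled on the audience and stability levels listed in the description; they are not Hadoop's actual annotation classes, the limited-private level is omitted for brevity, and the example classes are hypothetical.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Local stand-ins for the audience levels named in the issue (public, private).
final class Audience {
    private Audience() {}
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Public {}
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Private {}
}

// Local stand-ins for the stability levels named in the issue.
final class Stability {
    private Stability() {}
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Stable {}
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Evolving {}
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Unstable {}
}

// Hypothetical client-facing class: users may depend on it, but it can still change.
@Audience.Public
@Stability.Evolving
final class ExampleClientType {}

// Hypothetical class internal to the RS or Master: not meant for users.
@Audience.Private
@Stability.Unstable
final class ExampleServerInternal {}

final class ApiClassification {
    private ApiClassification() {}

    /** True when the class is marked as part of the user-facing API. */
    static boolean isPublicApi(Class<?> type) {
        return type.isAnnotationPresent(Audience.Public.class);
    }
}
```

Keeping the annotations at runtime retention means a tool or test can walk the classes and report which ones are user-facing, which is one way to check that nothing internal is advertised by accident.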
[jira] [Commented] (HBASE-5414) Assign different memstoreTS to different KV's in the same WALEdit during replay
[ https://issues.apache.org/jira/browse/HBASE-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209684#comment-13209684 ] stack commented on HBASE-5414: -- bq. can guarantee that the Put wins over the Delete even though both happened at the same NOW timestamp Trying to understand... I don't get it Kannan. If they both went in together at the same TS, doesn't the Delete sort before the Put so we'd see it first... so it would overshadow the contemporaneous Put? Assign different memstoreTS to different KV's in the same WALEdit during replay --- Key: HBASE-5414 URL: https://issues.apache.org/jira/browse/HBASE-5414 Project: HBase Issue Type: Sub-task Components: client, coprocessors, regionserver Reporter: Amitanand Aiyer Fix For: 0.94.0 Attachments: HBASE-5414.D1749.1.patch HBASE-5203 combines all the different Puts/Deletes into one WALEdit. This is required to ensure that we persist the atomic mutation in its entirety and not in parts. When combined into a single WALEdit, we create one big familyMap that is a combination of all the family maps in the mutations. The KV's in this familyMap have no information about memstoreTS (it is not yet assigned). However, when we apply the mutations to the Memstore (if there are no failures) we end up incrementing the memstoreTS for each operation. This can lead to the client seeing a different order of operations -- depending on whether or not there was an RS crash/restart.
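The question in the comment above turns on sort order, so a small model may help. This is a simplified sketch, not HBase's comparator: it assumes cells for the same row, family and qualifier are ordered by timestamp (newest first), then by type code (higher first), then by memstoreTS (newest first). The type codes are as I recall them from KeyValue.Type and should be treated as an assumption; all class and method names are hypothetical.

```java
import java.util.Comparator;

// Simplified model of cell ordering for one row/family/qualifier; not HBase code.
final class SameTimestampOrderSketch {
    // Assumed type codes; a higher code sorts earlier in this model.
    static final int PUT = 4;
    static final int DELETE = 8;

    static final class Cell {
        final long timestamp;
        final int type;
        final long memstoreTS;

        Cell(long timestamp, int type, long memstoreTS) {
            this.timestamp = timestamp;
            this.type = type;
            this.memstoreTS = memstoreTS;
        }
    }

    // Newest timestamp first, then higher type code, then newest memstoreTS.
    static final Comparator<Cell> ORDER =
        Comparator.comparingLong((Cell c) -> c.timestamp).reversed()
            .thenComparing(Comparator.comparingInt((Cell c) -> c.type).reversed())
            .thenComparing(Comparator.comparingLong((Cell c) -> c.memstoreTS).reversed());

    /** Returns whichever cell a reader walking in sorted order would meet first. */
    static Cell seenFirst(Cell a, Cell b) {
        return ORDER.compare(a, b) <= 0 ? a : b;
    }
}
```

Under these assumptions a Delete and a Put with the same timestamp are ordered by type before memstoreTS is consulted, so the Delete is met first whatever memstoreTS each was given, which is the situation the comment is asking about.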
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209760#comment-13209760 ] stack commented on HBASE-5416: -- Interesting idea. Patch looks pretty non-invasive to add some nice functionality. Would appreciate some better doc in the filters package doc or over in the manual to accompany this change. Nice one Max. Comments on patch: Please follow the convention you see in the surrounding file and put braces around code blocks. E.g. in the below: {code} + + @Override + public boolean isFamilyEssential(byte[] name) { +for (Filter filter : filters) + if (filter.isFamilyEssential(name)) +return true; +return false; + } {code} What is a 'joinedScanner' in the below: {code} + List<KeyValueScanner> joinedScanners = new ArrayList<KeyValueScanner>(); {code} It needs a bit of a comment I'd say. Why drop the check for empty results in the below? {code} - if (results.isEmpty() || filterRow()) { + boolean filtered = filterRow(); {code} Please submit a patch with --no-prefix so we can see how your patch does against hadoopqa. Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: Filtered-scans_0.90.4.patch, Filtered-scans_trunk.patch When a scan is performed, the whole row is loaded into the result list, and after that the filter (if one exists) is applied to decide whether the row is needed. But when a scan is performed on several CFs and the filter checks only data from a subset of these CFs, data from the CFs not checked by the filter is not needed at the filter stage; it is needed only once we have decided to include the current row. In such a case we can significantly reduce the amount of IO performed by a scan by loading only the values actually checked by the filter. For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. Snap is very large (10s of GB) and it is quite costly to scan. If we need only rows with some flag specified, we use SingleColumnValueFilter to limit the result to only a small subset of the region. But the current implementation loads both CFs to perform the scan, when only a small subset is needed. The attached patch adds one routine to the Filter interface to allow a filter to specify which CF is needed for its operation. In HRegion, we separate all scanners into two groups: those needed for the filter and the rest (joined). When a new row is considered, only the needed data is loaded, the filter is applied, and only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times. Also, this gives us a way to better normalize the data into separate columns by optimizing the scans performed.
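The two-group scan described in this issue can be sketched with a small in-memory model. This is not the attached patch and it does not use HBase classes: the store type, the method names and the read counter are all hypothetical, chosen only to show that the large family is read solely for rows the filter accepts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Simplified in-memory model of the essential/joined split; not HBase code.
final class TwoPhaseScanSketch {
    /** Hypothetical per-family store that counts how many rows were read from it. */
    static final class FamilyStore {
        private final Map<String, String> valuesByRow = new HashMap<>();
        int reads;

        void put(String row, String value) {
            valuesByRow.put(row, value);
        }

        String read(String row) {
            reads++;
            return valuesByRow.get(row);
        }
    }

    /**
     * Reads the small "essential" family first, applies the filter to it, and only
     * then touches the large "joined" family for rows the filter accepted.
     * Returns the joined-family values of the accepted rows.
     */
    static List<String> scan(List<String> rows, FamilyStore essential, FamilyStore joined,
                             Predicate<String> filter) {
        List<String> results = new ArrayList<>();
        for (String row : rows) {
            String flag = essential.read(row);
            if (flag == null || !filter.test(flag)) {
                continue; // rejected: the joined family is never loaded for this row
            }
            results.add(joined.read(row));
        }
        return results;
    }
}
```

In the flags/snap example from the description, `essential` plays the role of flags and `joined` the role of snap; the saving comes from the joined read count tracking the number of accepted rows rather than the number of rows scanned.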
[jira] [Commented] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop
[ https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209762#comment-13209762 ] stack commented on HBASE-4403: -- bq. How about those coprocessor and rest related classes? CPs we are calling dev facing APIs which would make them private I suppose in your classification scheme. REST classes similar since 'users' make REST calls -- the interface is the /table/row/, etc. scheme. Let's add Todd's definitions to the book as part of this patch or in a new one (I can do it if you want -- just say so). Adopt interface stability/audience classifications from Hadoop -- Key: HBASE-4403 URL: https://issues.apache.org/jira/browse/HBASE-4403 Project: HBase Issue Type: Task Affects Versions: 0.90.5, 0.92.0 Reporter: Todd Lipcon Assignee: Jimmy Xiang Attachments: hbase-4403-interface.txt, hbase-4403-nowhere-near-done.txt
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209771#comment-13209771 ] stack commented on HBASE-5416: -- @Max You need a test too. Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: Filtered-scans_0.90.4.patch, Filtered-scans_trunk.patch
[jira] [Commented] (HBASE-3584) Allow atomic put/delete in one call
[ https://issues.apache.org/jira/browse/HBASE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209908#comment-13209908 ] stack commented on HBASE-3584: -- Ugh, commented over in a different issue: {code} stack has commented on the revision HBASE-5413 [jira] Rename RowMutation to RowMutations. We have the notion of Operation already. It would seem to cut across rows in that a Scan and a RowMutations is ugly, agree, but it does convey notions of 'many' and 'in a row'. Regarding RowOperation, Get or Put subclass Operation (actually, its subclass http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/OperationWithAttributes.html). RowOperation therefore could work in that it narrows the set of Operations; the only odd thing is that Delete does not inherit from OperationWithAttributes. I'd think we should fix this hole in our 'model' if we went with RowOperation. If I was looking into the client package and saw both RowMutation and RowOperation sitting beside each other, I'd think I'd be confused what to use whereas with RowMutation and RowMutations... I'd think I'd have some notion how they might be used. AtomicRowMutation I feel is w/o meaning since all Row mutations are 'atomic' -- or it's a bug - so the Atomic qualifier would seem to add nothing. As Thomas Pan says, just my 2c. {code} Allow atomic put/delete in one call --- Key: HBASE-3584 URL: https://issues.apache.org/jira/browse/HBASE-3584 Project: HBase Issue Type: New Feature Components: client, coprocessors, regionserver Reporter: ryan rawson Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 3584-final.txt, 3584-v1.txt, 3584-v3.txt Right now we have the following calls: put(Put) delete(Delete) increment(Increments) But we cannot combine all of the above in a single call, complete with a single row lock. It would be nice to do that. It would also allow us to do a CAS where we could do a put/increment if the check succeeded. - Amendment: Since Increment does not currently support MVCC it cannot be included in an atomic operation. So this is for Put and Delete only.
[jira] [Commented] (HBASE-5420) TestImportTsv does not shut down MR Cluster correctly (fails against 0.23 hadoop)
[ https://issues.apache.org/jira/browse/HBASE-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209913#comment-13209913 ] stack commented on HBASE-5420: -- +1 on patch. Mind resubmitting w/ --no-prefix please Gregory and then doing 'submit patch' to try it against hadoopqa? TestImportTsv does not shut down MR Cluster correctly (fails against 0.23 hadoop) - Key: HBASE-5420 URL: https://issues.apache.org/jira/browse/HBASE-5420 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0, 0.92.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5420.patch Test calls startMiniMapReduceCluster() but never calls shutdownMiniMapReduceCluster(). This causes failures with -Dhadoop.profile=23 when both testMROnTable and testMROnTableWithCustomMapper are run, because the cluster cannot start up properly for the second test.
[jira] [Commented] (HBASE-5421) use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build
[ https://issues.apache.org/jira/browse/HBASE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209925#comment-13209925 ] stack commented on HBASE-5421: -- I'm being lazy Shaneal. This patch affects the 0.23 profile section of the pom only? use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build Key: HBASE-5421 URL: https://issues.apache.org/jira/browse/HBASE-5421 Project: HBase Issue Type: Improvement Components: build Affects Versions: 0.92.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Priority: Minor Labels: build Fix For: 0.92.1 Attachments: hbase-5421.patch Hadoop recently added hadoop-client and hadoop-minicluster artifacts for Hadoop 0.23+ that don't export all the internal dependencies (HADOOP-8009). Let's use them instead of manually specifying transitive dependency exclusion lists (which is error prone and annoying).
[jira] [Commented] (HBASE-5421) use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build
[ https://issues.apache.org/jira/browse/HBASE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209927#comment-13209927 ] stack commented on HBASE-5421: -- Oh, you did it. use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build Key: HBASE-5421 URL: https://issues.apache.org/jira/browse/HBASE-5421 Project: HBase Issue Type: Improvement Components: build Affects Versions: 0.92.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Priority: Minor Labels: build Fix For: 0.92.1 Attachments: hbase-5421.patch
[jira] [Commented] (HBASE-5421) use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build
[ https://issues.apache.org/jira/browse/HBASE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209933#comment-13209933 ] stack commented on HBASE-5421: -- @Shaneal Yeah, submit patch will trigger hadoopqa. Let's see what it says. use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build Key: HBASE-5421 URL: https://issues.apache.org/jira/browse/HBASE-5421 Project: HBase Issue Type: Improvement Components: build Affects Versions: 0.92.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Priority: Minor Labels: build Fix For: 0.92.1 Attachments: hbase-5421.patch
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210093#comment-13210093 ] stack commented on HBASE-5200: -- Ok. Go ahead and commit, Ram? AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent - Key: HBASE-5200 URL: https://issues.apache.org/jira/browse/HBASE-5200 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch This is the scenario: consider a case where the balancer is running and thus trying to close regions on an RS. Before we can close them, a master switch happens. On master switch, the set of nodes that are in RIT is collected; we first get the data and start watching the node. After that, the node data is added into RIT. If, before the node is added to RIT, the RS on which close was called does a transition, then in AM.handleRegion() we miss the handling, saying the RIT state was null.
{code} 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region a66d281d231dfcaea97c270698b26b6f from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region c12e53bfd48ddc5eec507d66821c4d23 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 59ae13de8c1eb325a0dd51f4902d2052 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region f45bc9614d7575f35244849af85aa078 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region cc3ecd7054fe6cd4a1159ed92fd62641 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 3af40478a17fee96b4a192b22c90d5a2 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region e6096a8466e730463e10d3d61f809b92 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN 
org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 4806781a1a23066f7baed22b4d237e24 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region d69e104131accaefe21dcc01fddc7629 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states {code} In branch the CLOSING node is created by RS thus leading to more inconsistency.
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210367#comment-13210367 ] stack commented on HBASE-5075: -- Thanks for doing this. It looks very interesting. Please do not reformat existing code. It bloats your patch and makes reviews take longer; reviewer attention span is short (at least in this case) and it's a shame to spend it going over code reformats. On the patch, is this necessary: + public String getRSPidAndRsZknode(); Can't you get the pid from a process listing? Or do you want us to publish it via jmx? Or it looks like it is already published via jmx. Can your tool pick it up there? On the znode, can't you get the regionserver servername and then do a lookup in zk directly? Can't you have supervisor do this? Are there not existing utilities that watch a pid and allow you to do stuff when it's gone? Or is it that you'd kill the server on a long GC pause? Do you have a bit of documentation on how this new utility works? Thanks. regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: 5075.patch
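The pid-watching approach discussed in this issue could look roughly like the sketch below. It is not the code in 5075.patch: the class, the callback interface and the method names are hypothetical, and the callback only stands in for whatever cleanup is wanted (removing the znode, force-closing the HLog). The liveness probe uses the JDK's `ProcessHandle` API, which assumes Java 9 or later.

```java
import java.util.function.LongPredicate;

// Hypothetical supervisor-side check; the cleanup itself is left to the callback.
final class PidWatchSketch {
    /** Stand-in for the failover work (delete znode, force-close HLog). */
    interface FailoverAction {
        void onProcessGone(long pid);
    }

    private PidWatchSketch() {}

    /** Default liveness probe: true while a process with this pid exists and is alive. */
    static boolean isAlive(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    /**
     * Runs one check cycle. The probe is injectable so it can be replaced in tests.
     *
     * @return true if the process was found gone and the action was invoked
     */
    static boolean checkOnce(long pid, LongPredicate alive, FailoverAction action) {
        if (alive.test(pid)) {
            return false;
        }
        action.onProcessGone(pid);
        return true;
    }
}
```

A caller would run `checkOnce` on a fixed period (the issue suggests around 100ms), for example from a `ScheduledExecutorService`. Note that a probe like this only notices a missing process; a regionserver stuck in a long GC pause still has a live pid, which is the case the comment above asks about.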
[jira] [Commented] (HBASE-5425) Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler)
[ https://issues.apache.org/jira/browse/HBASE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210393#comment-13210393 ] stack commented on HBASE-5425: -- Committed to 0.92 branch too. Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler) Key: HBASE-5425 URL: https://issues.apache.org/jira/browse/HBASE-5425 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.5, 0.92.0 Reporter: terry zhang Fix For: 0.94.0 Attachments: HBASE-5425.patch Please take a look at the code below in EnableTableHandler (hbase master): {code:title=EnableTableHandler.java|borderStyle=solid} protected boolean waitUntilDone(long timeout) throws InterruptedException { . int lastNumberOfRegions = this.countOfRegionsInTable; while (!server.isStopped() && remaining > 0) { Thread.sleep(waitingTimeForEvents); regions = assignmentManager.getRegionsOfTable(tableName); if (isDone(regions)) break; // Punt on the timeout as long we make progress if (regions.size() > lastNumberOfRegions) { lastNumberOfRegions = regions.size(); timeout += waitingTimeForEvents; } remaining = timeout - (System.currentTimeMillis() - startTime); } private boolean isDone(final List<HRegionInfo> regions) { return regions != null && regions.size() >= this.countOfRegionsInTable; } {code} It is easy to see that if lastNumberOfRegions starts at this.countOfRegionsInTable, the punt-on-timeout code will never be executed. I think initializing lastNumberOfRegions = 0 can make it work.
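The effect of the initial value can be checked with a small simulation of the loop quoted above. This is a sketch, not EnableTableHandler: the class and parameter names are hypothetical, the sequence of observed region counts replaces the calls to the assignment manager, and a simulated clock that advances by one poll interval per iteration replaces Thread.sleep and System.currentTimeMillis.

```java
// Simulation of the wait loop with a fake clock; not the master's code.
final class PuntOnTimeoutSketch {
    private PuntOnTimeoutSketch() {}

    /**
     * @param observedCounts region counts seen on successive polls (simulated)
     * @param totalRegions   number of regions the table should end up with
     * @param timeoutMs      initial timeout
     * @param pollMs         simulated time consumed by each poll
     * @param initialLast    starting value for lastNumberOfRegions
     * @return true if all regions were seen before the (possibly extended) timeout ran out
     */
    static boolean waitUntilDone(int[] observedCounts, int totalRegions,
                                 long timeoutMs, long pollMs, int initialLast) {
        long elapsed = 0;
        long timeout = timeoutMs;
        long remaining = timeout;
        int lastNumberOfRegions = initialLast;
        int poll = 0;
        while (remaining > 0 && poll < observedCounts.length) {
            elapsed += pollMs; // stands in for Thread.sleep(waitingTimeForEvents)
            int online = observedCounts[poll++];
            if (online >= totalRegions) {
                return true;
            }
            // Punt on the timeout as long as we make progress.
            if (online > lastNumberOfRegions) {
                lastNumberOfRegions = online;
                timeout += pollMs;
            }
            remaining = timeout - elapsed;
        }
        return false;
    }
}
```

With four regions coming online one per 100ms poll and a 200ms timeout, starting `lastNumberOfRegions` at 0 extends the timeout on every poll and the wait succeeds; starting it at the table's region count means the progress branch is never taken (a count above it would already satisfy the done check) and the wait gives up after two polls.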
[jira] [Commented] (HBASE-5407) Show the per-region level request/sec count in the web ui
[ https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210450#comment-13210450 ] stack commented on HBASE-5407: -- Liyin. Is this a backport for 0.89fb? If so, is there something you've added to your backport that we should have in trunk? Thanks boss. Show the per-region level request/sec count in the web ui - Key: HBASE-5407 URL: https://issues.apache.org/jira/browse/HBASE-5407 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D1779.1.patch, D1779.1.patch, D1779.1.patch It would be nice to show the per-region level request/sec count in the web ui, especially when debugging the hot region problem.
[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210460#comment-13210460 ] stack commented on HBASE-5200: -- Ram wants me to apply the patches here but the version for 0.90 is very different to the version for 0.92. This is removed: -RS_ZK_REGION_CLOSING (1), // RS is in process of closing a region And this is added: +M_ZK_REGION_CLOSING (51), // Master adds this region as closing in ZK This looks like a port from 0.92? AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent - Key: HBASE-5200 URL: https://issues.apache.org/jira/browse/HBASE-5200 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch This is the scenario Consider a case where the balancer is going on thus trying to close regions in a RS. Before we could close a master switch happens. On Master switch the set of nodes that are in RIT is collected and we first get Data and start watching the node After that the node data is added into RIT. Now by this time (before adding to RIT) if the RS to which close was called does a transition in AM.handleRegion() we miss the handling saying RIT state was null. 
{code} 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region a66d281d231dfcaea97c270698b26b6f from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region c12e53bfd48ddc5eec507d66821c4d23 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 59ae13de8c1eb325a0dd51f4902d2052 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region f45bc9614d7575f35244849af85aa078 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region cc3ecd7054fe6cd4a1159ed92fd62641 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 3af40478a17fee96b4a192b22c90d5a2 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region e6096a8466e730463e10d3d61f809b92 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN 
org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 4806781a1a23066f7baed22b4d237e24 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region d69e104131accaefe21dcc01fddc7629 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states {code} In branch the CLOSING node is created by RS thus leading to more inconsistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
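The ordering problem in the HBASE-5200 scenario can be sketched in a few lines. This is only a toy model under my own assumptions: the class RitRaceSketch and its methods addToRit and handleClosed are hypothetical names, not AssignmentManager code, and a plain map stands in for the master's regions-in-transition bookkeeping. It shows why a CLOSED event that arrives before the region has been added to RIT is dropped with "state null".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the master's regions-in-transition (RIT) bookkeeping. */
final class RitRaceSketch {
  private final Map<String, String> regionsInTransition = new HashMap<>();

  /** Failover step: record the state that was read from the region's ZK node. */
  void addToRit(String region, String state) {
    regionsInTransition.put(region, state);
  }

  /** Models handling of a CLOSED event delivered by the watch on the node. */
  String handleClosed(String region) {
    String state = regionsInTransition.get(region);
    if (!"PENDING_CLOSE".equals(state) && !"CLOSING".equals(state)) {
      // Same shape as the WARN in the log: the event is dropped.
      return "ignored, state was " + state;
    }
    regionsInTransition.remove(region);
    return "closed";
  }
}
```

If handleClosed runs before addToRit for the same region, the event is ignored and the region stays inconsistent; with the two calls in the other order it is processed. The sketch says nothing about how the attached patches actually close the window.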
[jira] [Commented] (HBASE-5423) Regionserver may block forever on waitOnAllRegionsToClose when aborting
[ https://issues.apache.org/jira/browse/HBASE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210466#comment-13210466 ]

stack commented on HBASE-5423:
------------------------------

Patch looks good, Chunhui. Change the name of the Set from addedRegionsToCallClose to closed. Why do we have this Set? Are we calling close multiple times on the same region?

So, we'd break even though online regions is not yet empty?
{code}
+      if (this.regionsInTransitionInRS.isEmpty()) {
+        break;
+      }
{code}
Thanks.

Regionserver may block forever on waitOnAllRegionsToClose when aborting
-----------------------------------------------------------------------

Key: HBASE-5423
URL: https://issues.apache.org/jira/browse/HBASE-5423
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
Attachments: hbase-5423.patch

If closeRegion throws any exception (most likely caused by the FS) while the RS is aborting, the RS will block forever on waitOnAllRegionsToClose().
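To make the question about the break concrete, here is a minimal sketch of a wait loop with that extra exit, as I read the patch fragment. Everything in it is an assumption for illustration: CloseWaiter, its two sets, and the boolean return value are hypothetical and are not the HRegionServer code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of a region server waiting for its regions to close. */
final class CloseWaiter {
  /** Regions that are still online. */
  final Set<String> onlineRegions = ConcurrentHashMap.newKeySet();
  /** Regions whose close is still in flight. */
  final Set<String> regionsInTransition = ConcurrentHashMap.newKeySet();

  /**
   * Waits until no region is online. Gives up, returning false, once nothing
   * is in transition any more, because then no pending close is left that
   * could ever empty onlineRegions (for example after a close threw).
   */
  boolean waitOnAllRegionsToClose() {
    while (!onlineRegions.isEmpty()) {
      if (regionsInTransition.isEmpty()) {
        return false; // the extra exit: stop waiting instead of blocking forever
      }
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```

Without the inner check, a region whose close failed stays in onlineRegions and the loop never ends. With it, the loop does exit while online regions is not yet empty, which is exactly the behavior being asked about.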
[jira] [Commented] (HBASE-4640) Catch ClosedChannelException and document it
[ https://issues.apache.org/jira/browse/HBASE-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210483#comment-13210483 ]

stack commented on HBASE-4640:
------------------------------

+1

On commit, add the CCE.getMessage to the LOG.WARN just in case it's got info of use (I'm fine on skipping the stack trace).

Catch ClosedChannelException and document it
--------------------------------------------

Key: HBASE-4640
URL: https://issues.apache.org/jira/browse/HBASE-4640
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Minor
Fix For: 0.94.0
Attachments: HBASE-4640.patch

ClosedChannelException is a pretty obscure exception for the non-expert and doesn't tell you why you get it. We should instead catch it, print a WARN without a stack trace, and add a line in the book about this.
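A minimal sketch of the catch-and-WARN pattern discussed in HBASE-4640, assuming java.util.logging in place of the project's own LOG. The names ClosedChannelLogging, ChannelWrite, describe and writeQuietly are hypothetical, and the wording of the warning is a placeholder rather than the text of the patch.

```java
import java.nio.channels.ClosedChannelException;
import java.util.logging.Level;
import java.util.logging.Logger;

final class ClosedChannelLogging {
  private static final Logger LOG = Logger.getLogger("ClosedChannelLogging");

  /** Hypothetical stand-in for the code that writes to a channel. */
  interface ChannelWrite {
    void run() throws ClosedChannelException;
  }

  /** Builds the one-line warning; appends getMessage() only when it is set. */
  static String describe(ClosedChannelException cce) {
    String base = "Caught a ClosedChannelException: the channel was already closed";
    return cce.getMessage() == null ? base : base + " (" + cce.getMessage() + ")";
  }

  /** Runs the write; on a closed channel logs one WARN without a stack trace. */
  static boolean writeQuietly(ChannelWrite write) {
    try {
      write.run();
      return true;
    } catch (ClosedChannelException cce) {
      // Only the message string is passed, so no stack trace is printed.
      LOG.log(Level.WARNING, describe(cce));
      return false;
    }
  }
}
```

The null check matters because a ClosedChannelException built with its no-argument constructor carries no message, so getMessage() may well add nothing in practice.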
[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler
[ https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210491#comment-13210491 ] stack commented on HBASE-5120: -- I committed to trunk. Will not commit to 0.92. Not important enough of a bug I'd say. Timeout monitor races with table disable handler Key: HBASE-5120 URL: https://issues.apache.org/jira/browse/HBASE-5120 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Zhihong Yu Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0, 0.92.1 Attachments: HBASE-5120.patch, HBASE-5120_1.patch, HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, HBASE-5120_5.patch, HBASE-5120_5.patch Here is what J-D described here: https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176 I think I will retract from my statement that it used to be extremely racy and caused more troubles than it fixed, on my first test I got a stuck region in transition instead of being able to recover. The timeout was set to 2 minutes to be sure I hit it. First the region gets closed {quote} 2012-01-04 00:16:25,811 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to sv4r5s38,62023,1325635980913 for region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. {quote} 2 minutes later it times out: {quote} 2012-01-04 00:18:30,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. state=PENDING_CLOSE, ts=1325636185810, server=null 2012-01-04 00:18:30,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
2012-01-04 00:18:30,027 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. (offlining) {quote} 100ms later the master finally gets the event: {quote} 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 1a4b111bcc228043e89f59c4c3f6a791 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so deleting ZK node and removing from regions in transition, skipping assignment of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Deleting existing unassigned node for 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Successfully deleted unassigned node for region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED {quote} At this point everything is fine, the region was processed as closed. But wait, remember that line where it said it was going to force an unassign? {quote} 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Creating unassigned node for 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state 2012-01-04 00:18:30,328 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server null returned java.lang.NullPointerException: Passed server is null for 1a4b111bcc228043e89f59c4c3f6a791 {quote} Now the master is confused, it recreated the RIT znode but the region doesn't even exist anymore. 
It even tries to shut it down but is blocked by NPEs. Now this is what's going on. The late ZK notification that the znode was deleted (but it got recreated after): {quote} 2012-01-04 00:19:33,285 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been deleted. {quote} Then it prints this, and much later tries to unassign it again: {quote} 2012-01-04 00:19:46,607 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on region to clear regions in transition; test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. state=PENDING_CLOSE, ts=1325636310328, server=null ... 2012-01-04 00:20:39,623 DEBUG
[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler
[ https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210492#comment-13210492 ]

stack commented on HBASE-5120:
------------------------------

I did not commit to 0.90 either.

Timeout monitor races with table disable handler
------------------------------------------------

Key: HBASE-5120
URL: https://issues.apache.org/jira/browse/HBASE-5120
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
Fix For: 0.94.0
Attachments: HBASE-5120.patch, HBASE-5120_1.patch, HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, HBASE-5120_5.patch, HBASE-5120_5.patch
[jira] [Commented] (HBASE-5195) [Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get
[ https://issues.apache.org/jira/browse/HBASE-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210565#comment-13210565 ]

stack commented on HBASE-5195:
------------------------------

This looks like a pretty important fix. Should it be more than major priority? Should it go into 0.92.1?

[Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get
---------------------------------------------------------------------------------------

Key: HBASE-5195
URL: https://issues.apache.org/jira/browse/HBASE-5195
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0, 0.92.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Fix For: 0.94.0, 0.92.1
Attachments: HBASE-5195.patch

Without the ability to wrap the internal Scan on the Get, we can't override (or protect, in the case of access control) Gets as we can Scans. The result is inconsistent behavior.
[jira] [Commented] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup
[ https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210567#comment-13210567 ]

stack commented on HBASE-5209:
------------------------------

Patch looks excellent.

One issue is the upping of the HMasterInterface version. It's the 'right' thing to do, but then it means I can't apply it to 0.92.1, and it breaks a 0.92 talking to a 0.94, which currently is possible. Can you try adding isActiveMaster to the end of the Interface and NOT updating the version? See if you can connect to a 0.92.1 server from a 0.92.0 client and see if you can do basic HMasterInterface operations such as isLoadBalancer running.

HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup
------------------------------------------------------------------------------------------------------------------------

Key: HBASE-5209
URL: https://issues.apache.org/jira/browse/HBASE-5209
Project: HBase
Issue Type: Improvement
Components: master
Affects Versions: 0.94.0, 0.90.5, 0.92.0
Reporter: Aditya Acharya
Assignee: David S. Wang
Fix For: 0.94.0, 0.90.7, 0.92.1
Attachments: HBASE-5209-v0.diff, HBASE-5209-v1.diff

I have a multi-master HBase setup, and I'm trying to programmatically determine which of the masters is currently active. But the API does not allow me to do this. There is a getMaster() method in the HConnection class, but it returns an HMasterInterface, whose methods do not allow me to find out which master won the last race. The API should have a getActiveMasterHostname() or something to that effect.
[jira] [Commented] (HBASE-5265) Fix 'revoke' shell command
[ https://issues.apache.org/jira/browse/HBASE-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210570#comment-13210570 ]

stack commented on HBASE-5265:
------------------------------

Is this important for 0.92.1 lads?

Fix 'revoke' shell command
--------------------------

Key: HBASE-5265
URL: https://issues.apache.org/jira/browse/HBASE-5265
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Andrew Purtell
Assignee: Eugene Koontz
Fix For: 0.94.0, 0.92.1

The 'revoke' shell command needs to be reworked for the AccessControlProtocol implementation that was finalized for 0.92. The permissions being removed must exactly match what was previously granted. No wildcard matching is done server side.

Allow two forms of the command in the shell for convenience:

Revocation of a specific grant:
{code}
revoke user, table, column family [ , column_qualifier ]
{code}

Have the shell automatically do so for all permissions on a table for a given user:
{code}
revoke user, table
{code}
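The split described in HBASE-5265 (exact-match revocation on the server, with the shell expanding the two-argument form) can be sketched as follows. This is a toy in-memory model under my own assumptions: RevokeSketch, Grant, revoke and revokeAll are hypothetical names and do not reflect the AccessControlProtocol API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Hypothetical in-memory model of grants; not the AccessControlProtocol API. */
final class RevokeSketch {
  /** One grant; family and qualifier are null when not specified. */
  static final class Grant {
    final String user, table, family, qualifier;

    Grant(String user, String table, String family, String qualifier) {
      this.user = user;
      this.table = table;
      this.family = family;
      this.qualifier = qualifier;
    }

    @Override public boolean equals(Object o) {
      if (!(o instanceof Grant)) return false;
      Grant g = (Grant) o;
      return user.equals(g.user) && table.equals(g.table)
          && Objects.equals(family, g.family)
          && Objects.equals(qualifier, g.qualifier);
    }

    @Override public int hashCode() {
      return Objects.hash(user, table, family, qualifier);
    }
  }

  private final Set<Grant> grants = new HashSet<>();

  void grant(Grant g) {
    grants.add(g);
  }

  /** "Server side": removes a grant only on an exact match, no wildcards. */
  boolean revoke(Grant g) {
    return grants.remove(g);
  }

  /** "Shell side" convenience: find every grant for user and table, revoke each. */
  int revokeAll(String user, String table) {
    List<Grant> matching = new ArrayList<>();
    for (Grant g : grants) {
      if (g.user.equals(user) && g.table.equals(table)) matching.add(g);
    }
    for (Grant g : matching) revoke(g);
    return matching.size();
  }
}
```

Revoking (bob, t1, cf) does not remove an earlier grant of (bob, t1, cf, q), since the two are not equal; only revokeAll, which enumerates the exact grants first, clears everything for that user on the table.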
[jira] [Commented] (HBASE-5428) Allow for custom filters to be registered within the Thrift interface
[ https://issues.apache.org/jira/browse/HBASE-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210576#comment-13210576 ]

stack commented on HBASE-5428:
------------------------------

@Robert We need this code because the filter 'language' only recognizes some base set of filters? This code extends the set of filters known to the filter language?

Allow for custom filters to be registered within the Thrift interface
---------------------------------------------------------------------

Key: HBASE-5428
URL: https://issues.apache.org/jira/browse/HBASE-5428
Project: HBase
Issue Type: Improvement
Components: thrift
Affects Versions: 0.92.0
Reporter: Robert Roland
Labels: patch
Fix For: 0.94.0
Attachments: ThriftCustomFilters.patch

Custom filters work within the Java client API, but are not accessible within the Thrift API. Attempting to use one will generate a "Filter Name x not supported" error.

The attached patch allows a user to specify a list of custom filters that are registered at Thrift server startup time within the HBase configuration files:

<property>
  <name>hbase.thrift.filters</name>
  <value>MyFilter:com.foo.Filter,OtherFilter:com.foo.OtherFilter</value>
</property>

Patch created off SVN r1245727
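For readers wondering what the hbase.thrift.filters value encodes, here is a small sketch that splits it into filter-name to class-name pairs. The format (comma-separated Name:class entries) is taken from the example value in HBASE-5428; the class ThriftFilterConfig and its parse method are hypothetical and are not the code in the attached patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ThriftFilterConfig {
  /** Parses "Name:class,Name2:class2" into an ordered name to class-name map. */
  static Map<String, String> parse(String value) {
    Map<String, String> filters = new LinkedHashMap<>();
    if (value == null || value.trim().isEmpty()) {
      return filters; // nothing configured
    }
    for (String entry : value.split(",")) {
      // Split on the first ':' only; neither half may be empty.
      String[] parts = entry.trim().split(":", 2);
      if (parts.length != 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
        throw new IllegalArgumentException(
            "Expected FilterName:fully.qualified.Class but got '" + entry + "'");
      }
      filters.put(parts[0].trim(), parts[1].trim());
    }
    return filters;
  }
}
```

A server could then load each class by name at startup and register it under the short name. That registration step is left out here because it depends on the filter language internals.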
[jira] [Commented] (HBASE-5279) NPE in Master after upgrading to 0.92.0
[ https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210582#comment-13210582 ]

stack commented on HBASE-5279:
------------------------------

Committed to 0.92 and to TRUNK. Thanks for the patch, Tobias.

NPE in Master after upgrading to 0.92.0
---------------------------------------

Key: HBASE-5279
URL: https://issues.apache.org/jira/browse/HBASE-5279
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.0
Reporter: Tobias Herbert
Priority: Critical
Fix For: 0.92.1
Attachments: HBASE-5279-v2.patch, HBASE-5279.patch

I upgraded my environment from 0.90.4 to 0.92.0. After the table migration I get the following (permanent) error in the master:
{noformat}
2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting shutdown.
java.lang.NullPointerException
        at org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190)
        at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323)
        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
        at java.lang.Thread.run(Thread.java:662)
2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Aborting
{noformat}
I think that's because I had a hard crash in the cluster a while ago, and I have been seeing the following WARN since then:
{noformat}
2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8}
{noformat}
My patch simply works around the NPE (as the other code around those lines does), but I don't know if that's correct.
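The kind of workaround described in the HBASE-5279 report (stepping around rows that lack region info instead of dereferencing null) might look roughly like this. It is a sketch under loud assumptions: each catalog row is modeled as a plain map of column name to value, "info:regioninfo" stands in for the REGIONINFO_QUALIFIER cell, and RebuildRegionsSketch is a hypothetical class, not the AssignmentManager code or the committed patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RebuildRegionsSketch {
  /** Collects region info from catalog rows, skipping rows that have none. */
  static List<String> rebuildUserRegions(List<Map<String, String>> metaRows) {
    List<String> regions = new ArrayList<>();
    for (Map<String, String> row : metaRows) {
      String regionInfo = row.get("info:regioninfo");
      if (regionInfo == null) {
        // Damaged row, e.g. only info:server and info:serverstartcode are left.
        continue;
      }
      regions.add(regionInfo);
    }
    return regions;
  }
}
```

Skipping the row keeps the master from aborting, but it does not repair the damaged catalog entry, which is presumably why the reporter was unsure whether the workaround is the right fix.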
[jira] [Commented] (HBASE-5195) [Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get
[ https://issues.apache.org/jira/browse/HBASE-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210590#comment-13210590 ]

stack commented on HBASE-5195:
------------------------------

Oops. It's not already in trunk. But my commit to get it into trunk was messy. I did it in three commits: first a bungled commit that added half, which I then reverted. Then added back.

Here is where I made my mess:
{code}
r1245773 | stack | 2012-02-17 13:34:46 -0800 (Fri, 17 Feb 2012) | 1 line

HBASE-5195 [Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get -- SECOND HALF OF THIS COMMIT

r1245768 | stack | 2012-02-17 13:26:23 -0800 (Fri, 17 Feb 2012) | 1 line

HBASE-5279 NPE in Master after upgrading to 0.92.0 -- REVERT OVERCOMMIT TO HREGION

r1245767 | stack | 2012-02-17 13:24:21 -0800 (Fri, 17 Feb 2012) | 1 line

HBASE-5279 NPE in Master after upgrading to 0.92.0
{code}

r1245767 adds
{code}
@@ -3684,6 +3682,8 @@
       }
     }

+    Scan scan = new Scan(get);
+
     RegionScanner scanner = null;
     try {
       scanner = getScanner(scan);
{code}
... but not...
{code}
@@ -3673,8 +3673,6 @@
    */
   private List<KeyValue> get(Get get, boolean withCoprocessor)
   throws IOException {
-    Scan scan = new Scan(get);
-
     List<KeyValue> results = new ArrayList<KeyValue>();

     // pre-get CP hook
{code}
... then r1245768 reverts it because the above failed to go in. Then r1245773 is the fixup.

[Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get
---------------------------------------------------------------------------------------

Key: HBASE-5195
URL: https://issues.apache.org/jira/browse/HBASE-5195
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0, 0.92.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Fix For: 0.94.0, 0.92.1
Attachments: HBASE-5195.patch

Without the ability to wrap the internal Scan on the Get, we can't override (or protect, in the case of access control) Gets as we can Scans. The result is inconsistent behavior.
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210616#comment-13210616 ]

stack commented on HBASE-5399:
------------------------------

I don't follow:
{code}
+// todo stack nkeywal
+// We used to check in a loop that the master was running.
+// 1) There were imbricated tries loop. One here and one in getMaster
+// 2) As the master can disappear, it may not be necessary to check it.
+// We don't need the connection immediately, but we used to check the
+// connection at the beginning in the past, and it's more user friendly
+// to have the error immediately.
+// connection.isMasterRunning();
{code}
I'd say let's not check that the master is there till we need it. Seems like a PITA going ahead and checking the master on construction. This changes the behavior, but I think it's one that's ok to change.

You are doing your own Callables. You've seen the Callables that go on in HTable. Any reason you avoid them? I suppose this is different in that you want to let go of the shared master. Looks fine.

Are we getting retrieveClusterId on startup? Can we not do that? Can we do that when someone asks for it? Or is it happening after we've set up the zk connection anyways?

Patch makes sense so far. Good stuff N.

Cut the link between the client and the zookeeper ensemble
----------------------------------------------------------

Key: HBASE-5399
URL: https://issues.apache.org/jira/browse/HBASE-5399
Project: HBase
Issue Type: Improvement
Components: client
Affects Versions: 0.94.0
Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch

The link is often considered an issue, for various reasons. One of them is that there is a limit on the number of connections that ZK can manage. Stack was also suggesting removing the link to the master from HConnection.

There are choices to be made considering the existing API (that we don't want to break).

The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken.

ZooKeeper is used for:
- public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection => we have to deprecate this but keep it.
- read get master address to create a master => now done with a temporary zookeeper connection
- read root location => now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked.
- read cluster id => now done once with a temporary zookeeper connection.
- check if base node is available => now done once with a zookeeper connection given as a parameter
- isTableDisabled/isTableAvailable => public functions, now done with a temporary zookeeper connection.
  - Called internally from HBaseAdmin and HTable
- getCurrentNrHRS(): public function to get the number of region servers and create a pool of threads => now done with a temporary zookeeper connection

Master is used for:
- getMaster public getter, as for ZooKeeper => we have to deprecate this but keep it.
- isMasterRunning(): public function, used internally by HMerge and HBaseAdmin
- getHTableDescriptor*: public functions offering access to the master. => we could make them use a temporary master connection as well.

Main points are:
- hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, a non-connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow.
- having a link between ZK and all the clients seems to make sense for some use cases. However, it won't scale if a TCP connection is required for every client.
- if we move the table descriptor part away from the client, we need to find a new place for it.
- we will have the same issue in HBaseAdmin (for both ZK and Master); maybe we can put a timeout on the connection. That would make the whole system less deterministic, however.
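The "temporary zookeeper connection" idea that recurs in the HBASE-5399 list can be sketched with try-with-resources. Everything here is hypothetical and for illustration only: TempConnection is a stand-in that returns canned data rather than a real ZooKeeper session, and the OPEN counter exists just to make the open/close pairing visible.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class TemporaryConnectionSketch {
  /** Number of connections currently open (for demonstration only). */
  static final AtomicInteger OPEN = new AtomicInteger();

  /** Hypothetical short-lived connection; a real one would wrap a ZK session. */
  static final class TempConnection implements AutoCloseable {
    TempConnection() {
      OPEN.incrementAndGet();
    }

    String read(String znode) {
      return "data-of-" + znode; // canned value, no real I/O
    }

    @Override public void close() {
      OPEN.decrementAndGet();
    }
  }

  /** Opens a connection, runs one piece of work, and always closes it. */
  static <T> T withTemporaryConnection(Function<TempConnection, T> work) {
    try (TempConnection connection = new TempConnection()) {
      return work.apply(connection);
    }
  }
}
```

The trade-off raised in the description still applies: each call pays for a fresh connection, so a client built this way holds no long-lived link to the ensemble but is slower per operation.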
[jira] [Commented] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
[ https://issues.apache.org/jira/browse/HBASE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210629#comment-13210629 ]

stack commented on HBASE-5422:
------------------------------

I think I understand what is going on. Help me out, Chunhui.

So, on region open, we update the RIT timers. When bulk opening, we are not adding plans to this.regionPlans, so that when an open comes in from a bulk assign, since we go against this.regionPlans, we'll not update the timers of other outstanding RITs?

This seems like a nice bug fix.

I see that BulkReOpen adds to this.regionPlans, but it does it one at a time. Should it use your putAll? Maybe we should make an addPlan method that takes a Map of plans and have it used by BulkReOpen and by BulkOpen?

StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
------------------------------------------------------------------------------------------------------------------

Key: HBASE-5422
URL: https://issues.apache.org/jira/browse/HBASE-5422
Project: HBase
Issue Type: Bug
Components: master
Reporter: chunhui shen
Attachments: 5422-90.patch, hbase-5422.patch

In our production environment we see a lot of RIT timeouts when the cluster comes up; there are about 7w (70,000) regions in the cluster (25 regionservers).

First, we could see the following log (see the region 33cf229845b1009aa8a3f7b0f85c9bd0):

master's log
2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Async create of unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state
2012-02-13 18:07:42,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409, server=r03f11025.yh.aliyun.com,60020,1329127549907
2012-02-13 18:07:42,996 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0.
state=OFFLINE, ts=1329127661409 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329127662996 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,744 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=r03f11025.yh.aliyun.com,60020,1329127549907, region=33cf229845b1009aa8a3f7b0f85c9bd0 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Deleting existing unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,573 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. on r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
so generated a random one; hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0., src=, dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, exclude=null) available servers 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. to r01b05043.yh.aliyun.com,60020,1329127549041 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329132528086 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. Regionserver's log 2012-02-13 18:07:43,537 INFO
[jira] [Commented] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop
[ https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210643#comment-13210643 ] stack commented on HBASE-4403: -- @Enis Agree Adopt interface stability/audience classifications from Hadoop -- Key: HBASE-4403 URL: https://issues.apache.org/jira/browse/HBASE-4403 Project: HBase Issue Type: Task Affects Versions: 0.90.5, 0.92.0 Reporter: Todd Lipcon Assignee: Jimmy Xiang Attachments: hbase-4403-interface.txt, hbase-4403-nowhere-near-done.txt As HBase gets more widely used, we need to be more explicit about which APIs are stable and not expected to break between versions, which APIs are still evolving, etc. We also have many public classes that are really internal to the RS or Master and not meant to be used by users. Hadoop has adopted a classification scheme for audience (public, private, or limited-private) as well as stability (stable, evolving, unstable). I think we should copy these annotations to HBase and start to classify our public classes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5371) Introduce AccessControllerProtocol.checkPermissions(Permission[] permissons) API
[ https://issues.apache.org/jira/browse/HBASE-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210663#comment-13210663 ] stack commented on HBASE-5371: -- bq. What about re-changing the version to 1, since we just added a new method, but not changed anything on the wire, it should be compatible. The only catch is that if you invoke the new API from a new client, but the server is using the old version, you would get a NoSuchMethod or smt. Is that asking for too much, wdyt? I think it a good idea. Test an old client talking to a new w/o changing the version. See what happens. If it works, lets get it into 0.92. Introduce AccessControllerProtocol.checkPermissions(Permission[] permissons) API Key: HBASE-5371 URL: https://issues.apache.org/jira/browse/HBASE-5371 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.94.0 Attachments: HBASE-5371_v2.patch, HBASE-5371_v3-noprefix.patch, HBASE-5371_v3.patch We need to introduce something like AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so that clients can check access rights before carrying out the operations. We need this kind of operation for HCATALOG-245, which introduces authorization providers for hbase over hcat. We cannot use getUserPermissions() since it requires ADMIN permissions on the global/table level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5371) Introduce AccessControllerProtocol.checkPermissions(Permission[] permissons) API
[ https://issues.apache.org/jira/browse/HBASE-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210668#comment-13210668 ] stack commented on HBASE-5371: -- @Enis It has to work w/ 0.92? You can't wait on 0.94? Which should be soonish... Month or two? Introduce AccessControllerProtocol.checkPermissions(Permission[] permissons) API Key: HBASE-5371 URL: https://issues.apache.org/jira/browse/HBASE-5371 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.94.0 Attachments: HBASE-5371_v2.patch, HBASE-5371_v3-noprefix.patch, HBASE-5371_v3.patch We need to introduce something like AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so that clients can check access rights before carrying out the operations. We need this kind of operation for HCATALOG-245, which introduces authorization providers for hbase over hcat. We cannot use getUserPermissions() since it requires ADMIN permissions on the global/table level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4925) Collect test cases for hadoop/hbase cluster
[ https://issues.apache.org/jira/browse/HBASE-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210670#comment-13210670 ] stack commented on HBASE-4925: -- @Gaojinchao What did you finish? You have an automation that runs some of Thomas's tests? Collect test cases for hadoop/hbase cluster --- Key: HBASE-4925 URL: https://issues.apache.org/jira/browse/HBASE-4925 Project: HBase Issue Type: Brainstorming Components: test Reporter: Thomas Pan This entry is used to collect all the useful test cases to verify a hadoop/hbase cluster. This is to follow up on yesterday's hack day in Salesforce. Hopefully that the information would be very useful for the whole community. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5756) we can change defalult File Appender to RFA instead of DRFA.
[ https://issues.apache.org/jira/browse/HBASE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251300#comment-13251300 ] stack commented on HBASE-5756: -- You will have to convince Lars to apply hbase-5655 to trunk. Its too late to do it for 0.92 I'd say. we can change defalult File Appender to RFA instead of DRFA. Key: HBASE-5756 URL: https://issues.apache.org/jira/browse/HBASE-5756 Project: HBase Issue Type: Bug Reporter: rohithsharma Priority: Minor This can be a point of concern when on a certain day the logging happens more because of more and more activity. In that case the log file for that day can grow huge. These logs can not be opened for analysis since size is more. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5604) M/R tool to replay WAL files
[ https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251341#comment-13251341 ] stack commented on HBASE-5604: -- None +1 on commit. M/R tool to replay WAL files Key: HBASE-5604 URL: https://issues.apache.org/jira/browse/HBASE-5604 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 5604-v4.txt, 5604-v6.txt, 5604-v7.txt, 5604-v8.txt, 5604-v9.txt, HLog-5604-v3.txt Just an idea I had. Might be useful for restore of a backup using the HLogs. This could an M/R (with a mapper per HLog file). The tool would get a timerange and a (set of) table(s). We'd pick the right HLogs based on time before the M/R job is started and then have a mapper per HLog file. The mapper would then go through the HLog, filter all WALEdits that didn't fit into the time range or are not any of the tables and then uses HFileOutputFormat to generate HFiles. Would need to indicate the splits we want, probably from a live table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5756) we can change defalult File Appender to RFA instead of DRFA.
[ https://issues.apache.org/jira/browse/HBASE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251345#comment-13251345 ] stack commented on HBASE-5756: -- I would not hold up a release for this. Rohit, if you want to make a patch for the other branches, that would be welcome. Sounds like you might make an entry for the reference guide too: a note on keeping a cap on growing logs? we can change defalult File Appender to RFA instead of DRFA. Key: HBASE-5756 URL: https://issues.apache.org/jira/browse/HBASE-5756 Project: HBase Issue Type: Bug Reporter: rohithsharma Priority: Minor This can be a concern when logging volume spikes on a given day because of increased activity. In that case the log file for that day can grow huge, and such logs cannot be opened for analysis because of their size.
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251351#comment-13251351 ] stack commented on HBASE-5666: -- Yes. That looks good. Let me retry against hadoopqa. A test hung above. RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I have a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during RS startup the znode is not yet available, and HRegionServer.initializeZooKeeper() checks just once whether the base node is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. 
at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available
[ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251352#comment-13251352 ] stack commented on HBASE-5666: -- I see that you don't have the above change in v7. You want to add it in a v8 and retry hadoopqa? Thanks Matteo. RegionServer doesn't retry to check if base node is available - Key: HBASE-5666 URL: https://issues.apache.org/jira/browse/HBASE-5666 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log I have a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true) {code} $HBASE_HOME/bin/start-hbase.sh $HBASE_HOME/bin/local-regionservers.sh start 1 2 3 {code} but the region servers are not able to start... It seems that during RS startup the znode is not yet available, and HRegionServer.initializeZooKeeper() checks just once whether the base node is available. {code} 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,133296824: Initialization of RS failed. Hence aborting RS. java.io.IOException: Received the shutdown message while waiting. 
at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
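The HBASE-5666 report above comes down to a single existence check where a bounded retry would tolerate a master that has not created the base znode yet. A minimal sketch of that idea follows. It is not taken from any of the attached patches: BaseNodeWaiter, waitForBaseNode and the attempt/sleep parameters are hypothetical names, and the ZooKeeper lookup is replaced by a plain BooleanSupplier so the example runs without a cluster.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not an HBase class: polls a check until it passes or attempts run out. */
class BaseNodeWaiter {

    /**
     * Returns true as soon as the supplied check succeeds, false if it never
     * succeeds within maxAttempts or if the waiting thread is interrupted.
     */
    static boolean waitForBaseNode(BooleanSupplier baseNodeExists, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (baseNodeExists.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis); // give the master time to create the node
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for callers
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated master: the base node only "appears" on the third check.
        int[] checks = {0};
        boolean found = waitForBaseNode(() -> ++checks[0] >= 3, 5, 10L);
        System.out.println("found=" + found + " after " + checks[0] + " checks");
    }
}
```

In a real region server the supplier would wrap the ZooKeeper existence check, and the loop would also need to stop when the server is asked to shut down.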
[jira] [Commented] (HBASE-4336) Convert source tree into maven modules
[ https://issues.apache.org/jira/browse/HBASE-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251355#comment-13251355 ] stack commented on HBASE-4336: -- bq. I'm for having it now since we know we are going to need common and security isn't yet in (and could still be required if we want to compile against older versions). One suggestion would be to have this issue predicated on security being rolled back into core (hbase-5372) -- then we could do w/o having to mess with a security module. Or not. Just do hbase-common and hbase-security? Can roll hbase-security back into common when time comes. I'm excited about this one. I liked the notion by Mat Corgan that we'd have an hbase-hregion module and then there'd be an hbase-wal so Li Pi can experiment standalone w/ multiple WALs at the one time, etc., etc. The new client would be one. Convert source tree into maven modules -- Key: HBASE-4336 URL: https://issues.apache.org/jira/browse/HBASE-4336 Project: HBase Issue Type: Task Components: build Reporter: Gary Helmling Priority: Critical Fix For: 0.96.0 When we originally converted the build to maven we had a single core module defined, but later reverted this to a module-less build for the sake of simplicity. It now looks like it's time to re-address this, as we have an actual need for modules to: * provide a trimmed down client library that applications can make use of * more cleanly support building against different versions of Hadoop, in place of some of the reflection machinations currently required * incorporate the secure RPC engine that depends on some secure Hadoop classes I propose we start simply by refactoring into two initial modules: * core - common classes and utilities, and client-side code and interfaces * server - master and region server implementations and supporting code This would also lay the groundwork for incorporating the HBase security features that have been developed. 
Once the module structure is in place, security-related features could then be incorporated into a third module -- security -- after normal review and approval. The security module could then depend on secure Hadoop, without modifying the dependencies of the rest of the HBase code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5756) we can change defalult File Appender to RFA instead of DRFA.
[ https://issues.apache.org/jira/browse/HBASE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251950#comment-13251950 ] stack commented on HBASE-5756: -- Default is DRFA in 0.94 and before. RFA after (0.96) we can change defalult File Appender to RFA instead of DRFA. Key: HBASE-5756 URL: https://issues.apache.org/jira/browse/HBASE-5756 Project: HBase Issue Type: Bug Reporter: rohithsharma Priority: Minor This can be a point of concern when on a certain day the logging happens more because of more and more activity. In that case the log file for that day can grow huge. These logs can not be opened for analysis since size is more. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252207#comment-13252207 ] stack commented on HBASE-5737: -- Ram, the AM#setBalancer is not right? Doesn't AM make a balancer instance of its own up in its constructor? We should at least remove that. Could we pass the load balancer to use into the AM's constructor rather than call a setBalancer method? Minor Improvements related to balancer. --- Key: HBASE-5737 URL: https://issues.apache.org/jira/browse/HBASE-5737 Project: HBase Issue Type: Improvement Components: master Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch Currently AM.getAssignmentByTable() uses a result map that is a HashMap. It would be better to use a TreeMap. MetaReader.fullScan already uses a TreeMap so that the naming order is maintained. I felt this change could be very useful in cases where we are extending the DefaultLoadBalancer.
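The point of the HBASE-5737 description is that a HashMap gives no iteration-order guarantee while a TreeMap iterates in key order, which matters to a balancer that walks tables by name. A small sketch of the difference, under stated assumptions: AssignmentOrderDemo and sortedTableNames are made-up names, and String keys stand in for whatever key type the real result map uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustration only: String table names stand in for the real map keys. */
class AssignmentOrderDemo {

    /** Copies any table-to-servers map into a TreeMap so iteration follows table-name order. */
    static List<String> sortedTableNames(Map<String, List<String>> assignments) {
        return new ArrayList<>(new TreeMap<>(assignments).keySet());
    }

    public static void main(String[] args) {
        // HashMap iteration order is unspecified and may differ between runs or JVMs.
        Map<String, List<String>> byTable = new HashMap<>();
        byTable.put("usertable", Arrays.asList("rs2"));
        byTable.put("item_20120208", Arrays.asList("rs1", "rs3"));
        byTable.put("-ROOT-", Arrays.asList("rs1"));

        System.out.println("hash order:   " + byTable.keySet());
        System.out.println("sorted order: " + sortedTableNames(byTable));
    }
}
```

Building the result as a TreeMap in the first place avoids the extra copy shown here; the copy is only used to keep the sketch short.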
[jira] [Commented] (HBASE-5747) Forward port hbase-5708 [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test
[ https://issues.apache.org/jira/browse/HBASE-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252228#comment-13252228 ] stack commented on HBASE-5747: -- Not sure why tests are not completing. Running on a mac I see problem in this test: {code} Running org.apache.hadoop.hbase.zookeeper.TestZKLeaderManager Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.415 sec FAILURE! Results : Failed tests: testLeaderSelection(org.apache.hadoop.hbase.zookeeper.TestZKLeaderManager): New leader should exist {code} Forward port hbase-5708 [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test Key: HBASE-5747 URL: https://issues.apache.org/jira/browse/HBASE-5747 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Priority: Blocker Attachments: 5474.txt, 5474v2.txt, 5474v3 (1).txt, 5474v3.txt Forward port as much as we can of Mikhail's hard-won test cleanups over on 0.89 branch Will improve our being able to run unit tests in //. He also found a few bugs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)
[ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252504#comment-13252504 ] stack commented on HBASE-5754: -- Let me do the same. I did not match generator map tasks to verify reducers. Then let me recreate the split issue Eric describes above. Thanks lads. data lost with gora continuous ingest test (goraci) --- Key: HBASE-5754 URL: https://issues.apache.org/jira/browse/HBASE-5754 Project: HBase Issue Type: Bug Affects Versions: 0.92.1 Environment: 10 node test cluster Reporter: Eric Newton Assignee: stack Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends. I put a billion entries into HBase, and ran the Verify map/reduce job. The verification failed because about 21K entries were missing. The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data. I re-ran the test with 100 million entries, and it verified successfully. Both of the times I tested using a billion entries, the verification failed. If I run the verification step twice, the results are consistent, so the problem is probably not on the verify step. Here's the versions of the various packages: ||package||version|| |hadoop|0.20.205.0| |hbase|0.92.1| |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277| |goraci|https://github.com/ericnewton/goraci tagged 2012-04-08| The change I made to goraci was to configure it for hbase and to allow it to build properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5773) HtablePool constructor not reading config files in certain cases
[ https://issues.apache.org/jira/browse/HBASE-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252586#comment-13252586 ] stack commented on HBASE-5773: -- It doesn't apply to 0.90 branch. HtablePool constructor not reading config files in certain cases Key: HBASE-5773 URL: https://issues.apache.org/jira/browse/HBASE-5773 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.6, 0.92.1, 0.94.1 Reporter: Ioan Eugen Stan Priority: Minor Fix For: 0.92.2, 0.94.0 Attachments: different-config-behaviour.patch
Creating an HTablePool can result in two different behaviours depending on the constructor called.
Case 1: loads the configs from hbase-site
    public HTablePool() {
      this(HBaseConfiguration.create(), Integer.MAX_VALUE);
    }
Calling this with a null value for Configuration:
    public HTablePool(final Configuration config, final int maxSize) {
      this(config, maxSize, null, null);
    }
will invoke:
    public HTablePool(final Configuration config, final int maxSize, final HTableInterfaceFactory tableFactory, PoolType poolType) {
      // Make a new configuration instance so I can safely cleanup when
      // done with the pool.
      this.config = config == null ? new Configuration() : config;
which does not read the hbase-site config files as HBaseConfiguration.create() does. I've tracked this problem to all versions of hbase.
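The difference between the two null fallbacks described in HBASE-5773 can be shown without HBase on the classpath. The sketch below is only a model: PoolConfigDemo and its methods are hypothetical, a Map stands in for Configuration, and createWithSiteFile() merely pretends to be HBaseConfiguration.create() by returning a value that would normally come from hbase-site.xml. The "site-aware" variant is one possible fix, not necessarily what the attached patch does.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the constructor fallback; a Map stands in for Configuration. */
class PoolConfigDemo {

    /** Stand-in for HBaseConfiguration.create(): defaults plus site-file overrides. */
    static Map<String, String> createWithSiteFile() {
        Map<String, String> conf = new HashMap<String, String>();
        conf.put("hbase.zookeeper.quorum", "zk1.example.com"); // pretend this came from hbase-site.xml
        return conf;
    }

    /** Behaviour described in the report: null falls back to a bare configuration. */
    static Map<String, String> reportedFallback(Map<String, String> config) {
        return config == null ? new HashMap<String, String>() : config;
    }

    /** One possible fix: null falls back to the site-aware factory instead. */
    static Map<String, String> siteAwareFallback(Map<String, String> config) {
        return config == null ? createWithSiteFile() : config;
    }

    public static void main(String[] args) {
        System.out.println("reported:   quorum=" + reportedFallback(null).get("hbase.zookeeper.quorum"));
        System.out.println("site-aware: quorum=" + siteAwareFallback(null).get("hbase.zookeeper.quorum"));
    }
}
```

With the reported fallback the quorum setting is simply absent, which is why a pool built with a null Configuration would not pick up the settings from hbase-site.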
[jira] [Commented] (HBASE-3443) ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix
[ https://issues.apache.org/jira/browse/HBASE-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252709#comment-13252709 ] stack commented on HBASE-3443: -- 0.94? ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix -- Key: HBASE-3443 URL: https://issues.apache.org/jira/browse/HBASE-3443 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.0, 0.90.1, 0.90.2, 0.90.3, 0.90.4, 0.90.5, 0.90.6, 0.92.0, 0.92.1 Reporter: Kannan Muthukkaruppan Assignee: Lars Hofhansl Priority: Critical Labels: corruption Fix For: 0.96.0 Attachments: 3443.txt
For incrementColumnValue(), HBASE-3082 adds an optimization to check the memstore first and to check the store files only if the value is not present in the memstore. In the presence of deletes, this optimization is not reliable. If the column is marked as deleted in the memstore, one should not look further into the store files. But currently, the code does so. Sample test code outline:
{code}
admin.createTable(desc)
table = HTable.new(conf, tableName)
table.incrementColumnValue(Bytes.toBytes(row), cf1name, Bytes.toBytes(column), 5);
admin.flush(tableName)
sleep(2)
del = Delete.new(Bytes.toBytes(row))
table.delete(del)
table.incrementColumnValue(Bytes.toBytes(row), cf1name, Bytes.toBytes(column), 5);
get = Get.new(Bytes.toBytes(row))
keyValues = table.get(get).raw()
keyValues.each do |keyValue|
  puts "Expect 5; Got Value=#{Bytes.toLong(keyValue.getValue())}";
end
{code}
The above prints:
{code}
Expect 5; Got Value=10
{code}
[jira] [Commented] (HBASE-3443) ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix
[ https://issues.apache.org/jira/browse/HBASE-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252744#comment-13252744 ] stack commented on HBASE-3443: -- Oh, and if you don't fix it, you'll have to explain why you didn't to Benoît. ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix -- Key: HBASE-3443 URL: https://issues.apache.org/jira/browse/HBASE-3443 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.0, 0.90.1, 0.90.2, 0.90.3, 0.90.4, 0.90.5, 0.90.6, 0.92.0, 0.92.1 Reporter: Kannan Muthukkaruppan Assignee: Lars Hofhansl Priority: Critical Labels: corruption Fix For: 0.96.0 Attachments: 3443.txt
For incrementColumnValue(), HBASE-3082 adds an optimization to check the memstore first and to check the store files only if the value is not present in the memstore. In the presence of deletes, this optimization is not reliable. If the column is marked as deleted in the memstore, one should not look further into the store files. But currently, the code does so. Sample test code outline:
{code}
admin.createTable(desc)
table = HTable.new(conf, tableName)
table.incrementColumnValue(Bytes.toBytes(row), cf1name, Bytes.toBytes(column), 5);
admin.flush(tableName)
sleep(2)
del = Delete.new(Bytes.toBytes(row))
table.delete(del)
table.incrementColumnValue(Bytes.toBytes(row), cf1name, Bytes.toBytes(column), 5);
get = Get.new(Bytes.toBytes(row))
keyValues = table.get(get).raw()
keyValues.each do |keyValue|
  puts "Expect 5; Got Value=#{Bytes.toLong(keyValue.getValue())}";
end
{code}
The above prints:
{code}
Expect 5; Got Value=10
{code}
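The HBASE-3443 sequence (increment, flush, delete, increment) can be replayed against a toy model to see why falling through to the store files gives 10 instead of 5. This is a deliberately simplified sketch, not HBase code: IcvDeleteDemo and all of its methods are hypothetical, there are no timestamps or versions, and a delete is modelled as a per-key marker held in the memstore.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy model of one store; none of these names are HBase classes. */
class IcvDeleteDemo {
    private final Map<String, Long> memstoreValues = new HashMap<>();
    private final Set<String> memstoreDeletes = new HashSet<>();
    private final Map<String, Long> storeFileValues = new HashMap<>();

    /** Reads the counter; with honorDeletes=false it mimics the reported fall-through. */
    long read(String key, boolean honorDeletes) {
        Long inMemstore = memstoreValues.get(key);
        if (inMemstore != null) {
            return inMemstore;
        }
        if (honorDeletes && memstoreDeletes.contains(key)) {
            return 0L; // a delete marker in the memstore hides anything in the store files
        }
        Long flushed = storeFileValues.get(key);
        return flushed == null ? 0L : flushed;
    }

    long increment(String key, long amount, boolean honorDeletes) {
        long updated = read(key, honorDeletes) + amount;
        memstoreValues.put(key, updated);
        memstoreDeletes.remove(key); // in this toy model the new value supersedes the marker
        return updated;
    }

    void delete(String key) {
        memstoreValues.remove(key);
        memstoreDeletes.add(key);
    }

    /** Moves memstore contents to the store files, applying pending delete markers first. */
    void flush() {
        for (String deleted : memstoreDeletes) {
            storeFileValues.remove(deleted);
        }
        storeFileValues.putAll(memstoreValues);
        memstoreValues.clear();
        memstoreDeletes.clear();
    }

    /** Replays the sequence from the report: increment, flush, delete, increment. */
    static long runScenario(boolean honorDeletes) {
        IcvDeleteDemo store = new IcvDeleteDemo();
        store.increment("row/cf1:column", 5, honorDeletes);
        store.flush();
        store.delete("row/cf1:column");
        return store.increment("row/cf1:column", 5, honorDeletes);
    }

    public static void main(String[] args) {
        System.out.println("ignoring delete markers: " + runScenario(false)); // 10, as in the report
        System.out.println("honoring delete markers: " + runScenario(true));  // 5
    }
}
```

The only difference between the two runs is the early return on a delete marker, which is the check the report says is missing from the optimized path.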
[jira] [Commented] (HBASE-5777) MiniHBaseCluster cannot start multiple region servers
[ https://issues.apache.org/jira/browse/HBASE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252881#comment-13252881 ] stack commented on HBASE-5777: -- We have an hbase-site.xml at src/test that is used when we run tests. It disables the UI. You think we should apply this patch too Jimmy? MiniHBaseCluster cannot start multiple region servers - Key: HBASE-5777 URL: https://issues.apache.org/jira/browse/HBASE-5777 Project: HBase Issue Type: Test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: hbase-5777.patch MiniHBaseCluster can try to start multiple region servers. But all of them except one will die in putting up the web UI because of BindException since HConstants.REGIONSERVER_INFO_PORT_AUTO is set to false by default. This issue will make many unit tests depending on multiple region servers flaky, such as TestAdmin. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5778) Turn on WAL compression by default
[ https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252947#comment-13252947 ] stack commented on HBASE-5778: -- +1 Add release note w/ how to turn it off Turn on WAL compression by default -- Key: HBASE-5778 URL: https://issues.apache.org/jira/browse/HBASE-5778 Project: HBase Issue Type: Improvement Reporter: Jean-Daniel Cryans Priority: Blocker Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5778.patch I ran some tests to verify if WAL compression should be turned on by default. For a use case where it's not very useful (values two order of magnitude bigger than the keys), the insert time wasn't different and the CPU usage 15% higher (150% CPU usage VS 130% when not compressing the WAL). When values are smaller than the keys, I saw a 38% improvement for the insert run time and CPU usage was 33% higher (600% CPU usage VS 450%). I'm not sure WAL compression accounts for all the additional CPU usage, it might just be that we're able to insert faster and we spend more time in the MemStore per second (because our MemStores are bad when they contain tens of thousands of values). Those are two extremes, but it shows that for the price of some CPU we can save a lot. My machines have 2 quads with HT, so I still had a lot of idle CPUs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)
[ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253135#comment-13253135 ]

stack commented on HBASE-5754:

I ran w/ 10 generators and 10 slots for the verify step and got the below, which prints out only a REFERENCED count. Running these recent tests, I let it do its natural splitting, so it grew from zero to 260-odd regions; maybe the issue you see, Eric, comes from manual splits done from the UI. Let me try that next. Thanks lads.

{code}
12/04/13 05:16:23 INFO mapred.JobClient: map 100% reduce 99%
12/04/13 05:16:54 INFO mapred.JobClient: map 100% reduce 100%
12/04/13 05:16:59 INFO mapred.JobClient: Job complete: job_201204092039_0046
12/04/13 05:16:59 INFO mapred.JobClient: Counters: 30
12/04/13 05:16:59 INFO mapred.JobClient: Job Counters
12/04/13 05:16:59 INFO mapred.JobClient: Launched reduce tasks=10
12/04/13 05:16:59 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=30125694
12/04/13 05:16:59 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/04/13 05:16:59 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/04/13 05:16:59 INFO mapred.JobClient: Rack-local map tasks=6
12/04/13 05:16:59 INFO mapred.JobClient: Launched map tasks=256
12/04/13 05:16:59 INFO mapred.JobClient: Data-local map tasks=250
12/04/13 05:16:59 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=5832198
12/04/13 05:16:59 INFO mapred.JobClient: goraci.Verify$Counts
12/04/13 05:16:59 INFO mapred.JobClient: REFERENCED=10
12/04/13 05:16:59 INFO mapred.JobClient: File Output Format Counters
12/04/13 05:16:59 INFO mapred.JobClient: Bytes Written=0
12/04/13 05:16:59 INFO mapred.JobClient: FileSystemCounters
12/04/13 05:16:59 INFO mapred.JobClient: FILE_BYTES_READ=83022967343
12/04/13 05:16:59 INFO mapred.JobClient: HDFS_BYTES_READ=156414
12/04/13 05:16:59 INFO mapred.JobClient: FILE_BYTES_WRITTEN=112881560332
12/04/13 05:16:59 INFO mapred.JobClient: File Input Format Counters
12/04/13 05:16:59 INFO mapred.JobClient: Bytes Read=0
12/04/13 05:16:59 INFO mapred.JobClient: Map-Reduce Framework
12/04/13 05:16:59 INFO mapred.JobClient: Map output materialized bytes=29992170602
12/04/13 05:16:59 INFO mapred.JobClient: Map input records=10
12/04/13 05:16:59 INFO mapred.JobClient: Reduce shuffle bytes=29874879887
12/04/13 05:16:59 INFO mapred.JobClient: Spilled Records=7527086436
12/04/13 05:16:59 INFO mapred.JobClient: Map output bytes=25992155242
12/04/13 05:16:59 INFO mapred.JobClient: CPU time spent (ms)=20182570
12/04/13 05:16:59 INFO mapred.JobClient: Total committed heap usage (bytes)=99953082368
12/04/13 05:16:59 INFO mapred.JobClient: Combine input records=0
12/04/13 05:16:59 INFO mapred.JobClient: SPLIT_RAW_BYTES=156414
12/04/13 05:16:59 INFO mapred.JobClient: Reduce input records=20
12/04/13 05:16:59 INFO mapred.JobClient: Reduce input groups=10
12/04/13 05:16:59 INFO mapred.JobClient: Combine output records=0
12/04/13 05:16:59 INFO mapred.JobClient: Physical memory (bytes) snapshot=91762372608
12/04/13 05:16:59 INFO mapred.JobClient: Reduce output records=0
12/04/13 05:16:59 INFO mapred.JobClient: Virtual memory (bytes) snapshot=391126540288
12/04/13 05:16:59 INFO mapred.JobClient: Map output records=20
{code}

data lost with gora continuous ingest test (goraci)

Key: HBASE-5754
URL: https://issues.apache.org/jira/browse/HBASE-5754
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1
Environment: 10 node test cluster
Reporter: Eric Newton
Assignee: stack

Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends. I put a billion entries into HBase, and ran the Verify map/reduce job. The verification failed because about 21K entries were missing. The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data. I re-ran the test with 100 million entries, and it verified successfully.
Both times I tested using a billion entries, the verification failed. If I run the verification step twice, the results are consistent, so the problem is probably not in the verify step.

Here are the versions of the various packages:

||package||version||
|hadoop|0.20.205.0|
|hbase|0.92.1|
|gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
|goraci|https://github.com/ericnewton/goraci tagged 2012-04-08|

The change I made to goraci was to configure it for hbase and to allow it to build properly.
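For readers unfamiliar with this style of test, here is a simplified, in-memory sketch of how a linked-list verification can detect missing rows. It is not the goraci code. It assumes, based on the README's description, that every stored node points at another node and that a healthy run forms closed loops. Only the REFERENCED counter name appears in the log above; `UNREFERENCED` and `UNDEFINED` and the class `VerifySketch` are assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class VerifySketch {
    enum Counts { REFERENCED, UNREFERENCED, UNDEFINED }

    /** nodes maps each stored node id to the id of the node it points at. */
    static Map<Counts, Long> verify(Map<Long, Long> nodes) {
        // Collect every id that some stored node points at.
        Set<Long> referenced = new HashSet<>();
        for (Long target : nodes.values()) {
            if (target != null) {
                referenced.add(target);
            }
        }
        Map<Counts, Long> counts = new EnumMap<>(Counts.class);
        for (Counts c : Counts.values()) {
            counts.put(c, 0L);
        }
        // Stored nodes are either pointed at by someone or not.
        for (Long id : nodes.keySet()) {
            Counts c = referenced.contains(id) ? Counts.REFERENCED : Counts.UNREFERENCED;
            counts.merge(c, 1L, Long::sum);
        }
        // An id that is pointed at but not stored means a row went missing.
        for (Long id : referenced) {
            if (!nodes.containsKey(id)) {
                counts.merge(Counts.UNDEFINED, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Long, Long> nodes = new HashMap<>();
        nodes.put(1L, 3L);
        nodes.put(2L, 1L);
        nodes.put(3L, 2L);   // closed loop: 1 -> 3 -> 2 -> 1
        System.out.println(verify(nodes)); // {REFERENCED=3, UNREFERENCED=0, UNDEFINED=0}

        nodes.remove(2L);    // simulate a lost row
        System.out.println(verify(nodes)); // {REFERENCED=1, UNREFERENCED=1, UNDEFINED=1}
    }
}
```

Under these assumptions, a run that reports only a REFERENCED count is one in which no referenced row was found missing.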
[jira] [Commented] (HBASE-5778) Turn on WAL compression by default
[ https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253454#comment-13253454 ]

stack commented on HBASE-5778:

I backed it out of 0.94 and trunk.

Turn on WAL compression by default

Key: HBASE-5778
URL: https://issues.apache.org/jira/browse/HBASE-5778
Project: HBase
Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
Fix For: 0.94.0, 0.96.0
Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch

I ran some tests to verify if WAL compression should be turned on by default. For a use case where it's not very useful (values two orders of magnitude bigger than the keys), the insert time wasn't different and the CPU usage was 15% higher (150% CPU usage vs. 130% when not compressing the WAL). When values are smaller than the keys, I saw a 38% improvement in the insert run time and CPU usage was 33% higher (600% CPU usage vs. 450%). I'm not sure WAL compression accounts for all the additional CPU usage; it might just be that we're able to insert faster and we spend more time in the MemStore per second (because our MemStores are bad when they contain tens of thousands of values). Those are two extremes, but it shows that for the price of some CPU we can save a lot. My machines have 2 quads with HT, so I still had a lot of idle CPUs.
[jira] [Commented] (HBASE-5784) Enable mvn deploy of website
[ https://issues.apache.org/jira/browse/HBASE-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253691#comment-13253691 ]

stack commented on HBASE-5784:

Committed to trunk.

Enable mvn deploy of website

Key: HBASE-5784
URL: https://issues.apache.org/jira/browse/HBASE-5784
Project: HBase
Issue Type: Improvement
Reporter: stack
Assignee: stack
Fix For: 0.96.0
Attachments: 5784.txt

Up to now, deploying the website has meant building it locally, then copying it up to apache and putting it into place under /www/hbase.apache.org. Change it so we can have maven deploy the site. The good thing about having maven do it is that it's regular; permissions will always be the same, so Doug and I won't be fighting each other when we stick stuff up there. Also, it's a one-step process rather than multiple steps.