[jira] [Commented] (HBASE-4109) Hostname returned via reverse dns lookup contains trailing period if configured interface is not default
[ https://issues.apache.org/jira/browse/HBASE-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221283#comment-13221283 ]

Harsh J commented on HBASE-4109:

Hi,

This affects multihomed DataNodes as well. I've filed HADOOP-8134 upstream.

Hostname returned via reverse dns lookup contains trailing period if configured interface is not default

Key: HBASE-4109
URL: https://issues.apache.org/jira/browse/HBASE-4109
Project: HBase
Issue Type: Bug
Components: master, regionserver
Affects Versions: 0.90.3
Reporter: Shrijeet Paliwal
Assignee: Shrijeet Paliwal
Fix For: 0.90.4
Attachments: 0001-HBASE-4109-Sanitize-hostname-returned-from-DNS-class.patch

If you configure any interface other than 'default' (literally that keyword), getDefaultHost in DNS.java returns a string with a trailing period. It seems the javadoc of reverseDns in DNS.java (see below) conflicts with what that function actually does: it claims to return a hostname but returns a PTR record. The PTR record always has a period at the end. RFC: http://irbs.net/bog-4.9.5/bog47.html

We call DNS.getDefaultHost in more than one place and treat the result as the actual hostname. Quoting HRegionServer, for example:

{code}
String machineName = DNS.getDefaultHost(
    conf.get("hbase.regionserver.dns.interface", "default"),
    conf.get("hbase.regionserver.dns.nameserver", "default"));
{code}

This causes inconsistencies. One such inconsistency was observed while debugging the issue "Regions not getting reassigned if RS is brought down". More here: http://search-hadoop.com/m/CANUA1qRCkQ1

We may want to sanitize the string returned from the DNS class. Or, better, we could overhaul the way we do DNS name matching everywhere.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
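The "sanitize the string returned from the DNS class" option in HBASE-4109 could look roughly like the sketch below. It is only an illustration and not the attached patch: the class name HostnameSanitizer and the method stripTrailingDot are hypothetical, and the sketch assumes that dropping a single trailing '.' (the DNS root label of a fully qualified name) is the only normalization needed.

```java
/** Hypothetical helper for illustration; not part of HBase or Hadoop. */
final class HostnameSanitizer {
    private HostnameSanitizer() {}

    /**
     * Returns the host without one trailing '.', if present.
     * A null input is returned unchanged.
     */
    static String stripTrailingDot(String host) {
        if (host != null && host.endsWith(".")) {
            return host.substring(0, host.length() - 1);
        }
        return host;
    }
}
```

A caller would wrap the lookup result, for example stripTrailingDot(machineName), before comparing it with names obtained elsewhere. Whether to normalize at the DNS class or at each call site is exactly the open question in the issue.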
[jira] [Commented] (HBASE-4072) zoo.cfg inconsistently used and sometimes we use non-zk names for zk attributes
[ https://issues.apache.org/jira/browse/HBASE-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202368#comment-13202368 ]

Harsh J commented on HBASE-4072:

Given that HBase does have its own zk config parts, should we just drop zoo.cfg read support? It has brought in more problems than solutions.

zoo.cfg inconsistently used and sometimes we use non-zk names for zk attributes

Key: HBASE-4072
URL: https://issues.apache.org/jira/browse/HBASE-4072
Project: HBase
Issue Type: Bug
Reporter: stack

This issue was found by Lars: http://search-hadoop.com/m/n04sthNcji2/zoo.cfg+vs+hbase-site.xmlsubj=Re+zoo+cfg+vs+hbase+site+xml

Let's fix the inconsistency found, and fix the places where we use a non-zk attribute name for a zk attribute in hbase (there are only a few places that I remember; maximum client connections is one, IIRC).
[jira] [Commented] (HBASE-5281) Should a failure in creating an unassigned node abort the master?
[ https://issues.apache.org/jira/browse/HBASE-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196048#comment-13196048 ]

Harsh J commented on HBASE-5281:

Only the TestSplitLogManager failure appears related to unassigned nodes. Investigating why it failed. The stack trace is:

{quote}
java.lang.AssertionError
    at org.junit.Assert.fail(Assert.java:92)
    at org.junit.Assert.assertTrue(Assert.java:43)
    at org.junit.Assert.assertTrue(Assert.java:54)
    at org.apache.hadoop.hbase.master.TestSplitLogManager.waitForCounter(TestSplitLogManager.java:148)
    at org.apache.hadoop.hbase.master.TestSplitLogManager.waitForCounter(TestSplitLogManager.java:128)
    at org.apache.hadoop.hbase.master.TestSplitLogManager.testUnassignedTimeout(TestSplitLogManager.java:414)
{quote}

Should a failure in creating an unassigned node abort the master?

Key: HBASE-5281
URL: https://issues.apache.org/jira/browse/HBASE-5281
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.5
Reporter: Harsh J
Assignee: Harsh J
Fix For: 0.94.0, 0.92.1
Attachments: HBASE-5281.patch

In {{AssignmentManager}}'s {{CreateUnassignedAsyncCallback}}, we have the following condition:

{code}
if (rc != 0) {
  // This is resultcode. If non-zero, need to resubmit.
  LOG.warn("rc != 0 for " + path + " -- retryable connectionloss -- " +
    "FIX see http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A2");
  this.zkw.abort("Connectionloss writing unassigned at " + path +
    ", rc=" + rc, null);
  return;
}
{code}

A similar structure inside {{ExistsUnassignedAsyncCallback}} (which the above is linked to) does not have such a forced abort. Do we really require the abort statement here, or can we make do without?
[jira] [Commented] (HBASE-4224) Need a flush by regionserver rather than by table option
[ https://issues.apache.org/jira/browse/HBASE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184281#comment-13184281 ]

Harsh J commented on HBASE-4224:

[Dropping by from the dev lists…, have not followed otherwise]

I'd certainly prefer reading flushAllRegions() over flushRegions(null). Could we also have it as a utility function in HRServer instead of in HRI/f, if changing the interface is a concern?

Need a flush by regionserver rather than by table option

Key: HBASE-4224
URL: https://issues.apache.org/jira/browse/HBASE-4224
Project: HBase
Issue Type: Bug
Components: shell
Reporter: stack
Assignee: Akash Ashok
Attachments: HBase-4224-v2.patch, HBase-4224.patch

This evening I needed to clean out logs on the cluster. Logs are by regionserver. To let go of logs, we need to have all edits emptied from memory. The only flush is by table or region. We need to be able to flush the regionserver. Need to add this.
[jira] [Commented] (HBASE-5143) Fix config typo in pluggable load balancer factory
[ https://issues.apache.org/jira/browse/HBASE-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182219#comment-13182219 ] Harsh J commented on HBASE-5143: Test failures appear unrelated. Also, an {{ack/grep}} of the old config string yields no result across code base. Fix config typo in pluggable load balancer factory -- Key: HBASE-5143 URL: https://issues.apache.org/jira/browse/HBASE-5143 Project: HBase Issue Type: Sub-task Components: master Reporter: Harsh J Priority: Critical Fix For: 0.92.0, 0.94.0 Attachments: HBASE-5143.patch HBASE-4240 made LoadBalancer pluggable. Configuration it loads seems to be wrongly named and carries a typo: hbase.maser.loadBalancer.class Could rather be hbase.master.loadbalancer.class Luckily 0.92 is not out yet and we should fix it asap, before folks start using it. Attaching patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4240) Allow Loadbalancer to be pluggable.
[ https://issues.apache.org/jira/browse/HBASE-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182205#comment-13182205 ] Harsh J commented on HBASE-4240: Hi, This introduced a badly named config. Please see HBASE-5143 for a fix. Allow Loadbalancer to be pluggable. --- Key: HBASE-4240 URL: https://issues.apache.org/jira/browse/HBASE-4240 Project: HBase Issue Type: New Feature Components: master Affects Versions: 0.94.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 0.92.0 Attachments: HBASE-4240-0.patch, HBASE-4240-1.patch, HBASE-4240-2.patch, HBASE-4240-3.patch Everyone seems to want something different from a load balancer. People want low latency, simplicity, and total control. It seems like at some point the load balancer can't be all things to all people. Something akin to what hadoop JT's pluggable scheduler seems like it will enable all solutions without making the code much more complex. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5143) Fix config typo in pluggable load balancer factory
[ https://issues.apache.org/jira/browse/HBASE-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182211#comment-13182211 ] Harsh J commented on HBASE-5143: Thanks Ram, will rebase my work branch on HBASE-3274 once this is in, and continue. I should have an update for incremental review soon (mostly i-to-r, next). Fix config typo in pluggable load balancer factory -- Key: HBASE-5143 URL: https://issues.apache.org/jira/browse/HBASE-5143 Project: HBase Issue Type: Sub-task Components: master Reporter: Harsh J Priority: Critical Attachments: HBASE-5143.patch HBASE-4240 made LoadBalancer pluggable. Configuration it loads seems to be wrongly named and carries a typo: hbase.maser.loadBalancer.class Could rather be hbase.master.loadbalancer.class Luckily 0.92 is not out yet and we should fix it asap, before folks start using it. Attaching patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5071) HFile has a possible cast issue.
[ https://issues.apache.org/jira/browse/HBASE-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178709#comment-13178709 ]

Harsh J commented on HBASE-5071:

Hm, given the array building, I can't really figure out a way to bypass this one. The following is ugly, but lemme know what you think of it:

{code}
int sizeToLoadOnOpen = (int) Math.min(
    fileSize - trailer.getLoadOnOpenDataOffset() - trailer.getTrailerSize(),
    Integer.MAX_VALUE);
// The first argument is computed as a long; we cap it at Integer.MAX_VALUE before narrowing to int.
{code}

HFile has a possible cast issue.

Key: HBASE-5071
URL: https://issues.apache.org/jira/browse/HBASE-5071
Project: HBase
Issue Type: Bug
Components: io
Affects Versions: 0.90.0
Reporter: Harsh J
Labels: hfile

HBASE-3040 introduced this line originally in HFile.Reader#loadFileInfo(...):

{code}
int allIndexSize = (int)(this.fileSize - this.trailer.dataIndexOffset - FixedFileTrailer.trailerSize());
{code}

Which on trunk today, for HFile v1, is:

{code}
int sizeToLoadOnOpen = (int) (fileSize - trailer.getLoadOnOpenDataOffset() - trailer.getTrailerSize());
{code}

This computed (and cast) integer is then used to build an array of the same size. But if fileSize is very large (> Integer.MAX_VALUE), then there's an easy chance this can go negative at some point and spew out exceptions such as:

{code}
java.lang.NegativeArraySizeException
    at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readAllIndex(HFile.java:805)
    at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:832)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.loadFileInfo(StoreFile.java:1003)
    at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:382)
    at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:438)
    at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:267)
    at org.apache.hadoop.hbase.regionserver.Store.init(Store.java:209)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2088)
    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:358)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2661)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2647)
{code}

Did we accidentally limit single region sizes this way? (Unsure about HFile v2's structure so far, so do not know if v2 has the same issue.)
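To make the cast problem in HBASE-5071 concrete, here is a small standalone sketch of the arithmetic. It does not touch HFile at all: the class LoadOnOpenSize and its method names are hypothetical, and the three long parameters merely stand in for fileSize, the load-on-open offset and the trailer size from the quoted snippets.

```java
/** Hypothetical illustration of long-to-int narrowing; not HBase code. */
final class LoadOnOpenSize {
    private LoadOnOpenSize() {}

    /** Plain narrowing cast, as in the lines quoted in the issue. */
    static int narrowingCast(long fileSize, long loadOnOpenOffset, long trailerSize) {
        return (int) (fileSize - loadOnOpenOffset - trailerSize);
    }

    /** Caps the long result at Integer.MAX_VALUE before narrowing. */
    static int clamped(long fileSize, long loadOnOpenOffset, long trailerSize) {
        return (int) Math.min(fileSize - loadOnOpenOffset - trailerSize, Integer.MAX_VALUE);
    }

    /** Throws ArithmeticException instead of silently wrapping around. */
    static int checked(long fileSize, long loadOnOpenOffset, long trailerSize) {
        return Math.toIntExact(fileSize - loadOnOpenOffset - trailerSize);
    }
}
```

With a difference of 3,000,000,000 bytes, narrowingCast wraps to a negative number, which is what produces the NegativeArraySizeException when it is used as an array length. The clamped variant avoids the exception but would silently describe only the first 2 GB, so it is probably not a real fix on its own; checked at least fails with a clear error.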
[jira] [Commented] (HBASE-587) Add auto-primary-key feature
[ https://issues.apache.org/jira/browse/HBASE-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178223#comment-13178223 ]

Harsh J commented on HBASE-587:

I guess ImportTSV has provided this for generic TSV files at least. I'm not sure if trying to build something more generic would be worth it. Most users have complex key structures, by the looks of it today, and I think it's best left to folks to build out their own keys.

Should we resolve this as Won't Fix / Later?

Add auto-primary-key feature

Key: HBASE-587
URL: https://issues.apache.org/jira/browse/HBASE-587
Project: HBase
Issue Type: New Feature
Reporter: Bryan Duxbury
Priority: Trivial

Some folks seem to be interested in having their row keys automatically generated in a unique fashion. Maybe we could do something like allow the user to specify they want an automatic key, and then we'll generate a GUID that's unique for that table and return it as part of the commit. Not sure what the mechanics would look like exactly, but it seems doable, and it's going to be a more prevalent use case as people start to put data into HBase first without touching another system, or push data without a natural unique primary key.
[jira] [Commented] (HBASE-848) API to inspect cell deletions
[ https://issues.apache.org/jira/browse/HBASE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178225#comment-13178225 ] Harsh J commented on HBASE-848: --- Do you mean something like a single delete(…) call returning the last good cell (which'd also contain the last good TS)? I think that's too expensive and we have not seen a need for this come up at all (or at least I have not). Why do you seek this? API to inspect cell deletions - Key: HBASE-848 URL: https://issues.apache.org/jira/browse/HBASE-848 Project: HBase Issue Type: New Feature Components: client Affects Versions: 0.2.0 Reporter: Michael Bieniosek If a cell gets deleted, I'd like to have some API that gives me the deletion timestamp, as well as any versions that predate the deletion. One possibility might be to add a boolean flag to HTable.get -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-995) Allow alter of catalog tables; currently fails because columnfamily is 'illegal'
[ https://issues.apache.org/jira/browse/HBASE-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178232#comment-13178232 ] Harsh J commented on HBASE-995: --- Stack, do you also recall what was the drive behind having such a functionality, given that we usually do not mess with the catalog tables? Allow alter of catalog tables; currently fails because columnfamily is 'illegal' Key: HBASE-995 URL: https://issues.apache.org/jira/browse/HBASE-995 Project: HBase Issue Type: Bug Reporter: stack Turns out this ain't easy. The .META. schema is hard-coded. Moving it out of 0.19.0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1254) Limit values to 2G
[ https://issues.apache.org/jira/browse/HBASE-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178240#comment-13178240 ] Harsh J commented on HBASE-1254: 2GB limits in what? Byte array sizes? I think JVMs would already do that check earlier? Limit values to 2G Key: HBASE-1254 URL: https://issues.apache.org/jira/browse/HBASE-1254 Project: HBase Issue Type: Bug Reporter: stack Chatting up on IRC, 2G seems like way outer reaches of what we could ever handle. Add checks to client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3924) Improve Shell's CLI help
[ https://issues.apache.org/jira/browse/HBASE-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177087#comment-13177087 ]

Harsh J commented on HBASE-3924:

Lars,

Sorry for the late reply. Thanks for committing your changes! :)

Improve Shell's CLI help

Key: HBASE-3924
URL: https://issues.apache.org/jira/browse/HBASE-3924
Project: HBase
Issue Type: Improvement
Components: shell
Affects Versions: 0.90.3
Reporter: Lars George
Assignee: Harsh J
Priority: Trivial
Fix For: 0.94.0
Attachments: 3924.txt, HBASE-3924.patch

In the hirb.rb source we have

{noformat}
# so they don't go through to irb. Output shell 'usage' if user types '--help'
cmdline_help = <<HERE # HERE document output as shell usage
HBase Shell command-line options:
 format        Formatter for outputting results: console | html. Default: console
 -d | --debug  Set DEBUG log levels.
HERE
found = []
format = 'console'
script2run = nil
log_level = org.apache.log4j.Level::ERROR
for arg in ARGV
  if arg =~ /^--format=(.+)/i
    format = $1
    if format =~ /^html$/i
      raise NoMethodError.new("Not yet implemented")
    elsif format =~ /^console$/i
      # This is default
    else
      raise ArgumentError.new("Unsupported format " + arg)
    end
    found.push(arg)
  elsif arg == '-h' || arg == '--help'
    puts cmdline_help
    exit
  elsif arg == '-d' || arg == '--debug'
    log_level = org.apache.log4j.Level::DEBUG
    $fullBackTrace = true
    puts "Setting DEBUG log level..."
  else
    # Presume it a script. Save it off for running later below
    # after we've set up some environment.
    script2run = arg
    found.push(arg)
    # Presume that any other args are meant for the script.
    break
  end
end
{noformat}

We should enhance the help printed when using -h/--help to look like this?

{noformat}
cmdline_help = <<HERE # HERE document output as shell usage
HBase Shell command-line options:
 --format={console|html}  Formatter for outputting results. Default: console
 -d | --debug             Set DEBUG log levels.
 -h | --help              This help.
 script-filename [script-options]
HERE
{noformat}
[jira] [Commented] (HBASE-3274) Replace all config properties references in code with string constants
[ https://issues.apache.org/jira/browse/HBASE-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176204#comment-13176204 ] Harsh J commented on HBASE-3274: I'm recording all the strange param names I encounter as I do this at https://gist.github.com/76416a2211ece8edb95a Meanwhile, am hoping no more patches get in with config names as strings… :) Replace all config properties references in code with string constants -- Key: HBASE-3274 URL: https://issues.apache.org/jira/browse/HBASE-3274 Project: HBase Issue Type: Improvement Reporter: Lars George Assignee: Harsh J Priority: Trivial Original Estimate: 168h Remaining Estimate: 168h See HBASE-2721 for details. We have fixed the default values in HBASE-3272 but we should also follow Hadoop to remove all hardcoded strings that refer to configuration properties and move them to HConstants. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
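As a rough illustration of what HBASE-3274 asks for, the sketch below keeps property names in one constants holder so that a typo such as the hbase.maser.loadBalancer.class one reported in HBASE-5143 can only occur in a single place. Everything here is an assumption made for the example: the class ConfigKeys and its constant names are hypothetical (the issue proposes HConstants as the real home), and java.util.Properties merely stands in for the Hadoop Configuration object.

```java
import java.util.Properties;

/** Hypothetical constants holder; the issue proposes HConstants for this role. */
final class ConfigKeys {
    private ConfigKeys() {}

    // Property names as they appear in the issues quoted in this thread.
    static final String MASTER_LOADBALANCER_CLASS = "hbase.master.loadbalancer.class";
    static final String REGIONSERVER_DNS_INTERFACE = "hbase.regionserver.dns.interface";

    /** Reads the DNS interface setting, falling back to the literal "default". */
    static String dnsInterface(Properties conf) {
        return conf.getProperty(REGIONSERVER_DNS_INTERFACE, "default");
    }
}
```

Call sites then reference ConfigKeys.REGIONSERVER_DNS_INTERFACE rather than repeating the string, so a search for the constant finds every use and the compiler catches a misspelled identifier.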
[jira] [Commented] (HBASE-3924) Improve Shell's CLI help
[ https://issues.apache.org/jira/browse/HBASE-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176299#comment-13176299 ] Harsh J commented on HBASE-3924: +1 on that. It does look much more readable. Nit: I'd ditch the 'HBase' though, seems unnecessary. Or maybe use 'hbase' to avoid confusion about what it means there. Improve Shell's CLI help Key: HBASE-3924 URL: https://issues.apache.org/jira/browse/HBASE-3924 Project: HBase Issue Type: Improvement Components: shell Affects Versions: 0.90.3 Reporter: Lars George Assignee: Harsh J Priority: Trivial Fix For: 0.92.0, 0.94.0 Attachments: HBASE-3924.patch
[jira] [Commented] (HBASE-3065) Retry all 'retryable' zk operations; e.g. connection loss
[ https://issues.apache.org/jira/browse/HBASE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170113#comment-13170113 ]

Harsh J commented on HBASE-3065:

As a result of this, can blocks of code like the following be considered resolved?

{code}
if (rc != 0) {
  // This is resultcode. If non-zero, need to resubmit.
  LOG.warn("rc != 0 for " + path + " -- retryable connectionloss -- " +
    "FIX see http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A2");
  return;
}
{code}

This is from AssignmentManager, lines 1306 and 1336 (two instances). Or do those callbacks still not retry?

Retry all 'retryable' zk operations; e.g. connection loss

Key: HBASE-3065
URL: https://issues.apache.org/jira/browse/HBASE-3065
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: Liyin Tang
Priority: Blocker
Fix For: 0.92.0
Attachments: 3065-v3.txt, 3065-v4.txt, HBASE-3065-addendum.patch, HBase-3065[r1088475]_1.patch, hbase3065_2.patch

The 'new' master refactored our zk code, tidying up all zk accesses and corralling them behind nice zk utility classes. One improvement was letting out all KeeperExceptions and letting the client deal with them. That's good generally because in the old days we'd suppress important zk state changes. But there is at least one case the new zk utility could handle for the application, and that's the class of retryable KeeperExceptions. The one that comes to mind is connection loss. On connection loss we should retry the just-failed operation. Usually the retry will just work. At worst, on reconnect, we'll pick up the expired session event.

Adding in this change shouldn't be too bad given the refactor of zk corralled all zk access into one or two classes only.

One thing to consider though is how much we should retry. We could retry on a timer, or we could retry forever as long as the Stoppable interface is passed, so if another thread has stopped or aborted the hosting service, we'll notice and give up trying. Doing the latter is probably better than some kind of timeout.

HBASE-3062 adds a timed retry on the first zk operation. This issue is about generalizing what is over there across all zk access.
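The "retry forever as long as the Stoppable interface is passed" idea from the HBASE-3065 description can be sketched as follows. This is a generic illustration under stated assumptions, not the committed patch: RetryUntilStopped, StopCheck and RetryableException are hypothetical names, StopCheck stands in for HBase's Stoppable, and RetryableException stands in for whichever KeeperException subclasses are treated as retryable (connection loss being the one named in the issue).

```java
import java.util.concurrent.Callable;

/** Hypothetical retry helper; names are illustrative, not HBase or ZooKeeper API. */
final class RetryUntilStopped {
    private RetryUntilStopped() {}

    /** Stand-in for a "has the hosting service been stopped?" check. */
    interface StopCheck {
        boolean isStopped();
    }

    /** Stand-in for failures worth retrying, such as connection loss. */
    static final class RetryableException extends Exception {
        RetryableException(String message) {
            super(message);
        }
    }

    /**
     * Runs op until it succeeds. A RetryableException triggers another attempt
     * after pauseMillis unless the stop check reports that the service was
     * stopped, in which case the exception is rethrown. Other exceptions
     * propagate immediately.
     */
    static <T> T call(Callable<T> op, StopCheck stop, long pauseMillis) throws Exception {
        while (true) {
            try {
                return op.call();
            } catch (RetryableException e) {
                if (stop.isStopped()) {
                    throw e;
                }
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```

One caveat when applying this to ZooKeeper: after a connection loss the client cannot tell whether the failed request was applied on the server, so a retried create may find the node already present. A real implementation has to account for that case for each operation type.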
[jira] [Commented] (HBASE-3065) Retry all 'retryable' zk operations; e.g. connection loss
[ https://issues.apache.org/jira/browse/HBASE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170114#comment-13170114 ]

Harsh J commented on HBASE-3065:

Actually, the more critical one is in the CreateUnassigned callback:

{code}
if (rc != 0) {
  // This is resultcode. If non-zero, need to resubmit.
  LOG.warn("rc != 0 for " + path + " -- retryable connectionloss -- " +
    "FIX see http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A2");
  this.zkw.abort("Connectionloss writing unassigned at " + path +
    ", rc=" + rc, null);
  return;
}
{code}

(Line 1306, AssignmentManager.)

As you see there, it aborts the whole thing instead of retrying.

Retry all 'retryable' zk operations; e.g. connection loss

Key: HBASE-3065
URL: https://issues.apache.org/jira/browse/HBASE-3065
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: Liyin Tang
Priority: Blocker
Fix For: 0.92.0
Attachments: 3065-v3.txt, 3065-v4.txt, HBASE-3065-addendum.patch, HBase-3065[r1088475]_1.patch, hbase3065_2.patch
[jira] [Commented] (HBASE-3065) Retry all 'retryable' zk operations; e.g. connection loss
[ https://issues.apache.org/jira/browse/HBASE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170355#comment-13170355 ] Harsh J commented on HBASE-3065: Good point. /me bangs his head on the wall for not trying first :) I'll spend some time in the weekend to try out 0.92 and force this callback to fail. Retry all 'retryable' zk operations; e.g. connection loss - Key: HBASE-3065 URL: https://issues.apache.org/jira/browse/HBASE-3065 Project: HBase Issue Type: Bug Reporter: stack Assignee: Liyin Tang Priority: Blocker Fix For: 0.92.0 Attachments: 3065-v3.txt, 3065-v4.txt, HBASE-3065-addendum.patch, HBase-3065[r1088475]_1.patch, hbase3065_2.patch
[jira] [Commented] (HBASE-3465) Hbase should use a HADOOP_HOME environment variable if available.
[ https://issues.apache.org/jira/browse/HBASE-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168762#comment-13168762 ]

Harsh J commented on HBASE-3465:

HBASE-4854 fixed this I think?

Hbase should use a HADOOP_HOME environment variable if available.

Key: HBASE-3465
URL: https://issues.apache.org/jira/browse/HBASE-3465
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Ted Dunning
Assignee: Alejandro Abdelnur
Fix For: 0.92.0
Attachments: a1-HBASE-3465.patch

I have been burned a few times lately while developing code by having to make sure that the hadoop jar in hbase/lib is exactly correct. In my own deployment, there are actually 3 jars and a native library to keep in sync that hbase shouldn't have to know about explicitly. A similar problem arises when using stock hbase with CDH3 because of the security patches changing the wire protocol.

All of these problems could be avoided by not assuming that the hadoop library is in the local directory. Moreover, I think it might be possible to assemble the distribution such that the compile time hadoop dependency is in a cognate directory to lib and is referenced using a default value for HADOOP_HOME.

Does anybody have any violent antipathies to such a change?
[jira] [Commented] (HBASE-4834) CopyTable: Cannot have ZK source to destination
[ https://issues.apache.org/jira/browse/HBASE-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153839#comment-13153839 ]

Harsh J commented on HBASE-4834:

The frontend calls setConf of TableOutputFormat when it's initialized for output-spec checks upon job.submit(), and that reloads the actual ZK keys in job.xml itself (which, in Hadoop, is written _after_ checkOutputSpecs(…) and such).

CopyTable: Cannot have ZK source to destination

Key: HBASE-4834
URL: https://issues.apache.org/jira/browse/HBASE-4834
Project: HBase
Issue Type: Bug
Components: zookeeper
Affects Versions: 0.90.1
Reporter: Linden Hillenbrand
Priority: Critical

During a CopyTable run involving --peer.adr, we found the following block of code:

if (address != null) {
  ZKUtil.applyClusterKeyToConf(this.conf, address);
}

We set the ZK conf in the setConf method, which also gets called in the frontend when MR initializes TOF, so there is currently no way to have two ZK points for a single job, because the source gets reset before the job is submitted.
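To make the HBASE-4834 report concrete, here is a small sketch of the difference between applying the peer address to the shared job configuration and applying it to a private copy. Everything in it is an assumption made for illustration: plain maps stand in for Hadoop's Configuration, `PeerConfSketch` and its methods are hypothetical, and the copy-first variant is only one possible way out, not necessarily the fix that was adopted.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-map stand-in for a Hadoop Configuration; this is not the real API. */
final class PeerConfSketch {
    static final String QUORUM_KEY = "hbase.zookeeper.quorum";

    /** Mirrors the reported behaviour: the peer address overwrites the shared job conf. */
    static Map<String, String> setConfInPlace(Map<String, String> jobConf, String peerQuorum) {
        if (peerQuorum != null) {
            jobConf.put(QUORUM_KEY, peerQuorum);
        }
        return jobConf;
    }

    /** One possible alternative (an assumption): apply the peer address to a private copy. */
    static Map<String, String> setConfOnCopy(Map<String, String> jobConf, String peerQuorum) {
        Map<String, String> outputConf = new HashMap<>(jobConf);
        if (peerQuorum != null) {
            outputConf.put(QUORUM_KEY, peerQuorum);
        }
        return outputConf;
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(QUORUM_KEY, "source-zk");

        // Copy variant: the output side sees the peer, the source setting survives.
        Map<String, String> outputConf = setConfOnCopy(jobConf, "peer-zk");
        System.out.println("source after copy variant: " + jobConf.get(QUORUM_KEY));
        System.out.println("output after copy variant: " + outputConf.get(QUORUM_KEY));

        // In-place variant: the source setting is lost before the job is submitted.
        setConfInPlace(jobConf, "peer-zk");
        System.out.println("source after in-place variant: " + jobConf.get(QUORUM_KEY));
    }
}
```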
[jira] [Commented] (HBASE-2502) HBase won't bind to designated interface when more than one network interface is available
[ https://issues.apache.org/jira/browse/HBASE-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143959#comment-13143959 ]

Harsh J commented on HBASE-2502:

Isn't this solved by using the hbase.master.dns.interface and hbase.master.dns.nameserver props right now?

HBase won't bind to designated interface when more than one network interface is available

Key: HBASE-2502
URL: https://issues.apache.org/jira/browse/HBASE-2502
Project: HBase
Issue Type: Bug
Reporter: stack

See this message by Michael Segel up on the list: http://www.mail-archive.com/hbase-user@hadoop.apache.org/msg10042.html

This comes up from time to time.
[jira] [Commented] (HBASE-4680) FSUtils.isInSafeMode() checks should operate on HBase root dir, where we have permissions
[ https://issues.apache.org/jira/browse/HBASE-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139915#comment-13139915 ]

Harsh J commented on HBASE-4680:

I apologize for missing the whole activity on this (didn't see a comment on HBASE-4510, oops). I think a better fix would have been to just catch other exceptions (which is something I missed despite commenting). This change is fine too, but has led to HBASE-4705 scenarios.

Could we revert this and instead add a:

{code}
catch (Safemode) { return true; }
catch (Others) { ignore; }
{code}

Kinda block, while still operating on the only dir guaranteed to exist (/ - root)?

FSUtils.isInSafeMode() checks should operate on HBase root dir, where we have permissions

Key: HBASE-4680
URL: https://issues.apache.org/jira/browse/HBASE-4680
Project: HBase
Issue Type: Bug
Components: util
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
Assignee: Gary Helmling
Priority: Critical
Fix For: 0.92.0
Attachments: HBASE-4680.patch

The HDFS safe mode check workaround introduced by HBASE-4510 performs a {{FileSystem.setPermission()}} operation on the root directory (/) when attempting to trigger a {{SafeModeException}}. As a result, it requires superuser privileges when running with DFS permission checking enabled. Changing the operations to act on the HBase root directory should be safe, since the master process must have write access to it.
[jira] [Commented] (HBASE-4705) HBase won't initialize if /hbase is not present
[ https://issues.apache.org/jira/browse/HBASE-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139918#comment-13139918 ]

Harsh J commented on HBASE-4705:

I apologize for missing the whole activity on HBASE-4680. I think a better fix, instead of the HBASE-4680 change, would have been to just catch other exceptions (which is something I missed despite commenting). This change is fine too, but has led to HBASE-4705 scenarios.

Could we revert HBASE-4680 and instead add a:

{code}
catch (Safemode) { return true; }
catch (Others) { ignore; }
{code}

Kinda block, while still operating on the only dir guaranteed to exist (/ - root)? This works because safemode is always checked first on the server side before anything else is returned. The superuser check is not client side.

HBase won't initialize if /hbase is not present

Key: HBASE-4705
URL: https://issues.apache.org/jira/browse/HBASE-4705
Project: HBase
Issue Type: Bug
Reporter: Harsh J

{code}
2011-10-31 00:09:09,549 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.io.FileNotFoundException: File does not exist: hdfs://C3S31:9000/hbase
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:731)
	at org.apache.hadoop.hbase.util.FSUtils.isInSafeMode(FSUtils.java:163)
	at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:458)
	at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:301)
	at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:127)
	at org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:112)
	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:426)
	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)
	at java.lang.Thread.run(Thread.java:662)
2011-10-31 00:09:09,551 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2011-10-31 00:09:09,551 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
{code}

Trunk won't start HBase unless /hbase is already present, after HBASE-4680 (and the silly error I made in HBASE-4510).
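As a rough, self-contained reading of the catch block proposed in the comment above: the sketch below treats a safe-mode rejection as "in safe mode" and ignores every other failure of the probe. It does not use the HDFS API. `SafeModeSignal`, `RootDirProbe` and `SafeModeCheckSketch` are hypothetical stand-ins, and how a real client surfaces the safe-mode exception is deliberately left out.

```java
import java.io.IOException;

/** Hypothetical stand-in for the HDFS SafeModeException mentioned in HBASE-4680. */
class SafeModeSignal extends IOException {
    SafeModeSignal(String message) {
        super(message);
    }
}

/** Hypothetical probe: some write-style operation against "/" that safe mode rejects. */
interface RootDirProbe {
    void run() throws IOException;
}

final class SafeModeCheckSketch {
    /** Returns true only when the probe was rejected because of safe mode. */
    static boolean isInSafeMode(RootDirProbe probe) {
        try {
            probe.run();
        } catch (SafeModeSignal e) {
            return true; // rejected because the NameNode is in safe mode
        } catch (IOException e) {
            // Any other failure (for example a permission error) is ignored:
            // per the comment above, safe mode is checked first on the server,
            // so reaching a different error implies safe mode is off.
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isInSafeMode(() -> {
            throw new SafeModeSignal("name node is in safe mode (simulated)");
        }));
        System.out.println(isInSafeMode(() -> {
            throw new IOException("permission denied (simulated)");
        }));
    }
}
```

Because the probe targets the root directory, it does not depend on /hbase existing, which is the scenario HBASE-4705 reports.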
[jira] [Commented] (HBASE-4510) HDFS-1620 related changes downstream (For compiling with HDFS 0.23+)
[ https://issues.apache.org/jira/browse/HBASE-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117021#comment-13117021 ]

Harsh J commented on HBASE-4510:

Since this may require a lot of review+discussion, I've opened https://reviews.apache.org/r/2108/

HDFS-1620 related changes downstream (For compiling with HDFS 0.23+)

Key: HBASE-4510
URL: https://issues.apache.org/jira/browse/HBASE-4510
Project: HBase
Issue Type: Task
Affects Versions: 0.94.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Blocker

HBase seemingly no longer compiles against 0.23 after the HDFS-1620 naming refactorings were carried out. Two solutions:

* We use the new classnames. This breaks HBase's backward compatibility with older Hadoop releases (is that a concern with future releases?)
* HBase gets its own set of constants, as the upstream one is not marked for public usage. This needs a little more maintenance on HBase's side.