Re: [DISCUSS] Working in master or feature branch for WAL refactoring?
Thanks, Sean. Like Duo says, I think we can't "hide" this stuff. For the most part, it shouldn't affect how the system works, but that is both a good thing and a bad thing (when we do introduce a bug).

Thanks for weighing in too, Duo! I'm not worried about the logistics of doing the merge vote. I was more curious whether people saw "big merge votes" as a systemic problem. Happy to hear from others, but the consensus seems to be that folks would still encourage a feature branch here, which is just fine! Thanks again.

On 9/25/18 10:22 PM, 张铎 (Duo Zhang) wrote:
> For WAL, it is the core of HBase, so please use a feature branch. And when
> merging, it is not a big problem if there are -1s; you just need three more
> +1s, according to the ASF rules. And in my experience, merging things into
> the master branch is not very hard. Thanks.
>
> Sean Busbey wrote on Wednesday, September 26, 2018 at 6:42 AM:
> > I became an advocate of feature branches instead of incremental changes
> > on master after having to deal with the fallout of half-done features
> > that added drag in the codebase but never made it to a state where
> > downstream users could reliably interact with the feature. A specific
> > example that leaps to mind is distributed log replay. That said, the
> > work of Zach et al. over on "Read Replica Clusters" [1] has made me
> > sympathetic again to the costs of a strict feature-branch approach.
> >
> > Still, having a partially implemented feature in master that would block
> > a release is a bad idea, I think. We're already ~1.25 years since
> > branch-2 came off of master, probably a bit long if we're doing
> > time-based release trains. Though I don't know what would justify a
> > feature-driven release there.
> >
> > Are these things we could put behind a feature flag? Probably not?
> >
> > [1]: https://issues.apache.org/jira/browse/HBASE-18477
> >
> > On Tue, Sep 25, 2018 at 8:53 AM Josh Elser wrote:
> > > Hi,
> > >
> > > Under the umbrella of HBASE-20951, we had initially said that we'd
> > > start landing code changes on a feature branch when we get there.
> > > Sergey S asked me a good question the other day: "why a feature branch
> > > and not master?"
> > >
> > > Honestly, I don't have a good answer. The goal of the implementation
> > > is to move incrementally and not leave things "broken" when possible.
> > > Assuming that we can do that, using a feature branch would just create
> > > pain on the eventual merge back to master.
> > >
> > > Do folks have strong feelings here? Is using master better to prevent
> > > review "debt" from piling up when the merge vote comes? Or, do people
> > > foresee an HBase 3 in the works before this work would be done
> > > (sometime in 2019)?
> > >
> > > Would like to hear your input.
> > >
> > > - Josh
Re: [DISCUSS] Working in master or feature branch for WAL refactoring?
For WAL, it is the core of HBase, so please use a feature branch. And when merging, it is not a big problem if there are -1s; you just need three more +1s, according to the ASF rules. And in my experience, merging things into the master branch is not very hard. Thanks.

Sean Busbey wrote on Wednesday, September 26, 2018 at 6:42 AM:
> I became an advocate of feature branches instead of incremental changes on
> master after having to deal with the fallout of half-done features that
> added drag in the codebase but never made it to a state where downstream
> users could reliably interact with the feature. A specific example that
> leaps to mind is distributed log replay. That said, the work of Zach et al.
> over on "Read Replica Clusters" [1] has made me sympathetic again to the
> costs of a strict feature-branch approach.
>
> Still, having a partially implemented feature in master that would block a
> release is a bad idea, I think. We're already ~1.25 years since branch-2
> came off of master, probably a bit long if we're doing time-based release
> trains. Though I don't know what would justify a feature-driven release
> there.
>
> Are these things we could put behind a feature flag? Probably not?
>
> [1]: https://issues.apache.org/jira/browse/HBASE-18477
>
> On Tue, Sep 25, 2018 at 8:53 AM Josh Elser wrote:
> >
> > Hi,
> >
> > Under the umbrella of HBASE-20951, we had initially said that we'd start
> > landing code changes on a feature branch when we get there. Sergey S
> > asked me a good question the other day: "why a feature branch and not
> > master?"
> >
> > Honestly, I don't have a good answer. The goal of the implementation is
> > to move incrementally and not leave things "broken" when possible.
> > Assuming that we can do that, using a feature branch would just create
> > pain on the eventual merge back to master.
> >
> > Do folks have strong feelings here? Is using master better to prevent
> > review "debt" from piling up when the merge vote comes? Or, do people
> > foresee an HBase 3 in the works before this work would be done (sometime
> > in 2019)?
> >
> > Would like to hear your input.
> >
> > - Josh
[jira] [Created] (HBASE-21232) Show table state in Tables view on Master home page
stack created HBASE-21232:
Summary: Show table state in Tables view on Master home page
Key: HBASE-21232
URL: https://issues.apache.org/jira/browse/HBASE-21232
Project: HBase
Issue Type: Bug
Components: UI
Affects Versions: 2.1.0
Reporter: stack
Assignee: stack
Fix For: 2.1.1
Attachments: table.pdf

Add a state column to the Tables panel on the Master home page. Useful when trying to figure out whether a table is enabled/disabled/disabling/enabling...

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [DISCUSS] Working in master or feature branch for WAL refactoring?
I became an advocate of feature branches instead of incremental changes on master after having to deal with the fallout of half-done features that added drag in the codebase but never made it to a state where downstream users could reliably interact with the feature. A specific example that leaps to mind is distributed log replay. That said, the work of Zach et al. over on "Read Replica Clusters" [1] has made me sympathetic again to the costs of a strict feature-branch approach.

Still, having a partially implemented feature in master that would block a release is a bad idea, I think. We're already ~1.25 years since branch-2 came off of master, probably a bit long if we're doing time-based release trains. Though I don't know what would justify a feature-driven release there.

Are these things we could put behind a feature flag? Probably not?

[1]: https://issues.apache.org/jira/browse/HBASE-18477

On Tue, Sep 25, 2018 at 8:53 AM Josh Elser wrote:
>
> Hi,
>
> Under the umbrella of HBASE-20951, we had initially said that we'd start
> landing code changes on a feature branch when we get there. Sergey S
> asked me a good question the other day: "why a feature branch and not
> master?"
>
> Honestly, I don't have a good answer. The goal of the implementation is
> to move incrementally and not leave things "broken" when possible.
> Assuming that we can do that, using a feature branch would just create
> pain on the eventual merge back to master.
>
> Do folks have strong feelings here? Is using master better to prevent
> review "debt" from piling up when the merge vote comes? Or, do people
> foresee an HBase 3 in the works before this work would be done (sometime
> in 2019)?
>
> Would like to hear your input.
>
> - Josh
[jira] [Created] (HBASE-21231) Add documentation for MajorCompactor
Balazs Meszaros created HBASE-21231:
Summary: Add documentation for MajorCompactor
Key: HBASE-21231
URL: https://issues.apache.org/jira/browse/HBASE-21231
Project: HBase
Issue Type: Task
Components: documentation
Affects Versions: 3.0.0
Reporter: Balazs Meszaros
Assignee: Balazs Meszaros

HBASE-19528 added a new MajorCompactor tool, but it lacks documentation. Let's document it.
[jira] [Created] (HBASE-21230) BackupUtils#checkTargetDir doesn't compose error message correctly
Ted Yu created HBASE-21230:
Summary: BackupUtils#checkTargetDir doesn't compose error message correctly
Key: HBASE-21230
URL: https://issues.apache.org/jira/browse/HBASE-21230
Project: HBase
Issue Type: Bug
Components: backuprestore
Reporter: Ted Yu

Here is the related code:
{code}
String expMsg = e.getMessage();
String newMsg = null;
if (expMsg.contains("No FileSystem for scheme")) {
  newMsg =
      "Unsupported filesystem scheme found in the backup target url. Error Message: "
          + newMsg;
{code}
I think the intention was to concatenate expMsg at the end of newMsg.
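As a sketch of the likely fix, the message should be built from `expMsg` (the exception's message) rather than the still-null `newMsg`, which as written always produces "... Error Message: null". The class and helper-method names below are hypothetical, chosen for illustration; this is not the actual HBase patch.

```java
// Hypothetical standalone sketch of the intended message composition.
public class CheckTargetDirMessage {
  static String composeErrorMessage(String expMsg) {
    String newMsg = null;
    if (expMsg != null && expMsg.contains("No FileSystem for scheme")) {
      // Bug in the reported snippet: it concatenated newMsg (null) here.
      // Concatenating expMsg preserves the underlying error detail.
      newMsg = "Unsupported filesystem scheme found in the backup target url."
          + " Error Message: " + expMsg;
    }
    return newMsg;
  }

  public static void main(String[] args) {
    System.out.println(composeErrorMessage("No FileSystem for scheme \"bogus\""));
  }
}
```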
[jira] [Resolved] (HBASE-14707) NPE spew getting metrics via jmx
[ https://issues.apache.org/jira/browse/HBASE-14707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Drob resolved HBASE-14707.
Resolution: Cannot Reproduce

Not seen in a while and not enough info to reproduce. The JVM used for the original report seems to have stack trace optimization turned on, where it discards the rest of the NPE after printing it the first few times. If this comes up again, then we'll try to address it.

> NPE spew getting metrics via jmx
>
> Key: HBASE-14707
> URL: https://issues.apache.org/jira/browse/HBASE-14707
> Project: HBase
> Issue Type: Bug
> Components: metrics
> Reporter: stack
> Priority: Major
>
> See this in branch-1 tip:
> {code}
> 2015-10-27 08:01:08,954 INFO [main-EventThread] replication.ReplicationTrackerZKImpl: /hbase/rs/e1101.halxg.cloudera.com,16020,1445958006576 znode expired, triggering replicatorRemoved event
> 2015-10-27 08:01:20,645 ERROR [685943200@qtp-893835279-134] util.JSONBean: getting attribute Value of "org.apache.hadoop.hbase.client":type="MetricsConnection",scope="hconnection-0x33abd9d3",name="executorPoolActiveThreads" threw an exception
> javax.management.RuntimeMBeanException: java.lang.NullPointerException
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
>   at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
>   at org.apache.hadoop.hbase.util.JSONBean.writeAttribute(JSONBean.java:235)
>   at org.apache.hadoop.hbase.util.JSONBean.write(JSONBean.java:209)
>   at org.apache.hadoop.hbase.util.JSONBean.access$000(JSONBean.java:53)
>   at org.apache.hadoop.hbase.util.JSONBean$1.write(JSONBean.java:96)
>   at org.apache.hadoop.hbase.http.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:202)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>   at org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:113)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.hbase.http.ClickjackingPreventionFilter.doFilter(ClickjackingPreventionFilter.java:48)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1354)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>   at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>   at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>   at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {code}
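The "stack trace optimization" mentioned in the resolution is likely HotSpot's fast-throw behavior (controlled by the real JVM flag `-XX:-OmitStackTraceInFastThrow`): once an implicit exception like an NPE is thrown repeatedly from the same JIT-compiled site, the JVM substitutes a preallocated exception with no stack trace. A minimal illustration, assuming default JVM settings (whether and when the optimization triggers is JIT-dependent):

```java
// Demonstrates HotSpot dropping stack traces for a hot NPE site.
// Run with -XX:-OmitStackTraceInFastThrow to keep full traces instead.
public class FastThrowDemo {
  static String s = null;

  public static void main(String[] args) {
    for (int i = 0; i < 100_000; i++) {
      try {
        s.length(); // always throws NullPointerException
      } catch (NullPointerException e) {
        // Once the fast-throw optimization kicks in, the preallocated
        // exception has an empty stack trace.
        if (e.getStackTrace().length == 0) {
          System.out.println("stack trace omitted at iteration " + i);
          return;
        }
      }
    }
    System.out.println("stack traces were never omitted");
  }
}
```

This matches the symptom in the report: the first few NPEs print in full, and later ones lose their frames, making the root cause unrecoverable from the logs alone.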
[jira] [Resolved] (HBASE-14950) Create table with AC fails when quota is enabled
[ https://issues.apache.org/jira/browse/HBASE-14950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Drob resolved HBASE-14950.
Resolution: Cannot Reproduce

> Create table with AC fails when quota is enabled
>
> Key: HBASE-14950
> URL: https://issues.apache.org/jira/browse/HBASE-14950
> Project: HBase
> Issue Type: Bug
> Components: proc-v2
> Affects Versions: 1.1.2
> Reporter: Ashish Singhi
> Priority: Critical
>
> Scenario:
> 1. Set hbase.quota.enabled to true
> 2. As per the [ACL matrix|http://hbase.apache.org/book.html#appendix_acl_matrix] for create table, grant '@group1', 'C', '@ns1'
> 3. From a user of group1, create 't1', 'd' -- *Failed*
> {noformat}
> ERROR: java.io.IOException: Namespace Descriptor found null for ns1 This is unexpected.
>   at org.apache.hadoop.hbase.namespace.NamespaceStateManager.checkAndUpdateNamespaceTableCount(NamespaceStateManager.java:170)
>   at org.apache.hadoop.hbase.namespace.NamespaceAuditor.checkQuotaToCreateTable(NamespaceAuditor.java:76)
>   at org.apache.hadoop.hbase.quotas.MasterQuotaManager.checkNamespaceTableAndRegionQuota(MasterQuotaManager.java:312)
>   at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1445)
>   at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:428)
>   at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:49404)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2136)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> When quota is enabled, then as part of createTable we internally also call getNamespaceDescriptor, which needs the 'A' privilege. So when quota is enabled, we need both C and A permissions to create a table. The ACL matrix needs to be updated.
[jira] [Created] (HBASE-21229) Add a nightly check that client-server wire compatibility works
Sean Busbey created HBASE-21229:
Summary: Add a nightly check that client-server wire compatibility works
Key: HBASE-21229
URL: https://issues.apache.org/jira/browse/HBASE-21229
Project: HBase
Issue Type: Improvement
Components: test
Affects Versions: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 2.2.0, 1.4.8, 2.1.1, 2.0.3
Reporter: Sean Busbey

From HBASE-20993:
{quote}
bq. Good reminder that we lack a unit test for wire compatibility. I wonder how hard it would be to grab the 1.2 shaded client artifact and use it to talk with the server code at head of branch.

We could add a nightly test that did this pretty easily. Essentially we could just add it as an additional step in [the test that starts up a 1-node cluster and runs an example program|https://github.com/apache/hbase/blob/master/dev-support/hbase_nightly_pseudo-distributed-test.sh].
{quote}
[DISCUSS] Working in master or feature branch for WAL refactoring?
Hi,

Under the umbrella of HBASE-20951, we had initially said that we'd start landing code changes on a feature branch when we get there. Sergey S asked me a good question the other day: "why a feature branch and not master?"

Honestly, I don't have a good answer. The goal of the implementation is to move incrementally and not leave things "broken" when possible. Assuming that we can do that, using a feature branch would just create pain on the eventual merge back to master.

Do folks have strong feelings here? Is using master better to prevent review "debt" from piling up when the merge vote comes? Or, do people foresee an HBase 3 in the works before this work would be done (sometime in 2019)?

Would like to hear your input.

- Josh
[jira] [Created] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread objects and never cleans them later
Allan Yang created HBASE-21228:
Summary: Memory leak since AbstractFSWAL caches Thread objects and never cleans them later
Key: HBASE-21228
URL: https://issues.apache.org/jira/browse/HBASE-21228
Project: HBase
Issue Type: Bug
Affects Versions: 1.4.7, 2.0.2, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang

In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and SyncFutures:
{code}
/**
 * Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse SyncFutures.
 *
 * TODO: Reuse FSWALEntry's rather than create them anew each time as we do SyncFutures here.
 *
 * TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers rather than have them get
 * them from this Map?
 */
private final ConcurrentMap syncFuturesByHandler;
{code}
A colleague of mine found a memory leak caused by this map. Every thread that writes to the WAL is cached in this map, and nothing cleans the threads out of the map, even after the thread is dead.

In one of our customers' clusters, we noticed that even though there were no requests, the heap of the RS was almost full and CMS GC was triggered every second. We dumped the heap and found more than 30 thousand threads in the Terminated state, all cached in the map above. Everything referenced by these threads was leaked. Most of the threads were:
1. PostOpenDeployTasksThread, which writes the Open Region mark in the WAL
2. hconnection-0x1f838e31-shared--pool, which is used for index short-circuit writes (Phoenix), so the WAL is written and synced in these threads
3. Index writer threads (Phoenix), which are referenced by RegionEnvironment, then by HRegion, and finally by PostOpenDeployTasksThread

We should turn this map into a thread-local one and let the JVM GC the terminated threads for us.
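The proposed direction can be sketched as follows. The class and field names below are illustrative (a stand-in `SyncFuture` placeholder, not HBase's real class), and this is only a sketch of the thread-local idea, not the actual patch: a `ThreadLocal` value becomes unreachable when its owning thread dies, whereas a `Thread`-keyed map pins dead threads and everything they reference.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of replacing the Thread-keyed SyncFuture cache with a ThreadLocal.
public class SyncFutureCache {
  static class SyncFuture { /* placeholder for the real SyncFuture */ }

  // Before: entries for terminated handler threads are never removed,
  // so dead threads (and their referents) leak.
  private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler =
      new ConcurrentHashMap<>();

  // After: each thread lazily gets its own reusable SyncFuture, and the
  // JVM can collect the value once the thread terminates.
  private final ThreadLocal<SyncFuture> cachedSyncFutures =
      ThreadLocal.withInitial(SyncFuture::new);

  // Each caller reuses its own SyncFuture, as the original map intended.
  public SyncFuture getSyncFuture() {
    return cachedSyncFutures.get();
  }
}
```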
[jira] [Created] (HBASE-21227) Implement exponential retrying backoff for Assign/UnassignRegionHandler introduced in HBASE-21217
Duo Zhang created HBASE-21227:
Summary: Implement exponential retrying backoff for Assign/UnassignRegionHandler introduced in HBASE-21217
Key: HBASE-21227
URL: https://issues.apache.org/jira/browse/HBASE-21227
Project: HBase
Issue Type: Sub-task
Components: amv2, regionserver
Reporter: Duo Zhang
Fix For: 3.0.0, 2.2.0
[jira] [Created] (HBASE-21226) Revisit the close region related code at RS side
Duo Zhang created HBASE-21226:
Summary: Revisit the close region related code at RS side
Key: HBASE-21226
URL: https://issues.apache.org/jira/browse/HBASE-21226
Project: HBase
Issue Type: Sub-task
Reporter: Duo Zhang

We use the closeRegion method to close a region, and it schedules a CloseRegionHandler (before HBASE-21217). The problem here is that the CloseRegionHandler and the closeRegion method are mainly designed to be called by the master, but in fact, when shutting down an RS, we also call the closeRegion method to close all the regions on the RS. In HBASE-21217, we changed to using an UnassignRegionHandler to close a region when the request is from the master, so here we need to consider the close-region requests issued when shutting down an RS.
[jira] [Created] (HBASE-21225) Having RPC & Space quota on a table doesn't allow space quota to be removed using 'NONE'
Sakthi created HBASE-21225:
Summary: Having RPC & Space quota on a table doesn't allow space quota to be removed using 'NONE'
Key: HBASE-21225
URL: https://issues.apache.org/jira/browse/HBASE-21225
Project: HBase
Issue Type: Bug
Reporter: Sakthi
Assignee: Sakthi

A part of HBASE-20705 is still unresolved:
{noformat}
hbase(main):005:0> create 't2','cf'
Created table t2
Took 0.7619 seconds
=> Hbase::Table - t2
hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => '10M/sec'
Took 0.0514 seconds
hbase(main):007:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => '1G', POLICY => NO_WRITES
Took 0.0162 seconds
hbase(main):008:0> list_quotas
OWNER        QUOTAS
TABLE => t2  TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
TABLE => t2  TYPE => SPACE, TABLE => t2, LIMIT => 1073741824, VIOLATION_POLICY => NO_WRITES
2 row(s)
Took 0.0716 seconds
hbase(main):009:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => NONE
Took 0.0082 seconds
hbase(main):010:0> list_quotas
OWNER        QUOTAS
TABLE => t2  TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
TABLE => t2  TYPE => SPACE, TABLE => t2, REMOVE => true
2 row(s)
Took 0.0254 seconds
hbase(main):011:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => '1G', POLICY => NO_WRITES
Took 0.0082 seconds
hbase(main):012:0> list_quotas
OWNER        QUOTAS
TABLE => t2  TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
TABLE => t2  TYPE => SPACE, TABLE => t2, REMOVE => true
2 row(s)
Took 0.0411 seconds
{noformat}
[jira] [Created] (HBASE-21224) Handle compaction queue duplication
Xu Cang created HBASE-21224:
Summary: Handle compaction queue duplication
Key: HBASE-21224
URL: https://issues.apache.org/jira/browse/HBASE-21224
Project: HBase
Issue Type: Improvement
Components: Compaction
Reporter: Xu Cang

[~allan163] mentioned that we may want to handle compaction queue duplication in this Jira: https://issues.apache.org/jira/browse/HBASE-18451. Creating this item for further assessment and discussion.