[jira] [Created] (HBASE-23065) [hbtop] Top-N heavy hitter user and client drill downs
Andrew Purtell created HBASE-23065: -- Summary: [hbtop] Top-N heavy hitter user and client drill downs Key: HBASE-23065 URL: https://issues.apache.org/jira/browse/HBASE-23065 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell After HBASE-15519, or after an additional change on top of it that provides necessary data in ClusterStatus, add drill down top-N views of activity aggregated per user or per client IP. Only a relatively small N of the heavy hitters need be tracked assuming this will be most useful when one or a handful of users or clients is generating problematic load and hbtop is invoked to learn their identity. This is a critical missing piece. After drilling down to find hot regions or tables, sometimes that is not enough, we also need to know which application out of many may be the source of the hot spotting load. -- This message was sent by Atlassian Jira (v8.3.4#803005)
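The "relatively small N" tracking described above could follow a Space-Saving-style bounded counter map. A minimal illustrative sketch, in which the class and method names are hypothetical and not part of hbtop:

```java
import java.util.*;

/** Illustrative Space-Saving-style tracker for the top-N heaviest users or
 *  client IPs; class and method names are hypothetical, not hbtop code. */
public class TopNTracker {
    private final int capacity;                      // the "relatively small N"
    private final Map<String, Long> counts = new HashMap<>();

    public TopNTracker(int capacity) {
        this.capacity = capacity;
    }

    /** Record one request attributed to a user or client IP. */
    public void record(String key) {
        if (counts.containsKey(key) || counts.size() < capacity) {
            counts.merge(key, 1L, Long::sum);
        } else {
            // At capacity: evict the current minimum and let the newcomer
            // inherit its count, bounding the undercount of a newly hot key.
            Map.Entry<String, Long> min =
                Collections.min(counts.entrySet(), Map.Entry.comparingByValue());
            counts.remove(min.getKey());
            counts.put(key, min.getValue() + 1);
        }
    }

    /** Tracked keys, heaviest hitter first. */
    public List<String> top() {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries) {
            result.add(e.getKey());
        }
        return result;
    }
}
```

Only N counters are kept regardless of how many distinct users or clients the server sees, which matches the assumption that one or a handful of sources is generating the problematic load.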
[jira] [Created] (HBASE-23061) Replace use of Jackson for JSON serde in hbase common and client modules
Andrew Purtell created HBASE-23061: -- Summary: Replace use of Jackson for JSON serde in hbase common and client modules Key: HBASE-23061 URL: https://issues.apache.org/jira/browse/HBASE-23061 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Fix For: 1.5.0 We are using Jackson to emit JSON in at least one place in common and client. We don't need all of Jackson, and all the associated trouble, just to do that. Use a suitably licensed JSON library with no known vulnerabilities. This will also avoid problems downstream: because we are trying to keep downstream users from pulling in a vulnerable Jackson via us, Jackson is a provided-scope dependency. Here is the usage I am referring to: {noformat} org.apache.hadoop.hbase.util.JsonMapper.(JsonMapper.java:37) at org.apache.hadoop.hbase.client.Operation.toJSON(Operation.java:70) at org.apache.hadoop.hbase.client.Operation.toString(Operation.java:96) {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
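For illustration only (not a proposal for any specific replacement library): emitting the kind of flat JSON object these call sites need takes little more than correct string escaping. The class below is a hypothetical sketch, not HBase code:

```java
import java.util.*;

/** Hypothetical sketch: emit a flat JSON object without Jackson on the
 *  classpath. Not a real HBase class; names are illustrative. */
public class TinyJson {
    /** Escape a string per JSON rules (quotes, backslashes, control chars). */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Serialize an ordered map of string keys/values as a JSON object. */
    static String toJson(Map<String, String> map) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (!first) sb.append(",");
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append("}").toString();
    }
}
```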
[jira] [Created] (HBASE-22978) Online slow response log
Andrew Purtell created HBASE-22978: -- Summary: Online slow response log Key: HBASE-22978 URL: https://issues.apache.org/jira/browse/HBASE-22978 Project: HBase Issue Type: New Feature Components: Admin, regionserver, shell Reporter: Andrew Purtell Today when an individual RPC exceeds a configurable time bound we log a complaint by way of the logging subsystem. These log lines look like: {noformat} 2019-08-30 22:10:36,195 WARN [,queue=15,port=60020] ipc.RpcServer - (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1567203007549,"responsesize":6819737,"method":"Scan","param":"region { type: REGION_NAME value: \"tsdb,\\000\\000\\215\\f)o\\024\\302\\220\\000\\000\\000\\000\\000\\001\\000\\000\\000\\000\\000\\006\\000\\000\\000\\000\\000\\005\\000\\000","processingtimems":28646,"client":"10.253.196.215:41116","queuetimems":22453,"class":"HRegionServer"} {noformat} Unfortunately we often truncate the request parameters, like in the above example. We do this because the human readable representation is verbose, the rate of too slow warnings may be high, and the combination of these things can overwhelm the log capture system. (All of these things have been reported from various production settings. Truncation was added after we crashed a user's log capture system.) The truncation is unfortunate because it eliminates much of the utility of the warnings. For example, the region name, the start and end keys, and the filter hierarchy are all important clues for debugging performance problems caused by moderate to low selectivity queries or queries made at a high rate. We can maintain an in-memory ring buffer of requests that were judged to be too slow in addition to the responseTooSlow logging. The in-memory representation can be complete and compressed. A new admin API and shell command can provide access to the ring buffer for online performance debugging. 
A modest sizing of the ring buffer will prevent excessive memory utilization for a minor performance debugging feature by limiting the total number of retained records. There is some chance a high rate of requests will cause information on other interesting requests to be overwritten before it can be read. This is the nature of a ring buffer and an acceptable trade-off. The write request types do not require us to retain all information submitted in the request. We don't need to retain all key-values in the mutation, which may be too large to comfortably retain. We only need a unique set of row keys, or even a min/max range, and total counts. The consumers of this information will be debugging tools. We can afford to apply fast compression to ring buffer entries (if codec support is available), something like snappy or zstandard, and decompress on the fly when servicing the retrieval API request. This will minimize the impact of retaining more information about slow requests than we do today. This proposal is for retention of request information only, the same information provided by responseTooSlow warnings. Total size of response serialization, possibly also total cell or row counts, should be sufficient to characterize the response. — New shell commands: {{get_slow_responses <tableName|regionName> [ , { SERVERS => [ <server name>, ... ] } ]}} Retrieve, decode, and pretty print the contents of the too slow response ring buffer. Provide a table name as first argument to find all regions and retrieve too slow response entries for the given table from all servers currently hosting it. Provide a region name as first argument to retrieve all too slow response entries for the given region. Optionally provide a map of parameters as second argument. The SERVERS parameter, which expects a list of server names, will constrain the search to the given set of servers. A server name is its host, port, and start code, e.g. "host187.example.com,60020,1289493121758". {{get_slow_responses [ <server name>, ... ]}} Retrieve, decode, and pretty print the contents of the too slow response ring buffer maintained by the given list of servers; or all servers on the cluster if no argument is provided. A server name is its host, port, and start code, e.g. "host187.example.com,60020,1289493121758". {{clear_slow_responses [ <server name>, ... ]}} Clear the too slow response ring buffer maintained by the given list of servers; or all servers on the cluster if no argument is provided. A server name is its host, port, and start code, e.g. "host187.example.com,60020,1289493121758". — New Admin APIs: {code:java} List Admin#getSlowResponses(String tableOrRegion, @Nullable List<ServerName> servers); {code} {code:java} List Admin#getSlowResponses(@Nullable List<ServerName> servers); {code} {code:java} List Admin#clearSlowResponses(@Nullable List<ServerName> servers); {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
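The proposed ring buffer semantics can be sketched as follows; the class name and String record type are illustrative assumptions rather than the eventual HBase API, and compression is omitted for brevity:

```java
import java.util.*;

/** Illustrative sketch of the proposed slow-response ring buffer; the class
 *  name and String record type are assumptions, not the eventual HBase API. */
public class SlowResponseRingBuffer {
    private final String[] records; // serialized (optionally compressed) entries
    private long next;              // total writes so far

    public SlowResponseRingBuffer(int capacity) {
        this.records = new String[capacity];
    }

    /** Record a too-slow response; the oldest entry is overwritten once full. */
    public synchronized void add(String record) {
        records[(int) (next++ % records.length)] = record;
    }

    /** Snapshot current contents, oldest first (what get_slow_responses would read). */
    public synchronized List<String> snapshot() {
        List<String> out = new ArrayList<>();
        for (long i = Math.max(0, next - records.length); i < next; i++) {
            out.add(records[(int) (i % records.length)]);
        }
        return out;
    }

    /** Discard all entries (what clear_slow_responses would do). */
    public synchronized void clear() {
        Arrays.fill(records, null);
        next = 0;
    }
}
```

A fixed-length array plus a monotonically increasing write counter gives the bounded memory and overwrite-oldest behavior described above with O(1) writes on the RPC handler path.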
[jira] [Resolved] (HBASE-22909) align hbase-vote script across all current releases
[ https://issues.apache.org/jira/browse/HBASE-22909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-22909. Assignee: (was: Andrew Purtell) Resolution: Not A Problem > align hbase-vote script across all current releases > --- > > Key: HBASE-22909 > URL: https://issues.apache.org/jira/browse/HBASE-22909 > Project: HBase > Issue Type: Task > Components: build, community >Affects Versions: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.3.0, 2.1.5, 1.3.5 >Reporter: Artem Ervits >Priority: Minor > > The hbase-vote script is in a different state in each of the current releases. > Now that https://issues.apache.org/jira/browse/HBASE-22464 is merged, this > Jira is to converge all releases on one version of hbase-vote. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (HBASE-22900) Multiple httpclient versions included in binary package (branch-1.3)
Andrew Purtell created HBASE-22900: -- Summary: Multiple httpclient versions included in binary package (branch-1.3) Key: HBASE-22900 URL: https://issues.apache.org/jira/browse/HBASE-22900 Project: HBase Issue Type: Bug Affects Versions: 1.3.5 Reporter: Andrew Purtell Fix For: 1.3.6, 1.4.11 We are including multiple versions of httpcore and httpclient in the binary package. {noformat} httpclient-4.1.2.jar httpclient-4.2.5.jar httpclient-4.4.1.jar httpcore-4.1.2.jar httpcore-4.2.4.jar httpcore-4.4.1.jar {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (HBASE-22601) Misconfigured addition of peers leads to cluster shutdown.
[ https://issues.apache.org/jira/browse/HBASE-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-22601. Fix Version/s: 1.4.11 1.3.6 1.5.0 Hadoop Flags: Reviewed Resolution: Fixed > Misconfigured addition of peers leads to cluster shutdown. > -- > > Key: HBASE-22601 > URL: https://issues.apache.org/jira/browse/HBASE-22601 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.2 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Major > Fix For: 3.0.0, 1.5.0, 2.3.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11 > > > Recently we added a peer to a production cluster which was in a different > Kerberos realm. > *Steps to reproduce:* > 1. Add a misconfigured peer which is in a different Kerberos realm. > 2. Remove that peer. > 3. All region servers will start to crash. > *RCA* > Enabled trace logging on one region server for a short amount of time. > After adding the peer, saw the following log lines. > {noformat} > 2019-06-18 22:19:20,949 INFO [main-EventThread] > replication.ReplicationTrackerZKImpl - /hbase/replication/peers znode > expired, triggering peerListChanged event > 2019-06-18 22:19:20,992 INFO [main-EventThread] > replication.ReplicationPeersZKImpl - Added new peer > cluster=:/hbase > 2019-06-18 22:19:21,113 INFO [main-EventThread] > zookeeper.RecoverableZooKeeper - Process identifier=hconnection-0x794a56d6 > connecting to ZooKeeper ensemble= > 2019-06-18 22:20:01,280 WARN [main-EventThread] zookeeper.ZKUtil - > hconnection-0x794a56d6-0x16b56265fbebb1b, quorum=, > baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid) > org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = > AuthFailed for /hbase/hbaseid > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:123) > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102) > at > 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220) > at > org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:421) > at > org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65) > at > org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:922) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.(ConnectionManager.java:706) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.(ConnectionManager.java:638) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238) > at > org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:432) > at > org.apache.hadoop.hbase.client.ConnectionManager.createConnectionInternal(ConnectionManager.java:341) > at > org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:144) > at > org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.init(HBaseInterClusterReplicationEndpoint.java:135) > at > com.salesforce.hbase.replication.TenantReplicationEndpoint.init(TenantReplicationEndpoint.java:30) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.getReplicationSource(ReplicationSourceManager.java:517) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.addSource(ReplicationSourceManager.java:273) > at > 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.peerListChanged(ReplicationSourceManager.java:635) > at > org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl$PeersWatcher.nodeChildrenChanged(ReplicationTrackerZKImpl.java:192) > at > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:643) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:544) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:519) > 2019-06-18
[jira] [Resolved] (HBASE-22823) Mark Canary as Public/Evolving
[ https://issues.apache.org/jira/browse/HBASE-22823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-22823. Resolution: Won't Fix > Mark Canary as Public/Evolving > -- > > Key: HBASE-22823 > URL: https://issues.apache.org/jira/browse/HBASE-22823 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Assignee: Caroline >Priority: Minor > Attachments: HBASE-22823.branch-1.000.patch, > HBASE-22823.branch-2.000.patch, HBASE-22823.master.000.patch > > > Canary is marked as a Private class. Its interfaces could change at any time. > Should we change the annotation on Canary to Public/Evolving? Or add > annotations on some of these subtypes? I think it depends on how we think > Canary results should be consumed. > In our production we find that scraping logs and parsing them is brittle and > not scalable. Although the scalability issue is more to do with the totality > of logs from a Hadoopish stack, if you run HBase then you have this problem, > and you wouldn't be using the canary if you didn't run HBase. We have a tool > that embeds the Canary and calls various methods and takes actions without > needing a round trip to the logs and whatever aggregates them. > I propose we promote Canary to Public/Evolving. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Reopened] (HBASE-22823) Mark Canary as Public/Evolving
[ https://issues.apache.org/jira/browse/HBASE-22823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-22823: HBASE-22874 is filed against this as a blocker. Reverting this commit pending whatever happens on HBASE-22874 > Mark Canary as Public/Evolving > -- > > Key: HBASE-22823 > URL: https://issues.apache.org/jira/browse/HBASE-22823 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Assignee: Caroline >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11 > > Attachments: HBASE-22823.branch-1.000.patch, > HBASE-22823.branch-2.000.patch, HBASE-22823.master.000.patch > > > Canary is marked as a Private class. Its interfaces could change at any time. > Should we change the annotation on Canary to Public/Evolving? Or add > annotations on some of these subtypes? I think it depends on how we think > Canary results should be consumed. > In our production we find that scraping logs and parsing them is brittle and > not scalable. Although the scalability issue is more to do with the totality > of logs from a Hadoopish stack, if you run HBase then you have this problem, > and you wouldn't be using the canary if you didn't run HBase. We have a tool > that embeds the Canary and calls various methods and takes actions without > needing a round trip to the logs and whatever aggregates them. > I propose we promote Canary to Public/Evolving. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Reopened] (HBASE-22810) Initialize a separate ThreadPoolExecutor for taking/restoring snapshot
[ https://issues.apache.org/jira/browse/HBASE-22810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-22810: Bisect fingers this as breaking TestFlushSnapshotFromClient on branch-1 {noformat} eb6b617d92998b887c8d811769c6a9d80b653432 is the first bad commit commit eb6b617d92998b887c8d811769c6a9d80b653432 Author: openinx Date: Thu Aug 15 10:57:42 2019 +0800 HBASE-22810 Initialize an separate ThreadPoolExecutor for taking/restoring snapshot (#486) .../apache/hadoop/hbase/executor/EventType.java| 4 +-- .../apache/hadoop/hbase/executor/ExecutorType.java | 1 + .../java/org/apache/hadoop/hbase/HConstants.java | 27 .../org/apache/hadoop/hbase/master/HMaster.java| 29 + .../hadoop/hbase/executor/TestExecutorService.java | 36 ++ 5 files changed, 83 insertions(+), 14 deletions(-) {noformat} > Initialize a separate ThreadPoolExecutor for taking/restoring snapshot > > > Key: HBASE-22810 > URL: https://issues.apache.org/jira/browse/HBASE-22810 > Project: HBase > Issue Type: Improvement >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 1.5.0, 2.3.0, 2.0.6, 2.2.1, 2.1.6, 1.3.6, 1.4.11 > > > In the EventType class we have the following definition, meaning that taking a > snapshot and restoring a snapshot both use the MASTER_TABLE_OPERATIONS executor now. > {code} > /** >* Messages originating from Client to Master. >* C_M_SNAPSHOT_TABLE >* Client asking Master to snapshot an offline table. >*/ > C_M_SNAPSHOT_TABLE(48, ExecutorType.MASTER_TABLE_OPERATIONS), > /** >* Messages originating from Client to Master. >* C_M_RESTORE_SNAPSHOT >* Client asking Master to restore a snapshot. >*/ > C_M_RESTORE_SNAPSHOT (49, ExecutorType.MASTER_TABLE_OPERATIONS), > {code} > But when I checked the MASTER_TABLE_OPERATIONS thread pool initialization, I > see: > {code} > private void startServiceThreads() throws IOException{ >// ... some other code initializing >// We depend on there being only one instance of this executor running >// at a time. To do concurrency, would need fencing of enable/disable of >// tables. >// Any time changing this maxThreads to > 1, pls see the comment at >// AccessController#postCompletedCreateTableAction > > this.executorService.startExecutorService(ExecutorType.MASTER_TABLE_OPERATIONS, > 1); >startProcedureExecutor(); > {code} > That is to say, so that CPs can enable or disable tables sequentially, we > create a ThreadPoolExecutor with threadPoolSize=1. We therefore can't snapshot > concurrently even for totally different tables: if there are two table snapshot > requests and Table A takes 5 minutes to snapshot, Table B must wait 5 minutes, > and only once Table A finishes its snapshot will Table B start. > Since we also set a snapshot timeout, Table B's snapshot can easily time out. > Actually, we can create a separate thread pool for snapshot operations only. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
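The core idea of the change above, a dedicated pool so snapshots of different tables no longer serialize behind a single thread, can be sketched as follows; names and sizes are illustrative, not the committed HBase code:

```java
import java.util.concurrent.*;

/** Sketch of the change's idea: give snapshot operations their own thread
 *  pool instead of sharing the single-threaded MASTER_TABLE_OPERATIONS
 *  executor. Names and sizes are illustrative. */
public class SnapshotPoolDemo {
    /** Run n simulated snapshot tasks on a dedicated pool of poolSize threads;
     *  returns how many completed within the timeout. */
    public static int runSnapshots(int n, int poolSize) {
        ExecutorService snapshotPool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch done = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            // Stand-in for taking one table snapshot; with poolSize > 1 these
            // no longer queue behind each other and risk the snapshot timeout.
            snapshotPool.submit(done::countDown);
        }
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        snapshotPool.shutdown();
        return n - (int) done.getCount();
    }
}
```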
[jira] [Created] (HBASE-22866) Multiple slf4j-log4j provider versions included in binary package (branch-1)
Andrew Purtell created HBASE-22866: -- Summary: Multiple slf4j-log4j provider versions included in binary package (branch-1) Key: HBASE-22866 URL: https://issues.apache.org/jira/browse/HBASE-22866 Project: HBase Issue Type: Bug Affects Versions: 1.5.0 Reporter: Andrew Purtell Fix For: 1.5.0 Examining binary assembly results there are multiple versions of slf4j-log4j in lib/ {noformat} slf4j-api-1.7.7.jar slf4j-log4j12-1.6.1.jar slf4j-log4j12-1.7.10.jar slf4j-log4j12-1.7.7.jar {noformat} We aren't managing slf4j-log4j12 dependency versions correctly, somehow. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HBASE-22828) Log a region close journal
Andrew Purtell created HBASE-22828: -- Summary: Log a region close journal Key: HBASE-22828 URL: https://issues.apache.org/jira/browse/HBASE-22828 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell We already track region close activity with a MonitoredTask. Enable the status journal and dump it at DEBUG log level so if for some reasons region closes are taking a long time we have a timestamped journal of the activity and how long each step took. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (HBASE-22623) Add RegionObserver coprocessor hook for preWALAppend
[ https://issues.apache.org/jira/browse/HBASE-22623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-22623. Resolution: Fixed Hadoop Flags: Reviewed Merged to master, picked to branch-2. Merged to branch-1. Please add a release note. Thanks [~gjacoby] > Add RegionObserver coprocessor hook for preWALAppend > > > Key: HBASE-22623 > URL: https://issues.apache.org/jira/browse/HBASE-22623 > Project: HBase > Issue Type: New Feature >Reporter: Geoffrey Jacoby >Assignee: Geoffrey Jacoby >Priority: Major > Fix For: 3.0.0, 1.5.0, 2.3.0 > > > While many coprocessor hooks expose the WALEdit to implementing coprocs, > there aren't any that expose the WALKey before it's created and added to the > WALEntry. > It's sometimes useful for coprocessors to be able to edit the WALKey, for > example to add extended attributes using the fields to be added in > HBASE-22622. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HBASE-22823) Mark Canary as Public/Evolving
Andrew Purtell created HBASE-22823: -- Summary: Mark Canary as Public/Evolving Key: HBASE-22823 URL: https://issues.apache.org/jira/browse/HBASE-22823 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (HBASE-22762) Print the delta between phases in the split/merge/compact/flush transaction journals
[ https://issues.apache.org/jira/browse/HBASE-22762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-22762. Resolution: Fixed Nothing to be done. It looks like under multithreaded scenarios the logger only keeps a certain amount of buffer space free per thread and truncates overlong strings at their start. This might be fixable if we used something other than HRegion.LOG to log, but there is no context in which to do that, and it is too much work for this change. Pushed addendum to branch-1.3, branch-1.4 and branch-1 > Print the delta between phases in the split/merge/compact/flush transaction > journals > > > Key: HBASE-22762 > URL: https://issues.apache.org/jira/browse/HBASE-22762 > Project: HBase > Issue Type: Improvement > Components: logging >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Fix For: 1.5.0, 1.3.6, 1.4.11 > > Attachments: HBASE-22762-branch-1-addendum.patch, > HBASE-22762.branch-1.001.patch, HBASE-22762.branch-1.002.patch, > HBASE-22762.branch-1.004.patch > > > We print the start timestamp for each phase when logging the > split/merge/compact/flush transaction journals and so when debugging an > operator must do the math by hand. It would be trivial to also print the > delta from the start timestamp of the previous phase and helpful to do so. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Reopened] (HBASE-22762) Print the delta between phases in the split/merge/compact/flush transaction journals
[ https://issues.apache.org/jira/browse/HBASE-22762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-22762: Assignee: Andrew Purtell (was: Xu Cang) Reviewing recent commits I realized the scope of this drifted to just SplitTransactionImpl but we committed it using the description on the jira, which implies a change to all of the other "transaction journal" dumps too. I'm going to take this over, clone Xu's addition to SplitTransactionImpl to the others, and commit an addendum. > Print the delta between phases in the split/merge/compact/flush transaction > journals > > > Key: HBASE-22762 > URL: https://issues.apache.org/jira/browse/HBASE-22762 > Project: HBase > Issue Type: Improvement > Components: logging >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Fix For: 1.5.0, 1.3.6, 1.4.11 > > Attachments: HBASE-22762.branch-1.001.patch, > HBASE-22762.branch-1.002.patch, HBASE-22762.branch-1.004.patch > > > We print the start timestamp for each phase when logging the > split/merge/compact/flush transaction journals and so when debugging an > operator must do the math by hand. It would be trivial to also print the > delta from the start timestamp of the previous phase and helpful to do so. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (HBASE-22762) Print the delta between phases in the split/merge/compact/flush transaction journals
[ https://issues.apache.org/jira/browse/HBASE-22762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-22762. Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 1.4.11 1.3.6 1.5.0 Pushed to the branch-1s > Print the delta between phases in the split/merge/compact/flush transaction > journals > > > Key: HBASE-22762 > URL: https://issues.apache.org/jira/browse/HBASE-22762 > Project: HBase > Issue Type: Improvement > Components: logging >Reporter: Andrew Purtell >Assignee: Xu Cang >Priority: Minor > Fix For: 1.5.0, 1.3.6, 1.4.11 > > Attachments: HBASE-22762.branch-1.001.patch, > HBASE-22762.branch-1.002.patch, HBASE-22762.branch-1.004.patch > > > We print the start timestamp for each phase when logging the > split/merge/compact/flush transaction journals and so when debugging an > operator must do the math by hand. It would be trivial to also print the > delta from the start timestamp of the previous phase and helpful to do so. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Reopened] (HBASE-22744) Remove deprecated classes around status and load
[ https://issues.apache.org/jira/browse/HBASE-22744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-22744: This change broke shell tests, see HBASE-22770. It also blocks HBASE-22735 > Remove deprecated classes around status and load > > > Key: HBASE-22744 > URL: https://issues.apache.org/jira/browse/HBASE-22744 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 3.0.0 >Reporter: Jan Hentschel >Assignee: Jan Hentschel >Priority: Major > Fix For: 3.0.0 > > > The client module has three deprecated classes around metrics, > {{ClusterStatus}}, {{RegionLoad}} and {{ServerLoad}}, which should be removed. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HBASE-22770) TestShell fails on trunk with NameError: missing class name (`org.apache.hadoop.hbase.ClusterStatus')
Andrew Purtell created HBASE-22770: -- Summary: TestShell fails on trunk with NameError: missing class name (`org.apache.hadoop.hbase.ClusterStatus') Key: HBASE-22770 URL: https://issues.apache.org/jira/browse/HBASE-22770 Project: HBase Issue Type: Bug Components: shell, test Reporter: Andrew Purtell Running shell tests on trunk there are some failures related to use of ClusterStatus in admin.rb: {noformat} Error: test_decommission_regionservers_with_non-existant_server_name(Hbase::CommissioningTest): NameError: missing class name (`org.apache.hadoop.hbase.ClusterStatus') org/jruby/javasupport/JavaPackage.java:259:in `method_missing' /Users/apurtell/src/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:1180:in `getRegionServers' /Users/apurtell/src/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:1206:in `getServerNames' src/test/ruby/hbase/admin2_test.rb:386:in `block in test_decommission_regionservers_with_non-existant_server_name' 383: end 384: 385: define_test 'decommission regionservers with non-existant server name' do => 386: server_name = admin.getServerNames([], true)[0].getServerName() 387: assert_raise(ArgumentError) do 388: command(:decommission_regionservers, 'dummy') 389: end {noformat} {noformat} Error: test_decommission_regionservers_with_server_host_name_and_port(Hbase::CommissioningTest): NameError: missing class name (`org.apache.hadoop.hbase.ClusterStatus') org/jruby/javasupport/JavaPackage.java:259:in `method_missing' /Users/apurtell/src/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:1180:in `getRegionServers' /Users/apurtell/src/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:1206:in `getServerNames' src/test/ruby/hbase/admin2_test.rb:371:in `block in test_decommission_regionservers_with_server_host_name_and_port' 368: end 369: 370: define_test 'decommission regionservers with server host name and port' do => 371: server_name = 
admin.getServerNames([], true)[0] 372: host_name_and_port = server_name.getHostname + ',' +server_name.getPort.to_s 373: server_name_str = server_name.getServerName 374: command(:decommission_regionservers, host_name_and_port) {noformat} {noformat} Error: test_decommission_regionservers_with_server_host_name_only(Hbase::CommissioningTest): NameError: missing class name (`org.apache.hadoop.hbase.ClusterStatus') org/jruby/javasupport/JavaPackage.java:259:in `method_missing' /Users/apurtell/src/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:1180:in `getRegionServers' /Users/apurtell/src/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:1206:in `getServerNames' src/test/ruby/hbase/admin2_test.rb:356:in `block in test_decommission_regionservers_with_server_host_name_only' 353: end 354: 355: define_test 'decommission regionservers with server host name only' do => 356: server_name = admin.getServerNames([], true)[0] 357: host_name = server_name.getHostname 358: server_name_str = server_name.getServerName 359: command(:decommission_regionservers, host_name) {noformat} and so on -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HBASE-22762) Print the delta between phases in the split/merge/compact/flush transaction journals
Andrew Purtell created HBASE-22762: -- Summary: Print the delta between phases in the split/merge/compact/flush transaction journals Key: HBASE-22762 URL: https://issues.apache.org/jira/browse/HBASE-22762 Project: HBase Issue Type: Improvement Components: logging Reporter: Andrew Purtell We print the start timestamp for each phase when logging the split/merge/compact/flush transaction journals and so when debugging an operator must do the math by hand. It would be trivial to also print the delta from the start timestamp of the previous phase and helpful to do so. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
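The requested output could look like the following sketch, which prints each phase's start timestamp plus the delta from the previous phase; the class and method names are illustrative, not the actual journal code:

```java
import java.util.*;

/** Sketch of the requested journal output: each phase's start timestamp plus
 *  the delta from the previous phase. Names are illustrative. */
public class JournalPrinter {
    /** journal: ordered (phase name, start time in ms) pairs. */
    public static List<String> format(List<Map.Entry<String, Long>> journal) {
        List<String> lines = new ArrayList<>();
        Long prev = null;
        for (Map.Entry<String, Long> e : journal) {
            // Delta is zero for the first phase, otherwise the gap the
            // operator previously had to compute by hand.
            long delta = prev == null ? 0 : e.getValue() - prev;
            lines.add(e.getKey() + " at " + e.getValue() + " (+" + delta + " ms)");
            prev = e.getValue();
        }
        return lines;
    }
}
```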
[jira] [Created] (HBASE-22735) list_regions may throw an error if a region is RIT
Andrew Purtell created HBASE-22735: -- Summary: list_regions may throw an error if a region is RIT Key: HBASE-22735 URL: https://issues.apache.org/jira/browse/HBASE-22735 Project: HBase Issue Type: Bug Components: shell Affects Versions: 1.5.0 Reporter: Andrew Purtell Fix For: 3.0.0, 1.5.0, 2.3.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11 The 'list_regions' shell command gets a list of regions for a given table and then prints them and some attributes such as the server where they are located, current request count, data locality, and such. However if a region is in transition the command might fail with {{ERROR: undefined method `getDataLocality' for nil:NilClass}} and there may be other ways this can happen. Protect against use of nil references and just display what we can. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (HBASE-22715) All scan requests should be handled by scan handler threads in RWQueueRpcExecutor
[ https://issues.apache.org/jira/browse/HBASE-22715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-22715. Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 1.4.11 1.3.6 2.1.6 2.2.1 2.3.0 1.5.0 3.0.0 > All scan requests should be handled by scan handler threads in > RWQueueRpcExecutor > - > > Key: HBASE-22715 > URL: https://issues.apache.org/jira/browse/HBASE-22715 > Project: HBase > Issue Type: Bug >Affects Versions: 1.4.10 >Reporter: Jeongdae Kim >Assignee: Jeongdae Kim >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.3.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11 > > > When we use RWQueueRpcExecutor, all scans should be handled by scan handler > threads, not read handlers. > > Before HBASE-17508, when calling openScanner() in the client, a region server > didn't produce results; it just opened the scanner and returned the scanner id. > So this request (open) was executed in read handlers intentionally. > > However, since HBASE-17508, actual scan behavior happens while opening a > scanner, so I think this request should probably be executed in scan handlers > when using RWQueueRpcExecutor. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
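The dispatch rule the fix implies can be sketched as below; the method names and queue model are simplified assumptions, not the actual RWQueueRpcExecutor code:

```java
/** Simplified sketch of the dispatch rule the fix implies: every scan-type
 *  call, including the open, goes to scan handlers. Method names and the
 *  queue model are assumptions, not the actual RWQueueRpcExecutor code. */
public class RwQueueDispatch {
    public enum Queue { WRITE, READ, SCAN }

    public static Queue dispatch(String method, boolean isWrite) {
        if (isWrite) {
            return Queue.WRITE;
        }
        // Since HBASE-17508 the open already does real scan work, so route it
        // to the scan handlers rather than the general read handlers.
        if (method.equals("Scan") || method.equals("openScanner")) {
            return Queue.SCAN;
        }
        return Queue.READ;
    }
}
```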
[jira] [Created] (HBASE-22728) Upgrade jackson dependencies in branch-1
Andrew Purtell created HBASE-22728: -- Summary: Upgrade jackson dependencies in branch-1 Key: HBASE-22728 URL: https://issues.apache.org/jira/browse/HBASE-22728 Project: HBase Issue Type: Sub-task Affects Versions: 1.3.5, 1.4.10 Reporter: Andrew Purtell Fix For: 1.5.0, 1.3.6, 1.4.11 Avoid Jackson versions and dependencies with known CVEs -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HBASE-22686) ZkSplitLogWorkerCoordination doesn't allow a regionserver to pick up all of the split work it is capable of
Andrew Purtell created HBASE-22686: -- Summary: ZkSplitLogWorkerCoordination doesn't allow a regionserver to pick up all of the split work it is capable of Key: HBASE-22686 URL: https://issues.apache.org/jira/browse/HBASE-22686 Project: HBase Issue Type: Bug Reporter: Andrew Purtell A region hosted by a crashed regionserver cannot be reassigned until the crashed regionserver's write-ahead logs have been processed and split into per-region recovered edits files. Reassignment of a region from a crashed server will be held up by the distributed split work backlog. Every regionserver runs a background daemon thread that manages the acquisition and execution of distributed log split tasks. This thread registers a watcher on a znode managed by the master. When the master is processing a server shutdown, crash, or cluster restart and detects unprocessed WAL files, it registers them for processing under the znode. One or more live regionservers will attempt to get an exclusive lock on an entry. One of them wins, splits the WAL file, deletes the entry, then acquires more work or goes back to sleep if the worklist is empty. A regionserver can acquire at most a fixed number of log split tasks determined by configuration, hbase.regionserver.wal.max.splitters (default 2). If the number of entries/logs to process exceeds the number of regionservers in the cluster, perhaps due to the correlated failure of a significant subset of the fleet, then splitting work will fall behind. Regions may remain in RIT until the backlog is cleared. However, the regionserver side coordination logic - ZkSplitLogWorkerCoordination - only allows a regionserver to grab one task at a time. Nearby javadoc says "This policy puts an upper-limit on the number of simultaneous log splitting that could be happening in a cluster." That upper limit will be the number of currently live regionservers. 
I don't feel I understand exactly why this is necessary or appropriate, because a regionserver can handle more than one task at once, and in fact the maximum number of concurrent split tasks it can accept is configurable.
{code}
  /**
   * This function calculates how many splitters it could create based on expected average tasks per
   * RS and the hard limit upper bound (maxConcurrentTasks) set by configuration.
   * At any given time, a RS allows spawn MIN(Expected Tasks/RS, Hard Upper Bound)
   * @param numTasks current total number of available tasks
   */
  private int calculateAvailableSplitters(int numTasks) {
    // at least one RS (itself) available
    int availableRSs = 1;
    try {
      List<String> regionServers = ZKUtil.listChildrenNoWatch(watcher, watcher.rsZNode);
      availableRSs = Math.max(availableRSs, (regionServers == null) ? 0 : regionServers.size());
    } catch (KeeperException e) {
      // do nothing
      LOG.debug("getAvailableRegionServers got ZooKeeper exception", e);
    }
    int expectedTasksPerRS = (numTasks / availableRSs) + ((numTasks % availableRSs == 0) ? 0 : 1);
    expectedTasksPerRS = Math.max(1, expectedTasksPerRS); // at least be one
    // calculate how many more splitters we could spawn
    return Math.min(expectedTasksPerRS, maxConcurrentTasks) - this.tasksInProgress.get();
  }
{code}
Shouldn't this simply be:
{code}
  private int calculateAvailableSplitters() {
    return maxConcurrentTasks - tasksInProgress.get();
  }
{code}
? This is branch-1. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
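To make the difference concrete, here is a runnable sketch of both calculations (the method and parameter names are mine, but the arithmetic in the first method mirrors the quoted code):

```java
public class SplitterSlots {
    // Mirrors the arithmetic of the quoted calculateAvailableSplitters.
    static int availableSplittersCurrent(int numTasks, int liveServers, int maxConcurrentTasks,
            int tasksInProgress) {
        int availableRSs = Math.max(1, liveServers); // at least one RS (itself) available
        int expectedTasksPerRS = (numTasks / availableRSs) + ((numTasks % availableRSs == 0) ? 0 : 1);
        expectedTasksPerRS = Math.max(1, expectedTasksPerRS);
        return Math.min(expectedTasksPerRS, maxConcurrentTasks) - tasksInProgress;
    }

    // The proposed simplification: let each worker fill up to its configured capacity.
    static int availableSplittersProposed(int maxConcurrentTasks, int tasksInProgress) {
        return maxConcurrentTasks - tasksInProgress;
    }

    public static void main(String[] args) {
        // 3 pending logs, 6 live regionservers, hbase.regionserver.wal.max.splitters = 2:
        System.out.println("current:  " + availableSplittersCurrent(3, 6, 2, 0));  // offers 1 slot
        System.out.println("proposed: " + availableSplittersProposed(2, 0));       // offers 2 slots
    }
}
```

With fewer pending tasks than live servers, the current logic caps each worker at a single slot even though its configured capacity is higher, which is exactly the throttling behavior the issue questions.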
[jira] [Created] (HBASE-22660) Precise end to end tracking of cross cluster replication latency
Andrew Purtell created HBASE-22660: -- Summary: Precise end to end tracking of cross cluster replication latency Key: HBASE-22660 URL: https://issues.apache.org/jira/browse/HBASE-22660 Project: HBase Issue Type: New Feature Reporter: Andrew Purtell ageOfLastShippedOp tracks replication latency forward from the point where a source process tailing a WAL has found an edit to ship. This is not an end to end measure. To achieve a holistic end to end measure we should have an active process that periodically injects sentinel values at commit time adjacent to the WALedits carrying application data at the source and records when they are finally processed at the sink, using a timestamp embedded in the sentinel to measure true end to end latency for the adjacent commit. This could be done for a configurable (and small) percentage of commits so would give a probabilistic measure with confidence controlled by sample rate. It should be done this way rather than by passively sampling cell timestamps because cell timestamps can be set by the user and may not correspond to wall clock time. We could introduce a new type of synthetic WALedit, a new global metric, and because the adjacent commit from which we build the sentinel contains table information we could track that too and add a per table metric. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
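A minimal sketch of the proposed measurement, assuming a sentinel that carries its source commit timestamp; the sampling scheme and all names below are illustrative, not an HBase API:

```java
public class ReplicationSentinelLatency {
    // Inject a sentinel for roughly 1 in sampleRate commits (illustrative sampling scheme).
    static boolean shouldInject(long commitSequence, int sampleRate) {
        return sampleRate > 0 && commitSequence % sampleRate == 0;
    }

    // End-to-end latency: sink processing time minus the source commit timestamp
    // carried inside the sentinel, both as wall clock milliseconds.
    static long latencyMs(long sentinelSourceTimestamp, long sinkProcessTimestamp) {
        return sinkProcessTimestamp - sentinelSourceTimestamp;
    }

    public static void main(String[] args) {
        System.out.println(shouldInject(200, 100));         // this commit gets a sentinel
        System.out.println(latencyMs(1_000, 1_250) + " ms"); // measured at the sink
    }
}
```

Because the timestamp is written by the injector at commit time rather than taken from user-settable cell timestamps, the difference at the sink is a true wall-clock end-to-end measure for the adjacent commit.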
[jira] [Created] (HBASE-22659) Resilient block caching for cache sensitive data serving
Andrew Purtell created HBASE-22659: -- Summary: Resilient block caching for cache sensitive data serving Key: HBASE-22659 URL: https://issues.apache.org/jira/browse/HBASE-22659 Project: HBase Issue Type: Brainstorming Components: BlockCache Reporter: Andrew Purtell Caching in data serving remains crucial for performance. Networks are fast but not yet fast enough. RDMA may change this once it becomes more popular and available. Caching layers should be resilient to crashes to avoid the cost of rewarming. In the context of HBase with the root filesystem placed on S3, the object store is quite slow relative to other options like HDFS, so caching is particularly essential given the rewarming costs will be high: either client visible performance degradation (due to cache misses and reloads) or elevated IO due to prefetching. For cloud serving backed by S3 we expect the HBase blockcache will be configured to host the entirety of the warm set, which may be very large, so we also expect operators to select the file backed option and place the filesystem for cache file storage on local fast solid state devices. These devices offer data persistence beyond the lifetime of an individual process. We can take advantage of this to make block caching partially resilient to short duration process failures and restarts. When the blockcache is backed by a file system, at startup it can reinitialize and prewarm using a scan over preexisting disk contents. These will be cache files left behind by another process executing earlier on the same instance. This strategy is applicable to process restart and rolling upgrade scenarios specifically. (The local storage may not survive an instance reboot.) Once the server has reloaded the blockcache metadata from local storage it can advertise to the HMaster the list of HFiles for which it has some precached blocks resident. 
This implies the blockcache's file backed option should maintain a mapping of source HFile paths for the blocks in cache. We don't need to provide more granular information on which blocks (or not) of the HFile are in cache. It is unlikely entries for the HFile will be cached elsewhere. We can assume placement of a region containing the HFile on a server with any block cached there will be better than the alternatives. The HMaster already waits for regionserver registration activity to stabilize before assigning regions, and we can contemplate adding a configurable delay in region reassignment for server crash handling in the hope that a restarted or recovered instance will come online and report in-cache reloaded contents in time for an assignment decision to consider this new factor in data locality. When finally processing (re)assignment the HMaster can consider this additional factor when building the assignment plan. We already calculate a HDFS level locality metric. We can also calculate a new cache level locality metric aggregated from regionserver reports of re-warmed cache contents. For a given region we can build a candidate assignment set of servers reporting cached blocks for its associated HFiles, and the master can assign the region to the server with the highest weight. Otherwise we (re)assign using the HDFS locality metric as before. In this way during rolling restart or quick process restart via supervisory process scenarios we are very likely to assign a region back to the server that was most recently hosting it, and we can pick up for immediate reuse any file backed blockcache data accumulated for the region by the previous process. These are going to be the most common scenarios encountered during normal cluster operation. This will allow HBase's internal data caching to be resilient to short duration crashes and administrative process restarts. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
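The candidate-set selection described above can be sketched as follows; the report shape and method names are hypothetical, not an existing HBase API:

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class CacheAwareAssignment {
    // Pick the server reporting the most cached HFiles for a region's HFile set.
    // An empty result means no server reported cached blocks for this region,
    // so the caller should fall back to the existing HDFS locality metric.
    static Optional<String> bestServer(Map<String, Set<String>> cachedHFilesByServer,
            Set<String> regionHFiles) {
        String best = null;
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> e : cachedHFilesByServer.entrySet()) {
            int hits = 0;
            for (String hfile : e.getValue()) {
                if (regionHFiles.contains(hfile)) {
                    hits++;
                }
            }
            if (hits > bestHits) {
                bestHits = hits;
                best = e.getKey();
            }
        }
        return Optional.ofNullable(best);
    }
}
```

After a rolling restart, the server that previously hosted the region reports the most cached HFiles and wins the weighting, which is the re-affinity behavior the proposal aims for.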
[jira] [Created] (HBASE-22648) Snapshot TTL
Andrew Purtell created HBASE-22648: -- Summary: Snapshot TTL Key: HBASE-22648 URL: https://issues.apache.org/jira/browse/HBASE-22648 Project: HBase Issue Type: New Feature Components: snapshots Reporter: Andrew Purtell Snapshots have a lifecycle that is independent from the table from which they are created. Although data in a table may be stored with a TTL, the data files containing it become frozen by the snapshot. Space consumed by expired cells will not be reclaimed by normal table housekeeping like compaction. While this is expected it can be inconvenient at scale. When many snapshots are under management and the data in various tables is expired by TTL, some notion of an optional TTL (and optional default TTL) for snapshots could be useful. It will help prevent the accumulation of junk files by automatically dropping the snapshot after the assigned TTL, making its data files eligible for cleaning. More comprehensive snapshot lifecycle management may be considered in the future but this one case is expected to be immediately useful given TTLs on data are commonly applied for similar convenience. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
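The expiry check a snapshot cleaner chore would need is simple; this sketch assumes a "no TTL" sentinel of zero or less, which is my convention here rather than anything specified in the issue:

```java
public class SnapshotTtl {
    // A TTL of zero or less means "no TTL": keep the snapshot until explicitly deleted.
    static boolean isExpired(long creationTimeMs, long ttlMs, long nowMs) {
        return ttlMs > 0 && (nowMs - creationTimeMs) > ttlMs;
    }

    public static void main(String[] args) {
        long dayMs = 24L * 60 * 60 * 1000;
        System.out.println(isExpired(0, dayMs, 2 * dayMs)); // one day past its TTL
        System.out.println(isExpired(0, 0, 2 * dayMs));     // no TTL set, never expires
    }
}
```

A chore running this predicate over snapshot metadata can drop expired snapshots, after which the existing file cleaners make their data files eligible for removal.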
[jira] [Created] (HBASE-22630) Restore TestReplicationDroppedTables coverage to branch-1
Andrew Purtell created HBASE-22630: -- Summary: Restore TestReplicationDroppedTables coverage to branch-1 Key: HBASE-22630 URL: https://issues.apache.org/jira/browse/HBASE-22630 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell TestReplicationDroppedTables was dropped from branch-1. Restore the test coverage with a test that is not flaky. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22629) Remove TestReplicationDroppedTables from branch-1
Andrew Purtell created HBASE-22629: -- Summary: Remove TestReplicationDroppedTables from branch-1 Key: HBASE-22629 URL: https://issues.apache.org/jira/browse/HBASE-22629 Project: HBase Issue Type: Bug Affects Versions: 1.5.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0 TestReplicationDroppedTables has been flaky since its initial commit and is now outright broken on recent branch-1. This test was contributed by us but we dropped it from our internal fork a while back. Do the same in open source. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22627) Port HBASE-22617 (Recovered WAL directories not getting cleaned up) to branch-1
Andrew Purtell created HBASE-22627: -- Summary: Port HBASE-22617 (Recovered WAL directories not getting cleaned up) to branch-1 Key: HBASE-22627 URL: https://issues.apache.org/jira/browse/HBASE-22627 Project: HBase Issue Type: Sub-task Affects Versions: 1.3.5, 1.4.10, 1.5.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0, 1.3.6, 1.4.11 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-22617) Recovered WAL directories not getting cleaned up
[ https://issues.apache.org/jira/browse/HBASE-22617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-22617. Resolution: Fixed Hadoop Flags: Reviewed > Recovered WAL directories not getting cleaned up > > > Key: HBASE-22617 > URL: https://issues.apache.org/jira/browse/HBASE-22617 > Project: HBase > Issue Type: Bug > Components: wal >Affects Versions: 1.5.0 >Reporter: Abhishek Singh Chouhan >Assignee: Duo Zhang >Priority: Blocker > Fix For: 3.0.0, 2.3.0, 2.0.6, 2.2.1, 2.1.6 > > > While colocating the recovered edits directory with hbase.wal.dir, > BASE_NAMESPACE_DIR got missed. This results in recovered edits being put in a > separate directory rather than the default region directory even if the > hbase.wal.dir is not overridden. Eg. if data is stored in > /hbase/data/namespace/table1, recovered edits are put in > /hbase/namespace/table1. This also messes up the regular cleaner chores which > never operate on this new directory and these directories will never be > deleted, even for split parents or dropped tables. We should change the > default back to have the base namespace directory in path. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-22616) responseTooXXX logging for Multi should characterize the component ops
[ https://issues.apache.org/jira/browse/HBASE-22616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-22616. Resolution: Fixed Hadoop Flags: Reviewed > responseTooXXX logging for Multi should characterize the component ops > -- > > Key: HBASE-22616 > URL: https://issues.apache.org/jira/browse/HBASE-22616 > Project: HBase > Issue Type: Improvement >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.3.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11 > > Attachments: HBASE-22616-branch-1.patch, HBASE-22616.patch > > > Multi RPC can be a mix of gets and mutations. The responseTooXXX logging for > Multi ops should characterize the operations within the request so we have > some clue about whether read or write dispatch was involved. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22616) responseTooXXX logging for Multi should characterize the component ops
Andrew Purtell created HBASE-22616: -- Summary: responseTooXXX logging for Multi should characterize the component ops Key: HBASE-22616 URL: https://issues.apache.org/jira/browse/HBASE-22616 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 3.0.0, 1.5.0, 2.3.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11 Multi RPC can be a mix of gets and mutations. The responseTooXXX logging for Multi ops should characterize the operations within the request so we have some clue about whether read or write dispatch was involved. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
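The characterization could be as simple as counting op types in the batch before logging; the string tags below are illustrative stand-ins for the typed actions a real Multi request carries:

```java
import java.util.List;

public class MultiResponseLogging {
    // Summarize a Multi batch for responseTooSlow/responseTooLarge logging,
    // so the log line hints whether read or write dispatch was involved.
    static String characterize(List<String> opTypes) {
        int gets = 0;
        int mutations = 0;
        for (String type : opTypes) {
            if ("get".equals(type)) {
                gets++;
            } else {
                mutations++;
            }
        }
        return "Multi(gets=" + gets + ", mutations=" + mutations + ")";
    }
}
```

Appending such a summary to the existing responseTooXXX log line is enough to tell a read-heavy batch apart from a write-heavy one when triaging.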
[jira] [Resolved] (HBASE-22574) hbase-filesystem does not build against HBase 1
[ https://issues.apache.org/jira/browse/HBASE-22574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-22574. Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: hbase-filesystem-1.0.0-alpha2 I created the new fix version hbase-filesystem-1.0.0-alpha2 for this change. Hope that was ok > hbase-filesystem does not build against HBase 1 > --- > > Key: HBASE-22574 > URL: https://issues.apache.org/jira/browse/HBASE-22574 > Project: HBase > Issue Type: Bug > Components: Filesystem Integration >Affects Versions: hbase-filesystem-1.0.0-alpha1 >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Major > Fix For: hbase-filesystem-1.0.0-alpha2 > > Attachments: > 0001-HBASE-22574-hbase-filesystem-does-not-build-against-.patch, > HBASE-22574-sean.patch, HBASE-22574.patch, HBASE-22574.patch, > HBASE-22574.patch > > > hbase-filesystem does not build against HBase 1 because HBase 1 does not > provide a hbase-zookeeper module, which is a required dependency. This could > be moved into a version specific build profile. > $ mvn clean install package -Dhbase.version=1.4.10 -Dhadoop.version=2.9.2 > ... > [ERROR] Failed to execute goal on project hbase-oss: > Could not resolve dependencies for project > org.apache.hbase.filesystem:hbase-oss:jar:1.0.0-alpha1: > The following artifacts could not be resolved: > org.apache.hbase:hbase-zookeeper:jar:1.4.10, > org.apache.hbase:hbase-zookeeper:jar:tests:1.4.10: > Could not find artifact org.apache.hbase:hbase-zookeeper:jar:1.4.10 in > central (https://repo.maven.apache.org/maven2) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22574) hbase-filesystem does not build against HBase 1
Andrew Purtell created HBASE-22574: -- Summary: hbase-filesystem does not build against HBase 1 Key: HBASE-22574 URL: https://issues.apache.org/jira/browse/HBASE-22574 Project: HBase Issue Type: Bug Reporter: Andrew Purtell hbase-filesystem does not build against HBase 1 because HBase 1 does not provide a hbase-zookeeper module, which is a required dependency. This could be moved into a version specific build profile. $ mvn clean install package -Dhbase.version=1.4.10 -Dhadoop.version=2.9.2 ... [ERROR] Failed to execute goal on project hbase-oss: Could not resolve dependencies for project org.apache.hbase.filesystem:hbase-oss:jar:1.0.0-alpha1: The following artifacts could not be resolved: org.apache.hbase:hbase-zookeeper:jar:1.4.10, org.apache.hbase:hbase-zookeeper:jar:tests:1.4.10: Could not find artifact org.apache.hbase:hbase-zookeeper:jar:1.4.10 in central (https://repo.maven.apache.org/maven2) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
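One possible shape for the version specific build profile mentioned above; the profile id, activation property, and exact dependency list are illustrative sketches, not the committed fix:

```xml
<!-- Hypothetical sketch: only pull in hbase-zookeeper when building against HBase 2+. -->
<profiles>
  <profile>
    <id>hbase-2</id>
    <activation>
      <property>
        <name>hbase.profile</name>
        <value>2.0</value>
      </property>
    </activation>
    <dependencies>
      <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-zookeeper</artifactId>
        <version>${hbase.version}</version>
      </dependency>
      <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-zookeeper</artifactId>
        <version>${hbase.version}</version>
        <type>test-jar</type>
        <scope>test</scope>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

With the dependency isolated in a profile, a build against `-Dhbase.version=1.4.10` simply never asks Maven to resolve the nonexistent hbase-zookeeper 1.x artifact.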
[jira] [Resolved] (HBASE-21301) Heatmap for key access patterns
[ https://issues.apache.org/jira/browse/HBASE-21301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-21301. Resolution: Later Fix Version/s: (was: 3.0.0) > Heatmap for key access patterns > --- > > Key: HBASE-21301 > URL: https://issues.apache.org/jira/browse/HBASE-21301 > Project: HBase > Issue Type: Improvement >Reporter: Archana Katiyar >Priority: Major > Attachments: HBASE-21301.v0.master.patch > > > Google recently released a beta feature for Cloud Bigtable which presents a > heat map of the keyspace. *Given how hotspotting comes up now and again here, > this is a good idea for giving HBase ops a tool to be proactive about it.* > >>> > Additionally, we are announcing the beta version of Key Visualizer, a > visualization tool for Cloud Bigtable key access patterns. Key Visualizer > helps debug performance issues due to unbalanced access patterns across the > key space, or single rows that are too large or receiving too much read or > write activity. With Key Visualizer, you get a heat map visualization of > access patterns over time, along with the ability to zoom into specific key > or time ranges, or select a specific row to find the full row key ID that's > responsible for a hotspot. Key Visualizer is automatically enabled for Cloud > Bigtable clusters with sufficient data or activity, and does not affect Cloud > Bigtable cluster performance. > <<< > From > [https://cloudplatform.googleblog.com/2018/07/on-gcp-your-database-your-way.html] > (Copied this description from the write-up by [~apurtell], thanks Andrew.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-21301) Heatmap for key access patterns
[ https://issues.apache.org/jira/browse/HBASE-21301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-21301: > Heatmap for key access patterns > --- > > Key: HBASE-21301 > URL: https://issues.apache.org/jira/browse/HBASE-21301 > Project: HBase > Issue Type: Improvement >Reporter: Archana Katiyar >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21301.v0.master.patch > > > Google recently released a beta feature for Cloud Bigtable which presents a > heat map of the keyspace. *Given how hotspotting comes up now and again here, > this is a good idea for giving HBase ops a tool to be proactive about it.* > >>> > Additionally, we are announcing the beta version of Key Visualizer, a > visualization tool for Cloud Bigtable key access patterns. Key Visualizer > helps debug performance issues due to unbalanced access patterns across the > key space, or single rows that are too large or receiving too much read or > write activity. With Key Visualizer, you get a heat map visualization of > access patterns over time, along with the ability to zoom into specific key > or time ranges, or select a specific row to find the full row key ID that's > responsible for a hotspot. Key Visualizer is automatically enabled for Cloud > Bigtable clusters with sufficient data or activity, and does not affect Cloud > Bigtable cluster performance. > <<< > From > [https://cloudplatform.googleblog.com/2018/07/on-gcp-your-database-your-way.html] > (Copied this description from the write-up by [~apurtell], thanks Andrew.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-22521) TestOfflineMetaRebuildBase#testHMasterStartupOnMetaRebuild failing (branch-1)
[ https://issues.apache.org/jira/browse/HBASE-22521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-22521. Resolution: Not A Problem Resolving as not a problem now that base commit has been reverted. > TestOfflineMetaRebuildBase#testHMasterStartupOnMetaRebuild failing (branch-1) > - > > Key: HBASE-22521 > URL: https://issues.apache.org/jira/browse/HBASE-22521 > Project: HBase > Issue Type: Bug >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Major > Attachments: HBASE-22521-branch-1.patch > > > Failure is > {noformat} > [ERROR] > testHMasterStartupOnMetaRebuild(org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase) > Time elapsed: 37.48 s <<< FAILURE! > java.lang.AssertionError: expected:<[]> but was:<[NOT_DEPLOYED, NOT_DEPLOYED, > NOT_DEPLOYED, HOLE_IN_REGION_CHAIN, HOLE_IN_REGION_CHAIN, > HOLE_IN_REGION_CHAIN]> > at > org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase.validateMetaAndUserTableRows(TestOfflineMetaRebuildBase.java:159) > at > org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase.testHMasterStartupOnMetaRebuild(TestOfflineMetaRebuildBase.java:130) > {noformat} > {noformat} > TestOfflineMetaRebuildBase.testHMasterStartupOnMetaRebuild:130->validateMetaAndUserTableRows:159 > expected:<[]> but was:<[NOT_DEPLOYED, NOT_DEPLOYED, NOT_DEPLOYED, > HOLE_IN_REGION_CHAIN, HOLE_IN_REGION_CHAIN, HOLE_IN_REGION_CHAIN]> > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-22521) TestOfflineMetaRebuildBase#testHMasterStartupOnMetaRebuild failing (branch-1)
[ https://issues.apache.org/jira/browse/HBASE-22521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-22521: > TestOfflineMetaRebuildBase#testHMasterStartupOnMetaRebuild failing (branch-1) > - > > Key: HBASE-22521 > URL: https://issues.apache.org/jira/browse/HBASE-22521 > Project: HBase > Issue Type: Bug >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Major > Fix For: 1.5.0 > > Attachments: HBASE-22521-branch-1.patch > > > Failure is > {noformat} > [ERROR] > testHMasterStartupOnMetaRebuild(org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase) > Time elapsed: 37.48 s <<< FAILURE! > java.lang.AssertionError: expected:<[]> but was:<[NOT_DEPLOYED, NOT_DEPLOYED, > NOT_DEPLOYED, HOLE_IN_REGION_CHAIN, HOLE_IN_REGION_CHAIN, > HOLE_IN_REGION_CHAIN]> > at > org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase.validateMetaAndUserTableRows(TestOfflineMetaRebuildBase.java:159) > at > org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase.testHMasterStartupOnMetaRebuild(TestOfflineMetaRebuildBase.java:130) > {noformat} > {noformat} > TestOfflineMetaRebuildBase.testHMasterStartupOnMetaRebuild:130->validateMetaAndUserTableRows:159 > expected:<[]> but was:<[NOT_DEPLOYED, NOT_DEPLOYED, NOT_DEPLOYED, > HOLE_IN_REGION_CHAIN, HOLE_IN_REGION_CHAIN, HOLE_IN_REGION_CHAIN]> > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-16488) Starting namespace and quota services in master startup asynchronously
[ https://issues.apache.org/jira/browse/HBASE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-16488: I'm going to revert this commit. I keep running into test failures related to it: HBASE-22521, HBASE-22533, probably others. > Starting namespace and quota services in master startup asynchronously > -- > > Key: HBASE-16488 > URL: https://issues.apache.org/jira/browse/HBASE-16488 > Project: HBase > Issue Type: Improvement > Components: master >Affects Versions: 1.3.0, 1.0.3, 1.4.0, 1.1.5, 1.2.2, 2.0.0 >Reporter: Stephen Yuan Jiang >Assignee: Xu Cang >Priority: Major > Fix For: 1.5.0 > > Attachments: HBASE-16488.branch-1.012.patch, > HBASE-16488.branch-1.012.patch, HBASE-16488.branch-1.013.patch, > HBASE-16488.branch-1.013.patch, HBASE-16488.branch-1.014.patch, > HBASE-16488.branch-1.015.patch, HBASE-16488.branch-1.016.patch, > HBASE-16488.branch-1.017.patch, HBASE-16488.revisit.v11-branch-1.patch, > HBASE-16488.v1-branch-1.patch, HBASE-16488.v1-master.patch, > HBASE-16488.v10-branch-1.patch, HBASE-16488.v2-branch-1.patch, > HBASE-16488.v2-branch-1.patch, HBASE-16488.v3-branch-1.patch, > HBASE-16488.v3-branch-1.patch, HBASE-16488.v4-branch-1.patch, > HBASE-16488.v5-branch-1.patch, HBASE-16488.v6-branch-1.patch, > HBASE-16488.v7-branch-1.patch, HBASE-16488.v8-branch-1.patch, > HBASE-16488.v9-branch-1.patch > > > From time to time, during internal IT test and from customer, we often see > master initialization failed due to namespace table region takes long time to > assign (eg. sometimes split log takes long time or hanging; or sometimes RS > is temporarily not available; sometimes due to some unknown assignment > issue). In the past, there was some proposal to improve this situation, eg. > HBASE-13556 / HBASE-14190 (Assign system tables ahead of user region > assignment) or HBASE-13557 (Special WAL handling for system tables) or > HBASE-14623 (Implement dedicated WAL for system tables). 
> This JIRA proposes another way to solve this master initialization failure > issue: namespace service is only used by a handful of operations (eg. create > table / namespace DDL / get namespace API / some RS group DDL). Only the quota > manager depends on it and quota management is off by default. Therefore, the > namespace service is not really needed for the master to be functional, so we > could start the namespace service asynchronously without blocking master startup. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22533) TestAccessController3 is flaky (branch-1)
Andrew Purtell created HBASE-22533: -- Summary: TestAccessController3 is flaky (branch-1) Key: HBASE-22533 URL: https://issues.apache.org/jira/browse/HBASE-22533 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0 Wait for table create to complete before taking further action in security tests. Avoid a noisy NPE if test initialization fails. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22521) TestOfflineMetaRebuildBase#testHMasterStartupOnMetaRebuild failing (branch-1)
Andrew Purtell created HBASE-22521: -- Summary: TestOfflineMetaRebuildBase#testHMasterStartupOnMetaRebuild failing (branch-1) Key: HBASE-22521 URL: https://issues.apache.org/jira/browse/HBASE-22521 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0 Failure is
{noformat}
[ERROR] testHMasterStartupOnMetaRebuild(org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase) Time elapsed: 37.48 s <<< FAILURE!
java.lang.AssertionError: expected:<[]> but was:<[NOT_DEPLOYED, NOT_DEPLOYED, NOT_DEPLOYED, HOLE_IN_REGION_CHAIN, HOLE_IN_REGION_CHAIN, HOLE_IN_REGION_CHAIN]>
  at org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase.validateMetaAndUserTableRows(TestOfflineMetaRebuildBase.java:159)
  at org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase.testHMasterStartupOnMetaRebuild(TestOfflineMetaRebuildBase.java:130)
{noformat}
but the cause looks to be:
{noformat}
2019-06-01 15:12:13,331 WARN [RS:0;10.0.0.14:59069] regionserver.HRegionServer(1068): Initialize abort timeout task failed
java.lang.IllegalAccessException: Class org.apache.hadoop.hbase.regionserver.HRegionServer can not access a member of class org.apache.hadoop.hbase.regionserver.HRegionServer$SystemExitWhenAbortTimeout with modifiers "private"
  at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:110)
  at java.lang.reflect.AccessibleObject.slowCheckMemberAccess(AccessibleObject.java:262)
  at java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:254)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:517)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1064)
  at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:159)
  at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:112)
  at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:143)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:356)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1824)
  at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:334)
  at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:141)
  at java.lang.Thread.run(Thread.java:745)
{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
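The IllegalAccessException above is the classic reflective-access failure: Constructor.newInstance on a private constructor throws unless the caller can access it or setAccessible(true) was called first. A self-contained demonstration (the classes here are hypothetical, not the HBase ones; two separate top-level classes are used so the access check fails regardless of JDK nestmate behavior):

```java
import java.lang.reflect.Constructor;

// A class whose only constructor is private, mimicking the reflected target.
class Hidden {
    private Hidden() {}
}

public class PrivateCtorDemo {
    // Attempt reflective construction, optionally suppressing the access check.
    static boolean construct(boolean makeAccessible) {
        try {
            Constructor<Hidden> ctor = Hidden.class.getDeclaredConstructor();
            if (makeAccessible) {
                ctor.setAccessible(true);
            }
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // Without setAccessible(true) this is an IllegalAccessException,
            // the same failure mode as in the log above.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("without setAccessible: " + construct(false));
        System.out.println("with setAccessible:    " + construct(true));
    }
}
```

The usual fixes are either calling setAccessible(true) on the constructor before newInstance, or widening the constructor's visibility.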
[jira] [Created] (HBASE-22519) Shade new nimbus-jose-jwt and jcip dependencies
Andrew Purtell created HBASE-22519: -- Summary: Shade new nimbus-jose-jwt and jcip dependencies Key: HBASE-22519 URL: https://issues.apache.org/jira/browse/HBASE-22519 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0 After moving up the default Hadoop version for the branch-1 build from 2.7 to 2.8, hbase-shaded-check-invariants step fails due to new unshaded Hadoop dependencies com.nimbusds:nimbus-jose-jwt: and com.github.stephenc.jcip:jcip-annotations -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22510) Address findbugs/spotbugs complaints (branch-1)
Andrew Purtell created HBASE-22510: -- Summary: Address findbugs/spotbugs complaints (branch-1) Key: HBASE-22510 URL: https://issues.apache.org/jira/browse/HBASE-22510 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.4.10
FindBugs module hbase-common:
- Inconsistent synchronization of org.apache.hadoop.hbase.io.encoding.EncodedDataBlock$BufferGrabbingByteArrayOutputStream.ourBytes; locked 50% of time. Unsynchronized access at EncodedDataBlock.java [line 258]
FindBugs module hbase-hadoop2-compat:
- java.util.concurrent.ScheduledThreadPoolExecutor stored into non-transient field MetricsExecutorImpl$ExecutorSingleton.scheduler. At MetricsExecutorImpl.java [line 51]
FindBugs module hbase-server:
- Possible null pointer dereference of r in org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(Collection). Dereferenced at HStore.java [line 2840]
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22509) Address findbugs/spotbugs complaints (branch-1.4)
Andrew Purtell created HBASE-22509: -- Summary: Address findbugs/spotbugs complaints (branch-1.4) Key: HBASE-22509 URL: https://issues.apache.org/jira/browse/HBASE-22509 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.3.5 FindBugs module hbase-hadoop2-compat: java.util.concurrent.ScheduledThreadPoolExecutor stored into non-transient field MetricsExecutorImpl$ExecutorSingleton.scheduler at MetricsExecutorImpl.java:[line 51]. FindBugs module hbase-server: Possible null pointer dereference of walKey in org.apache.hadoop.hbase.regionserver.HRegion.append(Append, long, long); dereferenced at HRegion.java:[line 7815]. Possible null pointer dereference of walKey in org.apache.hadoop.hbase.regionserver.HRegion.doIncrement(Increment, long, long); dereferenced at HRegion.java:[line 8052]. Possible null pointer dereference of r in org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(Collection); dereferenced at HStore.java:[line 2810]. org.apache.hadoop.hbase.tool.Canary$RegionMonitor.run() makes inefficient use of keySet iterator instead of entrySet iterator at Canary.java:[line 1095]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22508) Address findbugs/spotbugs complaints (branch-1.3)
Andrew Purtell created HBASE-22508: -- Summary: Address findbugs/spotbugs complaints (branch-1.3) Key: HBASE-22508 URL: https://issues.apache.org/jira/browse/HBASE-22508 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.3.5 FindBugs module hbase-hadoop2-compat: java.util.concurrent.ScheduledThreadPoolExecutor stored into non-transient field MetricsExecutorImpl$ExecutorSingleton.scheduler at MetricsExecutorImpl.java:[line 51]. FindBugs module hbase-server: Possible null pointer dereference of walKey in org.apache.hadoop.hbase.regionserver.HRegion.append(Append, long, long); dereferenced at HRegion.java:[line 7815]. Possible null pointer dereference of walKey in org.apache.hadoop.hbase.regionserver.HRegion.doIncrement(Increment, long, long); dereferenced at HRegion.java:[line 8052]. Possible null pointer dereference of r in org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(Collection); dereferenced at HStore.java:[line 2810]. org.apache.hadoop.hbase.tool.Canary$RegionMonitor.run() makes inefficient use of keySet iterator instead of entrySet iterator at Canary.java:[line 1095]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
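The Canary finding above (iterating keySet() and then fetching each value with a separate get() call) is a common FindBugs pattern, WMI_WRONG_MAP_ITERATOR. A minimal, self-contained illustration of the complaint and its fix; the map contents are invented for the example and unrelated to the actual Canary code:

```java
import java.util.HashMap;
import java.util.Map;

public class EntrySetExample {
  // Flagged pattern: one extra hash lookup per key on top of the iteration.
  public static long sumViaKeySet(Map<String, Long> m) {
    long total = 0;
    for (String k : m.keySet()) {
      total += m.get(k); // the redundant lookup FindBugs complains about
    }
    return total;
  }

  // Fixed pattern: entrySet() yields key and value together, no second lookup.
  public static long sumViaEntrySet(Map<String, Long> m) {
    long total = 0;
    for (Map.Entry<String, Long> e : m.entrySet()) {
      total += e.getValue();
    }
    return total;
  }

  public static void main(String[] args) {
    Map<String, Long> m = new HashMap<>();
    m.put("region-a", 2L);
    m.put("region-b", 3L);
    System.out.println(sumViaEntrySet(m)); // prints 5
  }
}
```

Both methods return the same result; the entrySet variant simply avoids re-hashing every key, which is the whole substance of the Canary.java:[line 1095] finding.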
[jira] [Created] (HBASE-22460) Reopen a region if store reader references may have leaked
Andrew Purtell created HBASE-22460: -- Summary: Reopen a region if store reader references may have leaked Key: HBASE-22460 URL: https://issues.apache.org/jira/browse/HBASE-22460 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22459) Expose store reader reference count
Andrew Purtell created HBASE-22459: -- Summary: Expose store reader reference count Key: HBASE-22459 URL: https://issues.apache.org/jira/browse/HBASE-22459 Project: HBase Issue Type: Improvement Components: HFile, metrics, regionserver Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 3.0.0, 1.5.0, 2.3.0 Expose the reference count over a region's store file readers as a metric in region metrics and also as a new field in RegionLoad. This will make visible the reader reference count over all stores in the region to both metrics capture and anything that consumes ClusterStatus, like the shell's status command and the master UI. Coprocessors that wrap scanners might leak them, which will leak readers. We log when this happens but in order to notice the increasing trend of reference counts you have to scrape log output. It would be better if this information is also available as a metric. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22451) TestLoadIncrementalHFiles and TestSecureLoadIncrementalHFiles are flaky
Andrew Purtell created HBASE-22451: -- Summary: TestLoadIncrementalHFiles and TestSecureLoadIncrementalHFiles are flaky Key: HBASE-22451 URL: https://issues.apache.org/jira/browse/HBASE-22451 Project: HBase Issue Type: Bug Affects Versions: 1.5.0 Reporter: Andrew Purtell Fix For: 1.5.0 Attachments: log.txt TableNamespaceManager initialization is racy, leading to a test flake. Improve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22450) Port TestStoreScannerClosure from HBASE-22072
Andrew Purtell created HBASE-22450: -- Summary: Port TestStoreScannerClosure from HBASE-22072 Key: HBASE-22450 URL: https://issues.apache.org/jira/browse/HBASE-22450 Project: HBase Issue Type: Test Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0 Port TestStoreScannerClosure from HBASE-22072. It should pass to prove the issue is not present on branch-1. Test currently passes on branch-1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22449) https everywhere in Maven metadata
Andrew Purtell created HBASE-22449: -- Summary: https everywhere in Maven metadata Key: HBASE-22449 URL: https://issues.apache.org/jira/browse/HBASE-22449 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell There will be some attention paid to insecure URLs used for retrieval of build dependencies. While our build does not have direct exposure to this we do have some insecure URLs pointing to secondary resources like JIRA, mailing list archives, hbase.apache.org, and the online book. Make these https too. I left the license header text alone, although there is a URL to the ASL 2 license embedded there. If we are going to update that, let's do that as a separate task because just about every file is going to be touched. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-22429) hbase-vote download step requires URL to end with '/'
[ https://issues.apache.org/jira/browse/HBASE-22429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-22429. Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 1.3.5 2.2.1 2.3.0 1.4.10 1.5.0 3.0.0 > hbase-vote download step requires URL to end with '/' > -- > > Key: HBASE-22429 > URL: https://issues.apache.org/jira/browse/HBASE-22429 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Assignee: Tak Lon (Stephen) Wu >Priority: Trivial > Fix For: 3.0.0, 1.5.0, 1.4.10, 2.3.0, 2.2.1, 1.3.5 > > > The hbase-vote script's download step requires the sourcedir URL be > terminated with a path separator or else the retrieval will escape the > candidate's directory and mirror way too much. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22430) hbase-vote should tee build and test output to console
Andrew Purtell created HBASE-22430: -- Summary: hbase-vote should tee build and test output to console Key: HBASE-22430 URL: https://issues.apache.org/jira/browse/HBASE-22430 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell The hbase-vote script should tee the build and test output to console in addition to the output file so the user does not become suspicious about progress. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22429) hbase-vote download step requires URL to end with '/'
Andrew Purtell created HBASE-22429: -- Summary: hbase-vote download step requires URL to end with '/' Key: HBASE-22429 URL: https://issues.apache.org/jira/browse/HBASE-22429 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell The hbase-vote script's download step requires the sourcedir URL be terminated with a path separator or else the retrieval will escape the candidate's directory and mirror way too much. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
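The fix this report implies is a one-line normalization: append the trailing separator when it is missing, so recursive retrieval stays inside the candidate directory. A sketch in Java for illustration only (hbase-vote itself is a shell script, so this mirrors the logic rather than the actual patch):

```java
public class UrlNormalize {
  // Append a trailing '/' so a recursive fetch cannot escape the target directory.
  public static String ensureTrailingSlash(String url) {
    return url.endsWith("/") ? url : url + "/";
  }

  public static void main(String[] args) {
    // Hypothetical release-candidate URL, for demonstration only.
    System.out.println(ensureTrailingSlash("https://dist.example.org/repos/dist/dev/hbase/rc0"));
  }
}
```

The call is idempotent: a URL that already ends with '/' passes through unchanged.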
[jira] [Resolved] (HBASE-21048) Get LogLevel is not working from console in secure environment
[ https://issues.apache.org/jira/browse/HBASE-21048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-21048. Resolution: Fixed Pushed addendum to branch-1.4 and branch-1. > Get LogLevel is not working from console in secure environment > -- > > Key: HBASE-21048 > URL: https://issues.apache.org/jira/browse/HBASE-21048 > Project: HBase > Issue Type: Bug >Reporter: Chandra Sekhar >Assignee: Wei-Chiu Chuang >Priority: Major > Labels: security > Fix For: 3.0.0, 1.5.0, 1.4.10, 2.3.0 > > Attachments: HBASE-21048.001.patch, HBASE-21048.branch-1.001.patch, > HBASE-21048.branch-1.002.patch, HBASE-21048.branch-1.4.001.patch, > HBASE-21048.master.001.patch, HBASE-21048.master.002.patch, > HBASE-21048.master.003.patch, HBASE-21048.master.004.patch, > HBASE-21048.master.005.patch, HBASE-21048.master.006.patch, > HBASE-21048.master.007.patch, HBASE-21048.master.008.patch > > > When we try to get the log level of a specific package in a secure environment, > we get a SocketException. > {code:java} > hbase/master/bin# ./hbase org.apache.hadoop.hbase.http.log.LogLevel -getlevel > host-:16010 org.apache.hadoop.hbase > Connecting to http://host-:16010/logLevel?log=org.apache.hadoop.hbase > java.net.SocketException: Unexpected end of file from server > {code} > It is trying to connect over http instead of https. > The code snippet handling only http in *LogLevel.java*: > {code:java} > public static void main(String[] args) { > if (args.length == 3 && "-getlevel".equals(args[0])) { > process("http://" + args[1] + "/logLevel?log=" + args[2]); > return; > } > else if (args.length == 4 && "-setlevel".equals(args[0])) { > process("http://" + args[1] + "/logLevel?log=" + args[2] > + "&level=" + args[3]); > return; > } > System.err.println(USAGES); > System.exit(-1); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
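The hardcoded "http://" scheme in the quoted snippet is the root cause of the failure on SSL-enabled clusters. A minimal sketch of the shape of the fix, choosing the scheme from a flag instead of hardcoding it; the method and parameter names here are illustrative, not the actual committed patch, which reads the SSL setting from the cluster configuration:

```java
public class LogLevelUrl {
  // Illustrative only: the real fix consults the HBase/Hadoop HTTP policy config
  // to decide whether the web UI endpoint is served over https.
  public static String buildLogLevelUrl(boolean sslEnabled, String hostPort, String logger) {
    String scheme = sslEnabled ? "https" : "http";
    return scheme + "://" + hostPort + "/logLevel?log=" + logger;
  }

  public static void main(String[] args) {
    // With SSL enabled, the client connects over https and avoids the
    // "Unexpected end of file from server" SocketException from the report.
    System.out.println(buildLogLevelUrl(true, "host-1:16010", "org.apache.hadoop.hbase"));
  }
}
```

The host name and port here are placeholders matching the shape of the example in the report.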
[jira] [Resolved] (HBASE-9950) Row level replication
[ https://issues.apache.org/jira/browse/HBASE-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-9950. --- Resolution: Not A Problem > Row level replication > - > > Key: HBASE-9950 > URL: https://issues.apache.org/jira/browse/HBASE-9950 > Project: HBase > Issue Type: Bug > Components: Replication >Reporter: Ishan Chhabra >Priority: Minor > > We have a replication setup with the same table and column family being > present in multiple data centers. Currently, all of them have exactly the > same data, but each cluster doesn't need all the data. Rows need to be > present in only x out of the total y clusters. This information varies at the > row level and thus more granular replication cannot be achieved by setting up > cluster level replication. > Adding row level replication should solve this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-21048) Get LogLevel is not working from console in secure environment
[ https://issues.apache.org/jira/browse/HBASE-21048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-21048: > Get LogLevel is not working from console in secure environment > -- > > Key: HBASE-21048 > URL: https://issues.apache.org/jira/browse/HBASE-21048 > Project: HBase > Issue Type: Bug >Reporter: Chandra Sekhar >Assignee: Wei-Chiu Chuang >Priority: Major > Labels: security > Fix For: 3.0.0, 1.5.0, 1.4.10, 2.3.0 > > Attachments: HBASE-21048.001.patch, HBASE-21048.branch-1.001.patch, > HBASE-21048.branch-1.002.patch, HBASE-21048.master.001.patch, > HBASE-21048.master.002.patch, HBASE-21048.master.003.patch, > HBASE-21048.master.004.patch, HBASE-21048.master.005.patch, > HBASE-21048.master.006.patch, HBASE-21048.master.007.patch, > HBASE-21048.master.008.patch > > > When we try to get the log level of a specific package in a secure environment, > we get a SocketException. > {code:java} > hbase/master/bin# ./hbase org.apache.hadoop.hbase.http.log.LogLevel -getlevel > host-:16010 org.apache.hadoop.hbase > Connecting to http://host-:16010/logLevel?log=org.apache.hadoop.hbase > java.net.SocketException: Unexpected end of file from server > {code} > It is trying to connect over http instead of https. > The code snippet handling only http in *LogLevel.java*: > {code:java} > public static void main(String[] args) { > if (args.length == 3 && "-getlevel".equals(args[0])) { > process("http://" + args[1] + "/logLevel?log=" + args[2]); > return; > } > else if (args.length == 4 && "-setlevel".equals(args[0])) { > process("http://" + args[1] + "/logLevel?log=" + args[2] > + "&level=" + args[3]); > return; > } > System.err.println(USAGES); > System.exit(-1); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-21800) RegionServer aborted due to NPE from MetaTableMetrics coprocessor
[ https://issues.apache.org/jira/browse/HBASE-21800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-21800. Resolution: Fixed > RegionServer aborted due to NPE from MetaTableMetrics coprocessor > - > > Key: HBASE-21800 > URL: https://issues.apache.org/jira/browse/HBASE-21800 > Project: HBase > Issue Type: Bug > Components: Coprocessors, meta, metrics, Operability >Reporter: Sakthi >Assignee: Sakthi >Priority: Critical > Labels: Meta > Fix For: 3.0.0, 1.5.0, 2.2.0, 2.3.0, 1.4.10 > > Attachments: hbase-21800.branch-1.001.patch, > hbase-21800.branch-1.002.patch, hbase-21800.branch-1.003.patch, > hbase-21800.branch-1.004.patch, hbase-21800.master.001.patch, > hbase-21800.master.002.patch, hbase-21800.master.003.patch > > > I was just playing around with the code, trying to capture "Top k" table metrics > from MetaMetrics, when I bumped into this issue. Though we do not currently > capture "Top k" table metrics, we can still encounter this issue because of > the "Top k Clients" metric, which is implemented using the lossy counting algorithm. > > The RegionServer gets aborted due to an NPE from the MetaTableMetrics coprocessor. 
The > log looks somewhat like this: > {code:java} > 2019-01-28 23:31:10,311 ERROR > [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] > coprocessor.CoprocessorHost: The coprocessor > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw > java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.markMeterIfPresent(MetaTableMetrics.java:123) > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.tableMetricRegisterAndMark2(MetaTableMetrics.java:233) > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.preGetOp(MetaTableMetrics.java:82) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:840) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:837) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:551) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:625) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:837) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2608) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2547) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) > 2019-01-28 23:31:10,314 ERROR > [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] > regionserver.HRegionServer: * ABORTING 
region server > 10.0.0.24,16020,1548747043814: The coprocessor > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw > java.lang.NullPointerException * > java.lang.NullPointerException > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.markMeterIfPresent(MetaTableMetrics.java:123) > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.tableMetricRegisterAndMark2(MetaTableMetrics.java:233) > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.preGetOp(MetaTableMetrics.java:82) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:840) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:837) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:551) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:625) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:837) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2608) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2547) > at >
[jira] [Created] (HBASE-22375) Promote AccessChecker to LimitedPrivate(Coprocessor)
Andrew Purtell created HBASE-22375: -- Summary: Promote AccessChecker to LimitedPrivate(Coprocessor) Key: HBASE-22375 URL: https://issues.apache.org/jira/browse/HBASE-22375 Project: HBase Issue Type: Task Components: Coprocessors, security Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 3.0.0, 1.5.0, 2.3.0, 2.1.5, 2.2.1, 1.3.5, 1.4.11 We refactored access checking so other components can reuse permissions checks formerly encapsulated by the AccessController coprocessor. The new API is AccessChecker. Currently it is marked Private but I propose to promote this to LimitedPrivate(Coprocessor) with Evolving status. Unless there is an objection I will make this change. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22374) Backport AccessChecker refactor to branch-1.3
Andrew Purtell created HBASE-22374: -- Summary: Backport AccessChecker refactor to branch-1.3 Key: HBASE-22374 URL: https://issues.apache.org/jira/browse/HBASE-22374 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.3.5 We refactored access checking so other components can reuse permissions checks formerly encapsulated by the AccessController coprocessor. The new API is AccessChecker, committed as far back as branch-1.4. This should be backported to branch-1.3 as well so any potential user of AccessChecker can address changes and fixes for HBase versions 1.3 and up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-22215) Backport MultiRowRangeFilter does not work with reverse scans
[ https://issues.apache.org/jira/browse/HBASE-22215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-22215: > Backport MultiRowRangeFilter does not work with reverse scans > - > > Key: HBASE-22215 > URL: https://issues.apache.org/jira/browse/HBASE-22215 > Project: HBase > Issue Type: Sub-task > Components: Filters >Reporter: Josh Elser >Assignee: Josh Elser >Priority: Major > Fix For: 1.5.0 > > Attachments: HBASE-22215.001.branch-1.patch, HBASE-22215.001.patch > > > See parent. Modify and apply to 1.x lines. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
Andrew Purtell created HBASE-22330: -- Summary: Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1 Key: HBASE-22330 URL: https://issues.apache.org/jira/browse/HBASE-22330 Project: HBase Issue Type: Sub-task Affects Versions: 1.3.4, 1.4.9, 1.5.0 Reporter: Andrew Purtell Fix For: 1.5.0 There appears to be a race condition between close and split which, when combined with a side effect of HBASE-20704, leads to the parent region store files getting archived and cleared while daughter regions still have references to those parent region store files. Here is the timeline of events observed for an affected region:
# RS1 faces a ZooKeeper connectivity issue for the master node and starts shutting itself down. As part of this it starts to close the store and clean up the compacted files (File A).
# Master starts bulk assigning regions and assigns the parent region to RS2.
# The region opens on RS2 and ends up opening compacted store file(s) (suspect this is due to HBASE-20724).
# Now the split happens, and daughter regions open on RS2 and try to run a compaction as part of post open.
# The split request at this point is complete. However, archiving now proceeds on RS1 and ends up archiving the store file that is referenced by the daughter.
Compaction fails due to FileNotFoundException and all subsequent attempts to open the region will fail until manual resolution. We think having HBASE-20724 would help in such situations since we won't end up loading compacted store files in the first place. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22321) Add 1.5 release line to the Hadoop supported versions table
Andrew Purtell created HBASE-22321: -- Summary: Add 1.5 release line to the Hadoop supported versions table Key: HBASE-22321 URL: https://issues.apache.org/jira/browse/HBASE-22321 Project: HBase Issue Type: Task Components: documentation Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0 Attachments: HBASE-22321.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20026) Add 1.4 release line to the JDK and Hadoop expectation tables
[ https://issues.apache.org/jira/browse/HBASE-20026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20026. Resolution: Not A Problem Fix Version/s: (was: 1.4.10) There is a column in [https://hbase.apache.org/book.html#hadoop] for "HBase 1.4.x" and I don't know of any special considerations for JDK. Resolving this as Not A Problem. Please reopen (and update description) if I've misunderstood. > Add 1.4 release line to the JDK and Hadoop expectation tables > - > > Key: HBASE-20026 > URL: https://issues.apache.org/jira/browse/HBASE-20026 > Project: HBase > Issue Type: Task > Components: documentation >Affects Versions: 1.4.0 >Reporter: Sean Busbey >Assignee: Andrew Purtell >Priority: Critical > > the ref guide currently doesn't have any expectations listed for branch-1.4 > releases around JDK and Hadoop versions. > either add it, or maybe update the existing entries so we have "1.2, 1.3, > 1.4" in a single entry. unless we're ready to include something different > among them. (Maybe note the default Hadoop we ship with? Or Hadoop 2.8.2+ > moving to S maybe? if we've actually done any of the legwork.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-22270) master's jmx.clusterRequests could be negative in branch-1
[ https://issues.apache.org/jira/browse/HBASE-22270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-22270. Resolution: Fixed Assignee: puleya7 Hadoop Flags: Reviewed Fix Version/s: 1.3.5 1.4.10 1.5.0 > master's jmx.clusterRequests could be negative in branch-1 > -- > > Key: HBASE-22270 > URL: https://issues.apache.org/jira/browse/HBASE-22270 > Project: HBase > Issue Type: Bug > Components: master, regionserver >Affects Versions: 1.4.9, 1.3.4, 1.2.12 >Reporter: puleya7 >Assignee: puleya7 >Priority: Major > Fix For: 1.5.0, 1.4.10, 1.3.5 > > > In the 1.x branches, a regionserver could report a negative (overflowed) requestCount > to the master, causing the master's jmx.clusterRequests to become smaller or even > negative. > HBASE-12444 fixed this, but the backport to branch-1 missed a small piece. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22301) Consider rolling the WAL if the HDFS write pipeline is slow
Andrew Purtell created HBASE-22301: -- Summary: Consider rolling the WAL if the HDFS write pipeline is slow Key: HBASE-22301 URL: https://issues.apache.org/jira/browse/HBASE-22301 Project: HBase Issue Type: Improvement Components: wal Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 3.0.0, 1.5.0, 2.3.0
Consider the case when a subset of the HDFS fleet is unhealthy but suffering a gray failure, not an outright outage. HDFS operations, notably syncs, are abnormally slow on pipelines which include this subset of hosts. If the regionserver's WAL is backed by an impacted pipeline, all WAL handlers can be consumed waiting for acks from the datanodes in the pipeline (recall that some of them are sick). Imagine a write-heavy application distributing load uniformly over the cluster at a fairly high rate. With the WAL subsystem slowed by HDFS-level issues, all handlers can be blocked waiting to append to the WAL. Once all handlers are blocked, the application will experience backpressure. This is with branch-1 code. I think branch-2's async WAL can mitigate this but may still be susceptible; branch-2's sync WAL is susceptible.
We already roll the WAL writer if the pipeline suffers the failure of a datanode and the replication factor on the pipeline is too low. We should also consider how much time it took for the write pipeline to complete a sync the last time we measured it, or the max over the interval from now to the last time we checked. If the sync time exceeds a configured threshold, roll the log writer then too. Fortunately we don't need to know which datanode is making the WAL write pipeline slow, only that syncs on the pipeline are too slow and exceeding a threshold. This is enough information to know when to roll. Once we roll, we will get three new randomly selected datanodes. On most clusters the probability that the new pipeline includes the slow datanode will be low. (And if for some reason it does end up with a problematic datanode again, we roll again.)
This is not a silver bullet but it can be a reasonably effective mitigation. Provide a metric for tracking when a log roll is requested (and for what reason). Emit a log line at roll time that includes datanode pipeline details for further debugging and analysis, similar to the existing slow FSHLog sync log line.
If we roll too many times within a short interval this probably means there is a widespread problem with the fleet, so our mitigation is not helping and may be exacerbating those problems or operator difficulties. Ensure log roll requests triggered by this new feature happen infrequently enough to not cause difficulties under either normal or abnormal conditions. A very simple strategy that could work well under both is to define a fairly lengthy interval, default 5 minutes, and ensure we do not roll more than once during this interval for this reason. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
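The policy the proposal describes (roll when an observed sync exceeds a latency threshold, but at most once per configured interval) can be sketched as below. The class, method names, and defaults are hypothetical illustrations of the idea, not the implementation that eventually landed in HBase:

```java
public class SlowSyncRollPolicy {
  private final long slowSyncThresholdMs; // e.g. a sync slower than this is "too slow"
  private final long minRollIntervalMs;   // e.g. 5 minutes, per the proposal
  private long lastRollTimeMs;

  public SlowSyncRollPolicy(long slowSyncThresholdMs, long minRollIntervalMs) {
    this.slowSyncThresholdMs = slowSyncThresholdMs;
    this.minRollIntervalMs = minRollIntervalMs;
    // Initialize so the very first slow sync is eligible to trigger a roll.
    this.lastRollTimeMs = -minRollIntervalMs;
  }

  /** Decide whether a WAL roll should be requested for an observed sync time. */
  public boolean shouldRoll(long syncTimeMs, long nowMs) {
    if (syncTimeMs < slowSyncThresholdMs) {
      return false; // pipeline looks healthy, nothing to do
    }
    if (nowMs - lastRollTimeMs < minRollIntervalMs) {
      return false; // rate limit: at most one slow-sync roll per interval
    }
    lastRollTimeMs = nowMs;
    return true;
  }

  public static void main(String[] args) {
    SlowSyncRollPolicy p = new SlowSyncRollPolicy(10_000, 300_000);
    System.out.println(p.shouldRoll(15_000, 0));       // slow sync: roll
    System.out.println(p.shouldRoll(15_000, 60_000));  // still slow, within interval: no roll
    System.out.println(p.shouldRoll(15_000, 400_000)); // interval elapsed: roll again
  }
}
```

The rate limit is what keeps a fleet-wide HDFS problem from turning the mitigation into a roll storm: the threshold detects the sick pipeline, and the interval caps how often detection may act.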
[jira] [Resolved] (HBASE-21959) CompactionTool should close the store it uses for compacting files, in order to properly archive compacted files.
[ https://issues.apache.org/jira/browse/HBASE-21959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-21959. Resolution: Fixed I reverted this change on branch-1 then pushed the change to the tool without the unit test. Suite executions look better after this change. I'll put back the original if it fails to pan out. > CompactionTool should close the store it uses for compacting files, in order > to properly archive compacted files. > - > > Key: HBASE-21959 > URL: https://issues.apache.org/jira/browse/HBASE-21959 > Project: HBase > Issue Type: Bug > Components: tooling >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.4.10, 2.3.0 > > Attachments: HBASE-21959-branch-1-001.patch, > HBASE-21959-branch-1-002.patch, HBASE-21959-branch-1.patch, > HBASE-21959-master-001.patch, HBASE-21959-master-002.patch, > HBASE-21959-master-003.patch > > > While using CompactionTool to offload RSes, I noticed compacted files were > never archived from the original region dir, causing the space used by the region > to actually double. Going through the compaction-related code in HStore, > which is used by CompactionTool for performing compactions, I found that > compacted-file archiving happens mainly while closing the HStore > instance. CompactionTool never explicitly closes its HStore instance, so > I am adding a simple patch that properly closes the store. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-21959) CompactionTool should close the store it uses for compacting files, in order to properly archive compacted files.
[ https://issues.apache.org/jira/browse/HBASE-21959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-21959: Git bisecting recent unit test instability landed on this commit twice. At first glance it's hard to see what the problem could be, except that a new unit test was added here. Testing revert and recommit without the new unit test looks promising. The failure was intermittent, so it may take some time to confirm with enough iterations that removing the unit test here is a short-term solution to unblock releasing from branch-1. If so I will push up the revert and amended commit to the branch-1s. > CompactionTool should close the store it uses for compacting files, in order > to properly archive compacted files. > - > > Key: HBASE-21959 > URL: https://issues.apache.org/jira/browse/HBASE-21959 > Project: HBase > Issue Type: Bug > Components: tooling >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.4.10, 2.3.0 > > Attachments: HBASE-21959-branch-1-001.patch, > HBASE-21959-branch-1-002.patch, HBASE-21959-master-001.patch, > HBASE-21959-master-002.patch, HBASE-21959-master-003.patch > > > While using CompactionTool to offload RSes, I noticed compacted files were > never archived from the original region dir, causing the space used by the region > to actually double. Going through the compaction-related code in HStore, > which is used by CompactionTool for performing compactions, I found that > compacted-file archiving happens mainly while closing the HStore > instance. CompactionTool never explicitly closes its HStore instance, so > I am adding a simple patch that properly closes the store. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20586) SyncTable tool: Add support for cross-realm remote clusters
[ https://issues.apache.org/jira/browse/HBASE-20586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20586. Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.2.1 2.1.5 1.4.10 > SyncTable tool: Add support for cross-realm remote clusters > --- > > Key: HBASE-20586 > URL: https://issues.apache.org/jira/browse/HBASE-20586 > Project: HBase > Issue Type: Improvement > Components: mapreduce, Operability, Replication >Affects Versions: 1.2.0, 2.0.0 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.4.10, 2.3.0, 2.1.5, 2.2.1 > > Attachments: HBASE-20586.master.001.patch > > > One possible scenario for HashTable/SyncTable is to synchronize different > clusters, for instance when replication has been enabled but data already existed, or when replication issues may have caused long lags in the > replication. > For secured clusters under different kerberos realms (with cross-realm > properly set), though, the current SyncTable version would fail to authenticate > with the remote cluster when trying to read HashTable outputs (when > *sourcehashdir* is remote) and also when trying to read table data on the > remote cluster (when *sourcezkcluster* is remote). 
> The hdfs error would look like this: > {noformat} > INFO mapreduce.Job: Task Id : attempt_1524358175778_105392_m_00_0, Status > : FAILED > Error: java.io.IOException: Failed on local exception: java.io.IOException: > org.apache.hadoop.security.AccessControlException: Client cannot authenticate > via:[TOKEN, KERBEROS]; Host Details : local host is: "local-host/1.1.1.1"; > destination host is: "remote-nn":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) > at org.apache.hadoop.ipc.Client.call(Client.java:1506) > at org.apache.hadoop.ipc.Client.call(Client.java:1439) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) > at com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:256) > ... > at > org.apache.hadoop.hbase.mapreduce.HashTable$TableHash.readPropertiesFile(HashTable.java:144) > at > org.apache.hadoop.hbase.mapreduce.HashTable$TableHash.read(HashTable.java:105) > at > org.apache.hadoop.hbase.mapreduce.SyncTable$SyncMapper.setup(SyncTable.java:188) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142) > ... > Caused by: java.io.IOException: > org.apache.hadoop.security.AccessControlException: Client cannot authenticate > via:[TOKEN, KERBEROS]{noformat} > The above can be sorted if the SyncTable job acquires a DT for the remote NN. > Once hdfs related authentication is done, it's also necessary to authenticate > against remote HBase, as the below error would arise: > {noformat} > INFO mapreduce.Job: Task Id : attempt_1524358175778_172414_m_00_0, Status > : FAILED > Error: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get > the location > at > org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:326) > ... 
> at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:867) > at > org.apache.hadoop.hbase.mapreduce.SyncTable$SyncMapper.syncRange(SyncTable.java:331) > ... > Caused by: java.io.IOException: Could not set up IO Streams to > remote-rs-host/1.1.1.2:60020 > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:786) > ... > Caused by: java.lang.RuntimeException: SASL authentication failed. The most > likely cause is missing or invalid credentials. Consider 'kinit'. > ... > Caused by: GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos tgt) > ...{noformat} > The above would need additional authentication logic against the remote hbase > cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-12829) Request count in RegionLoad may not be accurate to compute the load cost for region
[ https://issues.apache.org/jira/browse/HBASE-12829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-12829. Resolution: Not A Problem > Request count in RegionLoad may not be accurate to compute the load cost for > region > > > Key: HBASE-12829 > URL: https://issues.apache.org/jira/browse/HBASE-12829 > Project: HBase > Issue Type: Improvement > Components: Balancer > Affects Versions: 0.99.2 > Reporter: Jianwei Cui > Priority: Minor > > StochasticLoadBalancer#RequestCostFunction (ReadRequestCostFunction and > WriteRequestCostFunction) will compute the load cost for a region based on a > number of remembered region loads. Each region load records the total count > of read/write requests at report time, accumulated since the region opened. > However, the request count is reset if the region moves, so the newly reported > count cannot represent the total requests. For example, if a region has high > write throughput, the WriteRequest in its region load will be very big after > being online for a long time; if the region then moves, the new WriteRequest > will be much smaller, making the region contribute much less to the cost of > its hosting regionserver. We may need to consider the region open time to get > a more accurate region load. > As an alternative, how about using the read/write request count in each time > slot instead of the total request count? 
The total count will make older read/write request throughput contribute more to the cost in CostFromRegionLoadFunction#getRegionLoadCost:
> {code}
> protected double getRegionLoadCost(Collection<RegionLoad> regionLoadList) {
>   double cost = 0;
>   for (RegionLoad rl : regionLoadList) {
>     double toAdd = getCostFromRl(rl);
>     if (cost == 0) {
>       cost = toAdd;
>     } else {
>       cost = (.5 * cost) + (.5 * toAdd);
>     }
>   }
>   return cost;
> }
> {code}
> For example, assume the balancer now remembers three loads for a region at times t1, t2, t3 (t1 < t2 < t3), and the write requests are w1, w2, w3 respectively for the time slots [0, t1), [t1, t2), [t2, t3). Then the WriteRequest in the region load at t1, t2, t3 will be w1, w1 + w2, w1 + w2 + w3, and the WriteRequest cost will be:
> {code}
> 0.5 * (w1 + w2 + w3) + 0.25 * (w1 + w2) + 0.25 * w1 = w1 + 0.75 * w2 + 0.5 * w3
> {code}
> So w1 contributes more to the cost than w2 and w3. However, intuitively, I think the recent read/write throughput should represent the current load of the region better than the older ones. Therefore, how about using w1, w2 and w3 directly when computing? Then the cost becomes:
> {code}
> 0.25 * w1 + 0.25 * w2 + 0.5 * w3
> {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
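The weighting difference described above can be checked with a small standalone calculation (the class and method names below are illustrative, not the actual StochasticLoadBalancer API):

```java
public class RegionLoadCostSketch {
    // Exponentially weighted cost, mirroring the loop in
    // CostFromRegionLoadFunction#getRegionLoadCost quoted above.
    public static double ewmaCost(double[] loads) {
        double cost = 0;
        for (double toAdd : loads) {
            cost = (cost == 0) ? toAdd : (0.5 * cost) + (0.5 * toAdd);
        }
        return cost;
    }

    public static void main(String[] args) {
        double w1 = 100, w2 = 10, w3 = 10;
        // Cumulative counts, as RegionLoad reports them today:
        double cumulative = ewmaCost(new double[] {w1, w1 + w2, w1 + w2 + w3});
        // Per-slot deltas, as the issue proposes:
        double perSlot = ewmaCost(new double[] {w1, w2, w3});
        // cumulative = w1 + 0.75*w2 + 0.5*w3 = 112.5 (old load dominates)
        // perSlot    = 0.25*w1 + 0.25*w2 + 0.5*w3 = 32.5 (recent load dominates)
        System.out.printf("cumulative=%.1f perSlot=%.1f%n", cumulative, perSlot);
    }
}
```

With a burst of 100 writes in the oldest slot and 10 in each recent slot, the cumulative form reports 112.5 while the per-slot form reports 32.5, matching the closed forms derived in the issue.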
[jira] [Resolved] (HBASE-8443) Queue a balancer run when regionservers report in for the first time
[ https://issues.apache.org/jira/browse/HBASE-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-8443. --- Resolution: Fixed > Queue a balancer run when regionservers report in for the first time > > > Key: HBASE-8443 > URL: https://issues.apache.org/jira/browse/HBASE-8443 > Project: HBase > Issue Type: Improvement > Components: Balancer > Reporter: Elliott Clark > Priority: Major > > When running integration tests it's apparent that lots of region servers sit > for quite a while in between balancer runs. > I propose > * Queuing one balancer run that will run 30 seconds after a new region server > checks in. > * Resetting the balancer period if we queue a balancer run. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-10075) add a locality-aware balancer
[ https://issues.apache.org/jira/browse/HBASE-10075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-10075. Resolution: Fixed > add a locality-aware balancer > - > > Key: HBASE-10075 > URL: https://issues.apache.org/jira/browse/HBASE-10075 > Project: HBase > Issue Type: New Feature > Components: Balancer > Affects Versions: 0.94.12 > Reporter: Chengxiang Li > Priority: Major > > Basic idea: > During rebalance: for each region server, iterate over its regions, give each region a > balance score, and remove the lowest-scoring one until the server's region count > reaches the average floor. > During assignment: pair each to-be-assigned region with each active region > server, give each pair a balance score, and the highest-scoring server wins the region. > Here is the scoring formula: > (1 – tableRegionNumberOnServer/allTableRegionNumber) * tableBalancerWeight > + (1 – regionNumberOnServer/allRegionNumber) * serverBalancerWeight + > regionBlockSizeOnServer/regionBlockSize * localityWeight > + (previousServer?1:0) * stickinessWeight > There are 4 factors which influence the final balance score: > 1. region balance > 2. table region balance > 3. region locality > 4. region stickiness > By adjusting the weights of these 4 factors, we can balance the cluster with > different strategies. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
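As a rough illustration of how the proposed formula combines the four factors, here is a standalone sketch; the weight values and all names are hypothetical and not part of any HBase API:

```java
public class BalanceScoreSketch {
    // Direct transcription of the proposed scoring formula: table balance,
    // server balance, locality, and stickiness, each scaled by its weight.
    public static double score(int tableRegionsOnServer, int allTableRegions,
                               int regionsOnServer, int allRegions,
                               long localBlockBytes, long totalBlockBytes,
                               boolean previousServer,
                               double tableWeight, double serverWeight,
                               double localityWeight, double stickinessWeight) {
        return (1.0 - (double) tableRegionsOnServer / allTableRegions) * tableWeight
             + (1.0 - (double) regionsOnServer / allRegions) * serverWeight
             + ((double) localBlockBytes / totalBlockBytes) * localityWeight
             + (previousServer ? 1 : 0) * stickinessWeight;
    }

    public static void main(String[] args) {
        // A server holding few of the table's regions, with 80% block locality,
        // that previously hosted the region, scores highly for that region.
        double s = score(1, 10, 20, 100, 800, 1000, true, 1.0, 1.0, 2.0, 0.5);
        System.out.println(s); // 0.9 + 0.8 + 1.6 + 0.5 = 3.8
    }
}
```

Raising localityWeight relative to the other weights biases assignment toward servers holding the region's HDFS blocks, which is the point of the proposal.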
[jira] [Resolved] (HBASE-3268) Run balancer when a new node is added
[ https://issues.apache.org/jira/browse/HBASE-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-3268. --- Resolution: Incomplete > Run balancer when a new node is added > - > > Key: HBASE-3268 > URL: https://issues.apache.org/jira/browse/HBASE-3268 > Project: HBase > Issue Type: Improvement > Components: Balancer, master >Reporter: Todd Lipcon >Priority: Major > > Right now we only balance the cluster once every 5 minutes by default. This > is likely to confuse new users. When you start a new region server, you > expect it to pick up some load very quickly, but right now you have to wait 5 > minutes for it to start doing anything in the worst case. > We could/should also add a button/shell command to "trigger balance now" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-21758) Update hadoop-three.version on branch-1 to 3.0.3
[ https://issues.apache.org/jira/browse/HBASE-21758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-21758. Resolution: Won't Fix Assignee: (was: Andrew Purtell) Fix Version/s: (was: 1.5.0) > Update hadoop-three.version on branch-1 to 3.0.3 > > > Key: HBASE-21758 > URL: https://issues.apache.org/jira/browse/HBASE-21758 > Project: HBase > Issue Type: Task >Reporter: Andrew Purtell >Priority: Trivial > Attachments: HBASE-21758-branch-1.patch > > > Sync the branch-1 POM with master and branch-2 with respect to the default > version of {{hadoop-three.version}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-17762) Add logging to HBaseAdmin for user initiated tasks
[ https://issues.apache.org/jira/browse/HBASE-17762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-17762. Resolution: Incomplete Assignee: (was: churro morales) Fix Version/s: (was: 1.5.0) (was: 3.0.0) > Add logging to HBaseAdmin for user initiated tasks > -- > > Key: HBASE-17762 > URL: https://issues.apache.org/jira/browse/HBASE-17762 > Project: HBase > Issue Type: Task >Reporter: churro morales >Priority: Major > Attachments: HBASE-17762.patch, HBASE-17762.v1.patch > > > Things like auditing a forced major compaction are really useful and right > now there is no logging when this is triggered. Other actions may require > logging as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-16594) ROW_INDEX_V2 DBE
[ https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-16594. Resolution: Incomplete Assignee: (was: binlijin) Fix Version/s: (was: 2.3.0) (was: 1.5.0) (was: 3.0.0) > ROW_INDEX_V2 DBE > > > Key: HBASE-16594 > URL: https://issues.apache.org/jira/browse/HBASE-16594 > Project: HBase > Issue Type: Improvement > Components: Performance > Reporter: binlijin > Priority: Major > Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch > > > See HBASE-16213, ROW_INDEX_V1 DataBlockEncoding. > ROW_INDEX_V1 is the first version, which has no storage optimization; > ROW_INDEX_V2 adds storage optimization: it stores every row key only once and > stores the column family only once per HFileBlock. > ROW_INDEX_V1 is: > /** > * Store cells following every row's start offset, so we can binary search to > a row's cells. > * > * Format: > * flat cells > * integer: number of rows > * integer: row0's offset > * integer: row1's offset > * > * integer: dataSize > * > */ > ROW_INDEX_V2 is: > * row1 qualifier timestamp type value tag > * qualifier timestamp type value tag > * qualifier timestamp type value tag > * row2 qualifier timestamp type value tag > * row3 qualifier timestamp type value tag > * qualifier timestamp type value tag > * > * integer: number of rows > * integer: row0's offset > * integer: row1's offset > * > * column family > * integer: dataSize -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-15677) FailedServerException shouldn't clear MetaCache
[ https://issues.apache.org/jira/browse/HBASE-15677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-15677. Resolution: Incomplete Assignee: (was: Mikhail Antonov) Fix Version/s: (was: 1.5.0) (was: 3.0.0) > FailedServerException shouldn't clear MetaCache > --- > > Key: HBASE-15677 > URL: https://issues.apache.org/jira/browse/HBASE-15677 > Project: HBase > Issue Type: Sub-task > Components: Client > Affects Versions: 1.3.0 > Reporter: Mikhail Antonov > Priority: Major > > Right now FailedServerException clears the meta cache. This seems > unnecessary (if we hit it, someone has already gotten a network/remote > error in the first place and invalidated the location cache for us), and it > could lead to unnecessary drops, as the FailedServers cache has a default TTL of 2 > seconds, so we can encounter a situation like this: > - thread T1 hits a network error, clears the cache, and puts the server in the failed > server list > - thread T2 tries to get its request in and gets FailedServerException > - thread T1 does a meta scan to populate the cache > - thread T2 clears the cache after it got the FailedServerException. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-21013) Backport "read part" of HBASE-18754 to all active 1.x branches
[ https://issues.apache.org/jira/browse/HBASE-21013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-21013. Resolution: Incomplete Assignee: (was: Mingdao Yang) > Backport "read part" of HBASE-18754 to all active 1.x branches > -- > > Key: HBASE-21013 > URL: https://issues.apache.org/jira/browse/HBASE-21013 > Project: HBase > Issue Type: Sub-task > Reporter: Chia-Ping Tsai > Priority: Critical > Attachments: HBASE-21013-branch-1.4.001.patch > > > The hfiles impacted by HBASE-18754 will have bytes of proto.TimeRangeTracker. > This makes all 1.x branches fail to read the hfile, since the 1.x branches > can't deserialize the proto.TimeRangeTracker. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20912) Add import order config in dev support for eclipse
[ https://issues.apache.org/jira/browse/HBASE-20912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20912. Resolution: Fixed Fix Version/s: 2.2.1 2.1.5 1.2.12 1.5.1 2.3.0 1.3.4 1.4.10 3.0.0 > Add import order config in dev support for eclipse > -- > > Key: HBASE-20912 > URL: https://issues.apache.org/jira/browse/HBASE-20912 > Project: HBase > Issue Type: Bug >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Fix For: 3.0.0, 1.4.10, 1.3.4, 2.3.0, 1.5.1, 1.2.12, 2.1.5, 2.2.1 > > Attachments: eclipse.importorder > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-22170) Add an import order template under the dev-support directory
[ https://issues.apache.org/jira/browse/HBASE-22170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-22170. Resolution: Duplicate > Add an import order template under the dev-support directory > > > Key: HBASE-22170 > URL: https://issues.apache.org/jira/browse/HBASE-22170 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang >Priority: Major > > So it does not confuse developers... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20911) correct Switch/case indentation in formatter template for eclipse
[ https://issues.apache.org/jira/browse/HBASE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20911. Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.2.1 1.2.12 1.5.1 2.3.0 1.3.4 1.4.10 3.0.0 > correct Switch/case indentation in formatter template for eclipse > - > > Key: HBASE-20911 > URL: https://issues.apache.org/jira/browse/HBASE-20911 > Project: HBase > Issue Type: Bug > Reporter: Ankit Singhal > Assignee: Ankit Singhal > Priority: Major > Fix For: 3.0.0, 1.4.10, 1.3.4, 2.3.0, 1.5.1, 1.2.12, 2.2.1 > > Attachments: HBASE-20911.patch, HBASE-20911_v1.patch > > > Making it consistent with our checkstyle requirements. > {code} > > > ** > > > > > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22132) Backport HBASE-22115 intent to branch-1
Andrew Purtell created HBASE-22132: -- Summary: Backport HBASE-22115 intent to branch-1 Key: HBASE-22132 URL: https://issues.apache.org/jira/browse/HBASE-22132 Project: HBase Issue Type: Sub-task Affects Versions: 1.5.0 Reporter: Andrew Purtell Fix For: 1.5.1 Check the exposure of branch-1 code to the problems described on HBASE-22115 and apply the fix approach there. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22126) TestBlocksRead is flaky (branch-1)
Andrew Purtell created HBASE-22126: -- Summary: TestBlocksRead is flaky (branch-1) Key: HBASE-22126 URL: https://issues.apache.org/jira/browse/HBASE-22126 Project: HBase Issue Type: Bug Components: test Affects Versions: 1.5.0 Reporter: Andrew Purtell Fix For: 1.5.1 TestBlocksRead does not fail when invoked by itself but is flaky when run as part of the suite. Some kind of race during setup. [ERROR] testBlocksStoredWhenCachingDisabled(org.apache.hadoop.hbase.regionserver.TestBlocksRead) Time elapsed: 0.19 s <<< ERROR! java.net.ConnectException: Call From $HOST/$IP to localhost:59658 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.hbase.regionserver.TestBlocksRead.initHRegion(TestBlocksRead.java:112) at org.apache.hadoop.hbase.regionserver.TestBlocksRead.testBlocksStoredWhenCachingDisabled(TestBlocksRead.java:389) Caused by: java.net.ConnectException: Connection refused at org.apache.hadoop.hbase.regionserver.TestBlocksRead.initHRegion(TestBlocksRead.java:112) at org.apache.hadoop.hbase.regionserver.TestBlocksRead.testBlocksStoredWhenCachingDisabled(TestBlocksRead.java:389) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-22125) Fix more instances in make_rc.sh where we need -Dhttps.protocols=TLSv1.2
[ https://issues.apache.org/jira/browse/HBASE-22125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-22125. Resolution: Fixed Fix Version/s: 1.2.12 1.3.4 1.4.10 Pushed simple build script fix > Fix more instances in make_rc.sh where we need -Dhttps.protocols=TLSv1.2 > > > Key: HBASE-22125 > URL: https://issues.apache.org/jira/browse/HBASE-22125 > Project: HBase > Issue Type: Bug >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Trivial > Fix For: 1.4.10, 1.3.4, 1.2.12, 1.5.0 > > Attachments: HBASE-22125-branch-1.patch > > > make_rc.sh on branch-1 is missing some places where we need to define the > system property https.protocols=TLSv1.2 in order for JDK 7 to succeed in > accessing Maven resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22125) Fix more instances in make_rc.sh where we need -Dhttps.protocols=TLSv1.2
Andrew Purtell created HBASE-22125: -- Summary: Fix more instances in make_rc.sh where we need -Dhttps.protocols=TLSv1.2 Key: HBASE-22125 URL: https://issues.apache.org/jira/browse/HBASE-22125 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0 make_rc.sh on branch-1 is missing some places where we need to define the system property https.protocols=TLSv1.2 in order for JDK 7 to succeed in accessing Maven resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-21135) Build fails on windows as it fails to parse windows path during license check
[ https://issues.apache.org/jira/browse/HBASE-21135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-21135. Resolution: Fixed > Build fails on windows as it fails to parse windows path during license check > - > > Key: HBASE-21135 > URL: https://issues.apache.org/jira/browse/HBASE-21135 > Project: HBase > Issue Type: Bug > Components: build >Affects Versions: 3.0.0, 1.4.0, 1.3.2, 1.1.12, 1.2.7, 2.1.1 >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Labels: cygwin > Fix For: 3.0.0, 1.6.0, 1.4.10, 1.3.4, 2.3.0, 2.0.6, 1.2.12, > 2.1.5, 2.2.1, 1.5.0 > > Attachments: HBASE-21135-addendum.patch, HBASE-21135.master.001.patch > > > License check via enforce plugin throws following error during build on > windows: > {code:java} > Sourced file: inline evaluation of: ``File license = new > File("D:\DS\HBase_2\hbase\hbase-shaded\target/maven-shared-ar . . . '' Token > Parsing Error: Lexical error at line 1, column 29. Encountered: "D" (68), > after : "\"D:\\": {code} > Complete stacktrace with command > {code:java} > mvn clean install -DskipTests -X > {code} > is as follows: > {noformat} > [INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (check-aggregate-license) @ > hbase-shaded --- > [DEBUG] Configuring mojo > org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce from plugin > realm > ClassRealm[plugin>org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1, > parent: sun.misc.Launcher$AppClassLoader@55f96302] > [DEBUG] Configuring mojo > 'org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce' with basic > configurator --> > [DEBUG] (s) fail = true > [DEBUG] (s) failFast = false > [DEBUG] (f) ignoreCache = false > [DEBUG] (f) mojoExecution = > org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce {execution: > check-aggregate-license} > [DEBUG] (s) project = MavenProject: > org.apache.hbase:hbase-shaded:2.1.1-SNAPSHOT @ > D:\DS\HBase_2\hbase\hbase-shaded\pom.xml > [DEBUG] (s) 
condition = File license = new > File("D:\DS\HBase_2\hbase\hbase-shaded\target/maven-shared-archive-resources/META-INF/LICENSE"); > // Beanshell does not support try-with-resources, > // so we must close this scanner manually > Scanner scanner = new Scanner(license); > while (scanner.hasNextLine()) { > if (scanner.nextLine().startsWith("ERROR:")) { > scanner.close(); > return false; > } > } > scanner.close(); > return true; > [DEBUG] (s) message = License errors detected, for more detail find ERROR in > > D:\DS\HBase_2\hbase\hbase-shaded\target/maven-shared-archive-resources/META-INF/LICENSE > [DEBUG] (s) rules = > [org.apache.maven.plugins.enforcer.EvaluateBeanshell@7e307087] > [DEBUG] (s) session = org.apache.maven.execution.MavenSession@5e1218b4 > [DEBUG] (s) skip = false > [DEBUG] -- end configuration -- > [DEBUG] Executing rule: org.apache.maven.plugins.enforcer.EvaluateBeanshell > [DEBUG] Echo condition : File license = new > File("D:\DS\HBase_2\hbase\hbase-shaded\target/maven-shared-archive-resources/META-INF/LICENSE"); > // Beanshell does not support try-with-resources, > // so we must close this scanner manually > Scanner scanner = new Scanner(license); > while (scanner.hasNextLine()) { > if (scanner.nextLine().startsWith("ERROR:")) { > scanner.close(); > return false; > } > } > scanner.close(); > return true; > [DEBUG] Echo script : File license = new > File("D:\DS\HBase_2\hbase\hbase-shaded\target/maven-shared-archive-resources/META-INF/LICENSE"); > // Beanshell does not support try-with-resources, > // so we must close this scanner manually > Scanner scanner = new Scanner(license); > while (scanner.hasNextLine()) { > if (scanner.nextLine().startsWith("ERROR:")) { > scanner.close(); > return false; > } > } > scanner.close(); > return true; > [DEBUG] Adding failure due to exception > org.apache.maven.enforcer.rule.api.EnforcerRuleException: Couldn't evaluate > condition: File license = new >
[jira] [Reopened] (HBASE-21135) Build fails on windows as it fails to parse windows path during license check
[ https://issues.apache.org/jira/browse/HBASE-21135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-21135: This change has introduced a new Maven warning on branch-1 [WARNING] Some problems were encountered while building the effective model for org.apache.hbase:hbase-resource-bundle:jar:1.5.0 [WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.codehaus.mojo:build-helper-maven-plugin @ org.apache.hbase:hbase:1.5.0, hbase/pom.xml, line 837, column 15 > Build fails on windows as it fails to parse windows path during license check > - > > Key: HBASE-21135 > URL: https://issues.apache.org/jira/browse/HBASE-21135 > Project: HBase > Issue Type: Bug > Components: build >Affects Versions: 3.0.0, 1.4.0, 1.3.2, 1.1.12, 1.2.7, 2.1.1 >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Labels: cygwin > Fix For: 3.0.0, 1.5.0, 1.6.0, 1.4.10, 1.3.4, 2.3.0, 2.0.6, > 1.2.12, 2.1.5, 2.2.1 > > Attachments: HBASE-21135.master.001.patch > > > License check via enforce plugin throws following error during build on > windows: > {code:java} > Sourced file: inline evaluation of: ``File license = new > File("D:\DS\HBase_2\hbase\hbase-shaded\target/maven-shared-ar . . . '' Token > Parsing Error: Lexical error at line 1, column 29. 
Encountered: "D" (68), > after : "\"D:\\": {code} > Complete stacktrace with command > {code:java} > mvn clean install -DskipTests -X > {code} > is as follows: > {noformat} > [INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (check-aggregate-license) @ > hbase-shaded --- > [DEBUG] Configuring mojo > org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce from plugin > realm > ClassRealm[plugin>org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1, > parent: sun.misc.Launcher$AppClassLoader@55f96302] > [DEBUG] Configuring mojo > 'org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce' with basic > configurator --> > [DEBUG] (s) fail = true > [DEBUG] (s) failFast = false > [DEBUG] (f) ignoreCache = false > [DEBUG] (f) mojoExecution = > org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce {execution: > check-aggregate-license} > [DEBUG] (s) project = MavenProject: > org.apache.hbase:hbase-shaded:2.1.1-SNAPSHOT @ > D:\DS\HBase_2\hbase\hbase-shaded\pom.xml > [DEBUG] (s) condition = File license = new > File("D:\DS\HBase_2\hbase\hbase-shaded\target/maven-shared-archive-resources/META-INF/LICENSE"); > // Beanshell does not support try-with-resources, > // so we must close this scanner manually > Scanner scanner = new Scanner(license); > while (scanner.hasNextLine()) { > if (scanner.nextLine().startsWith("ERROR:")) { > scanner.close(); > return false; > } > } > scanner.close(); > return true; > [DEBUG] (s) message = License errors detected, for more detail find ERROR in > > D:\DS\HBase_2\hbase\hbase-shaded\target/maven-shared-archive-resources/META-INF/LICENSE > [DEBUG] (s) rules = > [org.apache.maven.plugins.enforcer.EvaluateBeanshell@7e307087] > [DEBUG] (s) session = org.apache.maven.execution.MavenSession@5e1218b4 > [DEBUG] (s) skip = false > [DEBUG] -- end configuration -- > [DEBUG] Executing rule: org.apache.maven.plugins.enforcer.EvaluateBeanshell > [DEBUG] Echo condition : File license = new > 
File("D:\DS\HBase_2\hbase\hbase-shaded\target/maven-shared-archive-resources/META-INF/LICENSE"); > // Beanshell does not support try-with-resources, > // so we must close this scanner manually > Scanner scanner = new Scanner(license); > while (scanner.hasNextLine()) { > if (scanner.nextLine().startsWith("ERROR:")) { > scanner.close(); > return false; > } > } > scanner.close(); > return true; > [DEBUG] Echo script : File license = new > File("D:\DS\HBase_2\hbase\hbase-shaded\target/maven-shared-archive-resources/META-INF/LICENSE"); > // Beanshell does not support try-with-resources, > // so we must close this scanner manually > Scanner scanner = new Scanner(license); > while (scanner.hasNextLine()) { > if (scanner.nextLine().startsWith("ERROR:")) { > scanner.close();
[jira] [Created] (HBASE-22114) Port HBASE-15560 (TinyLFU-based BlockCache) to branch-1
Andrew Purtell created HBASE-22114: -- Summary: Port HBASE-15560 (TinyLFU-based BlockCache) to branch-1 Key: HBASE-22114 URL: https://issues.apache.org/jira/browse/HBASE-22114 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Fix For: 1.6.0 HBASE-15560 introduces the TinyLFU cache policy for the blockcache. W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the frequency in a counting sketch, ages periodically by halving the counters, and orders entries by SLRU. An entry is discarded by comparing the frequency of the new arrival (candidate) to the SLRU's victim, and keeping the one with the highest frequency. This allows the operations to be performed in O(1) time and, through the use of a compact sketch, a much larger history is retained beyond the current working set. In a variety of real world traces the policy had [near optimal hit rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. The implementation of HBASE-15560 uses several Java 8 idioms, depends on the JRE 8+ type Optional, and has dependencies on libraries compiled with Java 8+ bytecode. It could be backported to branch-1 but must be made optional both at compile time and runtime, enabled by the 'build-with-jdk8' build profile. To this end the blockcache should be slightly modified to load the L1 (and perhaps L2) implementation/policy dynamically at startup by reflection, with the implementation classname specified in site configuration. This modification should be forward ported to maintain configuration sanity among the branches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
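The dynamic-loading idea, picking the L1 cache implementation class by reflection from a configured classname, might look roughly like this minimal self-contained sketch; the interface, property name, and class names here are all hypothetical, not the actual HBase BlockCache/CacheConfig API:

```java
public class ReflectiveCacheLoader {
    // Hypothetical stand-in for the cache policy abstraction.
    public interface BlockCachePolicy { String name(); }

    // Default implementation, always present on the classpath.
    public static class LruPolicy implements BlockCachePolicy {
        public String name() { return "LRU"; }
    }

    // Instantiate the configured policy class by reflection, so a JDK 8-only
    // implementation (e.g. TinyLFU) need not be linked at compile time.
    public static BlockCachePolicy load(String className) {
        try {
            return (BlockCachePolicy) Class.forName(className)
                .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("cannot load cache policy " + className, e);
        }
    }

    public static void main(String[] args) {
        // In HBase the classname would come from site configuration; here we
        // read a (hypothetical) system property and fall back to the default.
        String configured = System.getProperty(
            "cache.policy.class", LruPolicy.class.getName());
        BlockCachePolicy policy = load(configured);
        System.out.println(policy.name());
    }
}
```

On a JDK 7 runtime where the TinyLFU class is absent, Class.forName simply fails and the caller can fall back to the default policy, which is what makes the feature optional at runtime.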
[jira] [Resolved] (HBASE-21728) Backport to branch-2 (and maybe branch-1) HBASE-21684 Throw DNRIOE when connection or rpc client is closed
[ https://issues.apache.org/jira/browse/HBASE-21728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-21728. Resolution: Fixed > Backport to branch-2 (and maybe branch-1) HBASE-21684 Throw DNRIOE when > connection or rpc client is closed > -- > > Key: HBASE-21728 > URL: https://issues.apache.org/jira/browse/HBASE-21728 > Project: HBase > Issue Type: Sub-task > Components: Client >Reporter: stack >Assignee: Xu Cang >Priority: Major > Attachments: HBASE-21728-branch-1.001.patch > > > Backport the parent. May need a few changes. See suggestions in parent. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-22045) Mutable range histogram reports incorrect outliers
[ https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-22045: I reverted this from branch-1.3 because the cherry pick of this commit broke the build. Please reapply and re-resolve. > Mutable range histogram reports incorrect outliers > -- > > Key: HBASE-22045 > URL: https://issues.apache.org/jira/browse/HBASE-22045 > Project: HBase > Issue Type: Bug > Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1 > Reporter: Abhishek Singh Chouhan > Assignee: Abhishek Singh Chouhan > Priority: Major > Fix For: 3.0.0, 1.5.0, 1.4.10, 1.3.4, 2.3.0, 2.1.5, 2.2.1 > > Attachments: HBASE-22045.master.001.patch > > > MutableRangeHistogram, during the snapshot, calculates the outliers (eg. > mutate_TimeRange_60-inf) and adds the counter with an incorrect calculation, > using the overall count of events and not the number of events in the > snapshot. > {code:java} > long val = histogram.getCount(); > if (val - cumNum > 0) { > metricsRecordBuilder.addCounter( > Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - > 1] + "-inf", desc), > val - cumNum); > }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
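The direction of the fix is to base the "-inf" outlier bucket on events counted in the current snapshot window rather than the histogram's lifetime total. A hypothetical standalone sketch of that delta (the names are illustrative, not the real MutableRangeHistogram code):

```java
public class RangeOutlierSketch {
    // snapshotEvents: events observed since the previous snapshot.
    // cumNum: events that already fell into the configured range buckets.
    // The buggy version passed the histogram's lifetime count as the first
    // argument, inflating the outlier bucket on every snapshot.
    public static long outliers(long snapshotEvents, long cumNum) {
        long diff = snapshotEvents - cumNum;
        return diff > 0 ? diff : 0;
    }

    public static void main(String[] args) {
        // Lifetime total might be 1000, but only 10 events happened in this
        // window, 8 of them within the configured ranges: report 2, not 992.
        System.out.println(outliers(10, 8));
    }
}
```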
[jira] [Resolved] (HBASE-21805) Extend per table metrics with some RPC layer measures
[ https://issues.apache.org/jira/browse/HBASE-21805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-21805. Resolution: Won't Fix > Extend per table metrics with some RPC layer measures > - > > Key: HBASE-21805 > URL: https://issues.apache.org/jira/browse/HBASE-21805 > Project: HBase > Issue Type: Improvement > Components: metrics, rpc >Reporter: Andrew Purtell >Priority: Minor > > RPC metrics are whole server in scope. We should extend the per-table metrics > to also track a subset of RPC metrics on a per table basis. This would give > better insight into the subjective experience of each use case. > Consider TotalCallTime, ProcessCallTime, QueueCallTime. Table metrics already > track request counts and server side processing latencies, just not queueing > effects at the RPC layer. > Ideally we avoid introducing another configuration option but this could be > made optional, if overheads are significant (measure them) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21926) Profiler servlet
Andrew Purtell created HBASE-21926: -- Summary: Profiler servlet Key: HBASE-21926 URL: https://issues.apache.org/jira/browse/HBASE-21926 Project: HBase Issue Type: New Feature Reporter: Andrew Purtell Fix For: 3.0.0, 1.6.0, 2.2.0 HIVE-20202 describes how Hive added a web endpoint for online, in-production profiling based on async-profiler. The endpoint was added as a servlet to httpserver and supports retrieval of flamegraphs compiled from the profiler trace. Async-profiler ([https://github.com/jvm-profiling-tools/async-profiler]) can also profile heap allocations, lock contention, and HW performance counters in addition to CPU. The profiling overhead is pretty low and it is safe to run in production. The async-profiler project measured and describes CPU and memory overheads on these issues: [https://github.com/jvm-profiling-tools/async-profiler/issues/14] and [https://github.com/jvm-profiling-tools/async-profiler/issues/131] We have an httpserver based servlet stack so we can use HIVE-20202 as an implementation template for a similar feature for HBase daemons. Ideally we achieve these requirements: * Retrieve flamegraph SVG generated from latest profile trace. * Online enable and disable of profiling activity. (async-profiler does not do instrumentation based profiling so this should not cause the codegen related perf problems of that other approach and can be safely toggled on and off while under production load.) * CPU profiling. * ALLOCATION profiling. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
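The general shape of such an endpoint could look roughly like the following stub on the JDK's built-in HttpServer. This is a hypothetical sketch only, not the HIVE-20202 code: the class name, path, and query parameters are invented, and a real implementation would shell out to async-profiler where the comments indicate.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical /prof endpoint stub: toggles profiling state on and off
// and serves the most recently rendered flamegraph SVG.
public class ProfServletSketch {
    static final AtomicBoolean running = new AtomicBoolean(false);
    static volatile byte[] latestSvg = "<svg/>".getBytes(StandardCharsets.UTF_8);

    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/prof", ProfServletSketch::handle);
        server.start();
        return server;
    }

    static void handle(HttpExchange ex) throws IOException {
        String q = ex.getRequestURI().getQuery();
        byte[] body;
        if ("action=start".equals(q)) {
            running.set(true);   // here: launch async-profiler against this JVM
            body = "started\n".getBytes(StandardCharsets.UTF_8);
        } else if ("action=stop".equals(q)) {
            running.set(false);  // here: collect the trace, render the SVG
            body = "stopped\n".getBytes(StandardCharsets.UTF_8);
        } else {
            body = latestSvg;    // plain GET /prof: latest flamegraph SVG
        }
        ex.sendResponseHeaders(200, body.length);
        try (OutputStream os = ex.getResponseBody()) { os.write(body); }
    }
}
```

Because async-profiler samples rather than instruments, the start/stop toggle maps cleanly onto this request/response shape with no codegen side effects in the profiled process.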
[jira] [Created] (HBASE-21905) TestFIFOCompactionPolicy is flaky (branch-1)
Andrew Purtell created HBASE-21905: -- Summary: TestFIFOCompactionPolicy is flaky (branch-1) Key: HBASE-21905 URL: https://issues.apache.org/jira/browse/HBASE-21905 Project: HBase Issue Type: Test Affects Versions: 1.5.0 Reporter: Andrew Purtell Fix For: 1.5.1 Fails with java.lang.IllegalArgumentException: "... overlaps with ...". For example: [ERROR] testFIFOCompactionPolicyExpiredEmptyHFiles(org.apache.hadoop.hbase.regionserver.compactions.TestFIFOCompactionPolicy) Time elapsed: 3.321 s <<< ERROR! java.io.IOException: java.io.IOException: [hdfs://localhost:41525/user/apurtell/test-data/734de07d-1f22-46a9-a1f5-96ad4578450b/data/default/testFIFOCompactionPolicyExpiredEmptyHFiles/c4f673438e09d7ef5a9b79b363639cde/f/c0c5836c1f714f78847cf00326586b69, hdfs://localhost:41525/user/apurtell/test-data/734de07d-1f22-46a9-a1f5-96ad4578450b/data/default/testFIFOCompactionPolicyExpiredEmptyHFiles/c4f673438e09d7ef5a9b79b363639cde/f/c65648691f614b2d8dd4b586c5923bfe] overlaps with [hdfs://localhost:41525/user/apurtell/test-data/734de07d-1f22-46a9-a1f5-96ad4578450b/data/default/testFIFOCompactionPolicyExpiredEmptyHFiles/c4f673438e09d7ef5a9b79b363639cde/f/c0c5836c1f714f78847cf00326586b69] at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2438) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277) Caused by: java.lang.IllegalArgumentException: [hdfs://localhost:41525/user/apurtell/test-data/734de07d-1f22-46a9-a1f5-96ad4578450b/data/default/testFIFOCompactionPolicyExpiredEmptyHFiles/c4f673438e09d7ef5a9b79b363639cde/f/c0c5836c1f714f78847cf00326586b69, hdfs://localhost:41525/user/apurtell/test-data/734de07d-1f22-46a9-a1f5-96ad4578450b/data/default/testFIFOCompactionPolicyExpiredEmptyHFiles/c4f673438e09d7ef5a9b79b363639cde/f/c65648691f614b2d8dd4b586c5923bfe] overlaps with 
[hdfs://localhost:41525/user/apurtell/test-data/734de07d-1f22-46a9-a1f5-96ad4578450b/data/default/testFIFOCompactionPolicyExpiredEmptyHFiles/c4f673438e09d7ef5a9b79b363639cde/f/c0c5836c1f714f78847cf00326586b69] at com.google.common.base.Preconditions.checkArgument(Preconditions.java:119) at org.apache.hadoop.hbase.regionserver.HStore.addToCompactingFiles(HStore.java:1824) at org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1798) at org.apache.hadoop.hbase.regionserver.CompactSplitThread.selectCompaction(CompactSplitThread.java:415) at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompactionInternal(CompactSplitThread.java:388) at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompactionInternal(CompactSplitThread.java:317) at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:306) at org.apache.hadoop.hbase.regionserver.RSRpcServices.compactRegion(RSRpcServices.java:1513) at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:23649) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2380) ... 3 more -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21904) TestSimpleRpcScheduler is still flaky (branch-1)
Andrew Purtell created HBASE-21904: -- Summary: TestSimpleRpcScheduler is still flaky (branch-1) Key: HBASE-21904 URL: https://issues.apache.org/jira/browse/HBASE-21904 Project: HBase Issue Type: Test Affects Versions: 1.5.0 Reporter: Andrew Purtell Fix For: 1.5.1 Attachments: org.apache.hadoop.hbase.ipc.TestSimpleRpcScheduler-output.txt Flaky wait condition, unclear if it's the wait condition or the underlying functionality that is the problem. [ERROR] testSoftAndHardQueueLimits(org.apache.hadoop.hbase.ipc.TestSimpleRpcScheduler) Time elapsed: 0.228 s <<< FAILURE! java.lang.AssertionError at org.apache.hadoop.hbase.ipc.TestSimpleRpcScheduler.testSoftAndHardQueueLimits(TestSimpleRpcScheduler.java:380) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-21855) Backport HBASE-21838 (Create a special ReplicationEndpoint just for verifying the WAL entries are fine) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-21855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-21855. Resolution: Fixed Fix Version/s: (was: 1.5.0) Just picked back the original patch with minor fixups. > Backport HBASE-21838 (Create a special ReplicationEndpoint just for verifying > the WAL entries are fine) to branch-1 > --- > > Key: HBASE-21855 > URL: https://issues.apache.org/jira/browse/HBASE-21855 > Project: HBase > Issue Type: Test > Components: Replication, test >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > > HBASE-21838 is a good idea and I want to enable it during ITBLL testing of > branch-1, so make an equivalent for that branch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21855) Backport HBASE-21838 (Create a special ReplicationEndpoint just for verifying the WAL entries are fine) to branch-1
Andrew Purtell created HBASE-21855: -- Summary: Backport HBASE-21838 (Create a special ReplicationEndpoint just for verifying the WAL entries are fine) to branch-1 Key: HBASE-21855 URL: https://issues.apache.org/jira/browse/HBASE-21855 Project: HBase Issue Type: Test Components: Replication, test Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0 HBASE-21838 is a good idea and I want to enable it during ITBLL testing of branch-1, so make an equivalent for that branch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-21847) Fix test TestRegionServerMetrics#testRequestCount
[ https://issues.apache.org/jira/browse/HBASE-21847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-21847. Resolution: Not A Problem Fix Version/s: (was: 1.5.0) I'm resolving this as not a problem because I'm just going to revert the breaking commit and have them retry application of the change over there. > Fix test TestRegionServerMetrics#testRequestCount > -- > > Key: HBASE-21847 > URL: https://issues.apache.org/jira/browse/HBASE-21847 > Project: HBase > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Xu Cang >Assignee: Xu Cang >Priority: Minor > > This test is also in the flaky test list: > [ERROR] TestRegionServerMetrics.testRequestCount:137 Metrics Counters > should be equal expected:<59> but was:<89> > The failure is consistent. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-21775) The BufferedMutator doesn't ever refresh region location cache
[ https://issues.apache.org/jira/browse/HBASE-21775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-21775: This commit broke some tests on branch-1. See HBASE-21847 for one instance. Looks like there are others but I haven't bisected those failures yet. > The BufferedMutator doesn't ever refresh region location cache > -- > > Key: HBASE-21775 > URL: https://issues.apache.org/jira/browse/HBASE-21775 > Project: HBase > Issue Type: Bug > Components: Client >Reporter: Tommy Li >Assignee: Tommy Li >Priority: Major > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.1.3, 2.0.5, 1.3.4 > > Attachments: HBASE-21775-ADDENDUM.master.001.patch, > HBASE-21775.master.001.patch, > org.apache.hadoop.hbase.client.TestAsyncProcess-with-HBASE-21775.txt, > org.apache.hadoop.hbase.client.TestAsyncProcess-without-HBASE-21775.txt > > > I noticed in some of my writing jobs that the BufferedMutator > would get stuck retrying writes against a dead server. > {code:java} > 19/01/18 15:15:47 INFO [Executor task launch worker for task 0] > client.AsyncRequestFutureImpl: #2, waiting for 1 actions to finish on table: > dummy_table > 19/01/18 15:15:54 WARN [htable-pool3-t56] client.AsyncRequestFutureImpl: > id=2, table=dummy_table, attempt=15/21, failureCount=1ops, last > exception=org.apache.hadoop.hbase.DoNotRetryIOException: Operation rpcTimeout > on ,17020,1547848193782, tracking started Fri Jan 18 14:55:37 PST > 2019; NOT retrying, failed=1 -- final attempt! 
> 19/01/18 15:15:54 ERROR [Executor task launch worker for task 0] > IngestRawData.map(): [B@258bc2c7: > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 > action: Operation rpcTimeout: 1 time, servers with issues: > ,17020,1547848193782 > {code} > > After the single remaining action permanently failed, it would resume > progress only to get stuck again retrying against the same dead server: > {code:java} > 19/01/18 15:21:18 INFO [Executor task launch worker for task 0] > client.AsyncRequestFutureImpl: #2, waiting for 1 actions to finish on table: > dummy_table > 19/01/18 15:21:18 INFO [Executor task launch worker for task 0] > client.AsyncRequestFutureImpl: #2, waiting for 1 actions to finish on table: > dummy_table > 19/01/18 15:21:20 INFO [htable-pool3-t55] client.AsyncRequestFutureImpl: > id=2, table=dummy_table, attempt=6/21, failureCount=1ops, last > exception=java.net.ConnectException: Call to failed on connection > exception: > org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException: > connection timed out: on ,17020,1547848193782, tracking > started null, retrying after=20089ms, operationsToReplay=1 > {code} > > Only restarting the client process to generate a new BufferedMutator instance > would fix the issue, at least until the next regionserver crash > The logs I've pasted show the issue happening with a > ConnectionTimeoutException, but we've also seen it with > NotServingRegionException and some others -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21831) Optional store-and-forward of simple mutations for regions in transition
Andrew Purtell created HBASE-21831: -- Summary: Optional store-and-forward of simple mutations for regions in transition Key: HBASE-21831 URL: https://issues.apache.org/jira/browse/HBASE-21831 Project: HBase Issue Type: New Feature Components: regionserver, rpc Reporter: Andrew Purtell Fix For: 3.0.0, 1.6.0, 2.3.0 We have an internal service built on Redis that is considering writing through to HBase directly for their persistence needs. Their current experience with Redis is: * Average write latency is ~milliseconds * p999 write latencies are "a few seconds" They want a similar experience when writing simple values directly to HBase. Infrequent exceptions to this would be acceptable. * Availability of 99.9% for writes * Expect most writes to be serviced within a few milliseconds, e.g. a few millis at p95. Still evaluating what the requirement should be (~millis at p90 vs p95 vs p99). * Timeout of 2 seconds, should be rare There is a fallback plan considered if HBase cannot respond within 2 seconds. However, this fallback cannot guarantee durability. Redis or the service's daemons may go down. They want HBase to provide the required durability. Because this is a caching service, where all writes are expected to be served again from cache, at least for a while, if HBase were to accept writes such that they are not immediately visible, it could be fine that they are not visible for 10-20 minutes in the worst case. This is relatively easy to achieve as an engineering target should we consider offering a write option that does not guarantee immediate visibility. (A proposal follows below.) We are considering store-and-forward of simple mutations and perhaps also simple deletes, although the latter is not a hard requirement. Out-of-order processing of this subset of mutation requests is acceptable because their data model ensures all values are immutable. 
Presumably on the HBase side the timestamps of the requests would be set to the current server wall clock time when received, so eventually when applied all are available with correct temporal ordering (within the effective resolution of the server clocks). Deletes which are not immediately applied (or failed) could cause application level confusion, and although this would remain a concern for the general case, for this specific use case, stale reads could be explained to and tolerated by their users. The BigTable architecture assigns at most one server to serve a region at a time. Region Replicas are an enhancement to the base BigTable architecture we made in HBase which stands up two more read-only replicas for a given region, meaning a client attempting a read has the option to fail very quickly over from the primary to a replica for a (potentially stale) read, or distribute read load over all replicas, or employ a hedged reading strategy. Enabling region replicas and timeline consistency can lower the availability gap for reads in the high percentiles from ~minutes to ~milliseconds. However, this option will not help for write use cases wanting roughly the same thing, because there can be no fail-over for writes. Writes must still go to the active primary. When that region is in transition, writes must be held on the client until it is redeployed. Or, if region replicas are not enabled, when the sole region is in transition, again, writes must be held on the client until the region is available again. Regions enter the in-transition state for two reasons: failures, and housekeeping (splits and merges, or balancing). Time to region redeployment after failures depends on a number of factors, like how long it took for us to become aware of the failure, and how long it takes to split the write-ahead log of the failed server and distribute the recovered edits to the reopening region(s). 
We could in theory improve this behavior by being more predictive about declaring failure, like employing a phi accrual failure detector to signal to the master from clients that a regionserver is sick. Other time-to-recovery issues and mitigations are discussed in a number of JIRAs and blog posts and are not discussed further here. Regarding housekeeping activities, splits and merges typically complete in under a second. However, split times up to ~30 seconds have been observed at my place of employ in rare conditions. In the instances I have investigated the cause is I/O stalls on the datanodes and metadata request stalls in the namenode, so not unexpected outlier cases. Mitigating these risks involves looking at split and merge policies. Split and merge policies are pluggable, and policy choices can be applied per table. In extreme cases, auto-splitting (and auto-merging) can be disabled on performance-sensitive tables and accomplished through manual means during scheduled
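The temporal-ordering argument in the proposal above (stamp each held mutation with the server wall-clock time at receipt, so applying them out of order is safe for immutable values) can be sketched as follows. The class and method names are invented for illustration and are not part of any HBase API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the store-and-forward ordering property: each
// mutation carries the timestamp the server assigned when the RPC arrived,
// and reads resolve the newest cell by that timestamp, so the order in
// which held mutations are eventually applied does not matter.
public class StoreAndForward {
    static class Mutation {
        final String row, value;
        final long receiveTs; // assigned by the server at RPC receipt

        Mutation(String row, String value, long receiveTs) {
            this.row = row;
            this.value = value;
            this.receiveTs = receiveTs;
        }
    }

    // A read picks the latest version by receive timestamp, regardless of
    // the apply order of the held mutations.
    static String latestValue(List<Mutation> applied) {
        return applied.stream()
            .max(Comparator.comparingLong(m -> m.receiveTs))
            .map(m -> m.value)
            .orElse(null);
    }

    public static void main(String[] args) {
        List<Mutation> applied = new ArrayList<>();
        // Applied in reverse of receipt order, as store-and-forward might do.
        applied.add(new Mutation("r1", "v-new", 2000L));
        applied.add(new Mutation("r1", "v-old", 1000L));
        System.out.println(latestValue(applied)); // prints "v-new"
    }
}
```

This holds only within the effective resolution of the server clocks, as the proposal notes, and only because the data model makes values immutable; deletes and read-modify-write operations would not be safe to reorder this way.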
[jira] [Created] (HBASE-21826) Rebase 1.5.0 CHANGES on branch-1.4 at release 1.4.9
Andrew Purtell created HBASE-21826: -- Summary: Rebase 1.5.0 CHANGES on branch-1.4 at release 1.4.9 Key: HBASE-21826 URL: https://issues.apache.org/jira/browse/HBASE-21826 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0 Release 1.5.0 CHANGES.txt is based on last 1.4 release (1.4.9). Rebase fix versions accordingly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-21826) Rebase 1.5.0 CHANGES on branch-1.4 at release 1.4.9
[ https://issues.apache.org/jira/browse/HBASE-21826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-21826. Resolution: Fixed > Rebase 1.5.0 CHANGES on branch-1.4 at release 1.4.9 > --- > > Key: HBASE-21826 > URL: https://issues.apache.org/jira/browse/HBASE-21826 > Project: HBase > Issue Type: Task >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Fix For: 1.5.0 > > > Release 1.5.0 CHANGES.txt is based on last 1.4 release (1.4.9). Rebase fix > versions accordingly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)