[jira] [Created] (HBASE-7460) Cleanup client connection layers
Gary Helmling created HBASE-7460: Summary: Cleanup client connection layers Key: HBASE-7460 URL: https://issues.apache.org/jira/browse/HBASE-7460 Project: HBase Issue Type: Improvement Components: Client, IPC/RPC Reporter: Gary Helmling This issue originated from a discussion over in HBASE-7442. We currently have a broken abstraction with {{HBaseClient}}, where it is bound to a single {{Configuration}} instance at time of construction, but then reused for all connections to all clusters. This is combined with multiple, overlapping layers of connection caching. Going through this code, it seems like we have a lot of mismatch between the higher layers and the lower layers, with too much abstraction in between. At the lower layers, most of the {{ClientCache}} stuff seems completely unused. We currently effectively have an {{HBaseClient}} singleton (for {{SecureClient}} as well in 0.92/0.94) in the client code, as I don't see anything that calls the constructor or {{RpcEngine.getProxy()}} versions with a non-default socket factory. So a lot of the code around this seems like built-up waste. The fact that a single Configuration is fixed in the {{HBaseClient}} seems like a broken abstraction as it currently stands. In addition to cluster ID, other configuration parameters (max retries, retry sleep) are fixed at time of construction. The more I look at the code, the more it looks like the {{ClientCache}} and sharing the {{HBaseClient}} instance is an unnecessary complication. Why cache the {{HBaseClient}} instances at all? In {{HConnectionManager}}, we already have a mapping from {{Configuration}} to {{HConnection}}. It seems to me like each {{HConnection(Implementation)}} instance should have its own {{HBaseClient}} instance, doing away with the {{ClientCache}} mapping. This would keep each {{HBaseClient}} associated with a single cluster/configuration and fix the current breakage from reusing the same {{HBaseClient}} against different clusters.
We need a refactoring of some of the interactions of {{HConnection(Implementation)}}, {{HBaseRPC/RpcEngine}}, and {{HBaseClient}}. Off hand, we might want to expose a separate {{RpcEngine.getClient()}} method that returns a new {{RpcClient}} interface (implemented by {{HBaseClient}}) and move the {{RpcEngine.getProxy()}}/{{stopProxy()}} implementations into the client. So all proxy invocations can go through the same client, without requiring the static client cache. I haven't fully thought this through, so I could be missing other important aspects. But that approach at least seems like a step in the right direction for fixing the client abstractions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
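As a rough illustration of the direction above, here is a toy sketch in Java. All names below ({{RpcClient}}, {{SimpleRpcClient}}, {{ConnectionSketch}}) are hypothetical stand-ins, not the actual HBase classes; the sketch only shows the proposed ownership change: each connection constructs and owns its own client, bound to a single cluster configuration, with no static client cache.

```java
// Hypothetical sketch: per-connection client ownership instead of a shared,
// statically cached HBaseClient. None of these are real HBase classes.
public class ConnectionSketch {
    // Stand-in for the HBase Configuration that fixes cluster ID, retries, etc.
    static class Configuration {
        final String clusterId;
        Configuration(String clusterId) { this.clusterId = clusterId; }
    }

    // Proposed RpcClient interface, as might be returned by RpcEngine.getClient()
    interface RpcClient {
        String getClusterId();
        void stop();
    }

    static class SimpleRpcClient implements RpcClient {
        private final Configuration conf;
        SimpleRpcClient(Configuration conf) { this.conf = conf; }
        public String getClusterId() { return conf.clusterId; }
        public void stop() { /* close sockets, etc. */ }
    }

    // Each HConnection(Implementation) holds its own client: one client per
    // cluster/configuration, no cross-cluster sharing and no ClientCache.
    static class HConnectionSketch {
        final RpcClient client;
        HConnectionSketch(Configuration conf) {
            this.client = new SimpleRpcClient(conf);
        }
    }

    public static void main(String[] args) {
        HConnectionSketch a = new HConnectionSketch(new Configuration("cluster-a"));
        HConnectionSketch b = new HConnectionSketch(new Configuration("cluster-b"));
        // Distinct clusters get distinct clients with distinct configurations.
        assert a.client != b.client;
        assert !a.client.getClusterId().equals(b.client.getClusterId());
    }
}
```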
[jira] [Created] (HBASE-9774) Provide a way for coprocessors to register and report custom metrics
Gary Helmling created HBASE-9774: Summary: Provide a way for coprocessors to register and report custom metrics Key: HBASE-9774 URL: https://issues.apache.org/jira/browse/HBASE-9774 Project: HBase Issue Type: New Feature Components: Coprocessors, metrics Reporter: Gary Helmling It would help provide better visibility into what coprocessors are doing if we provided a way for coprocessors to export their own metrics. The general idea is to:
* extend access to the HBase metrics bus down into the coprocessor environments
* coprocessors can then register and increment custom metrics
* coprocessor metrics are then reported along with all others through normal mechanisms
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-9897) Clean up some security configuration checks in LoadIncrementalHFiles
Gary Helmling created HBASE-9897: Summary: Clean up some security configuration checks in LoadIncrementalHFiles Key: HBASE-9897 URL: https://issues.apache.org/jira/browse/HBASE-9897 Project: HBase Issue Type: Task Components: security Reporter: Gary Helmling In LoadIncrementalHFiles, use of SecureBulkLoadClient is conditioned on UserProvider.isHBaseSecurityEnabled() in a couple of places. However, use of secure bulk loading seems to be required by use of HDFS secure authentication rather than HBase secure authentication. It should be possible to use secure bulk loading, as long as SecureBulkLoadEndpoint is loaded and HDFS secure authentication is enabled, regardless of the HBase authentication configuration. In addition, SecureBulkLoadEndpoint does a direct check on permissions by referencing the AccessController loaded on the same region, i.e.:
{code}
getAccessController().prePrepareBulkLoad(env);
{code}
It seems like this will throw an NPE if AccessController is not configured. We need an additional null check to handle this case gracefully. -- This message was sent by Atlassian JIRA (v6.1#6144)
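A minimal sketch of the null check being suggested, using simplified stand-in types (the real coprocessor environment and AccessController lookup are more involved than this):

```java
// Hypothetical sketch of the guarded permission check; these are stand-in
// types, not the actual HBase coprocessor classes.
public class SecureBulkLoadSketch {
    interface AccessController {
        void prePrepareBulkLoad(Object env);
    }

    private final AccessController accessController; // may be null if not loaded
    SecureBulkLoadSketch(AccessController ac) { this.accessController = ac; }

    // Instead of calling getAccessController().prePrepareBulkLoad(env)
    // unconditionally, guard against AccessController not being configured.
    boolean prepareBulkLoad(Object env) {
        AccessController ac = accessController;
        if (ac != null) {
            ac.prePrepareBulkLoad(env);
            return true;   // permission check ran
        }
        return false;      // no AccessController on this region; proceed gracefully
    }

    public static void main(String[] args) {
        // Without an AccessController, no NPE and no permission check.
        assert !new SecureBulkLoadSketch(null).prepareBulkLoad(new Object());
        // With one configured, the check runs.
        assert new SecureBulkLoadSketch(env -> {}).prepareBulkLoad(new Object());
    }
}
```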
[jira] [Resolved] (HBASE-9912) Need to delete a row based on partial rowkey in hbase ... Pls provide query for that
[ https://issues.apache.org/jira/browse/HBASE-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling resolved HBASE-9912. -- Resolution: Invalid This is a question, not a bug. Please email u...@hbase.apache.org with questions. JIRA is for actual bug reports, improvements, etc. See http://hbase.apache.org/mail-lists.html Need to delete a row based on partial rowkey in hbase ... Pls provide query for that - Key: HBASE-9912 URL: https://issues.apache.org/jira/browse/HBASE-9912 Project: HBase Issue Type: Bug Reporter: ranjini Priority: Critical -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-10162) Add RegionObserver lifecycle hook to be called when region is available
Gary Helmling created HBASE-10162: - Summary: Add RegionObserver lifecycle hook to be called when region is available Key: HBASE-10162 URL: https://issues.apache.org/jira/browse/HBASE-10162 Project: HBase Issue Type: Improvement Components: Coprocessors Reporter: Gary Helmling Over in HBASE-10161 and HBASE-10148, there is discussion of the need to modify existing coprocessors, which previously performed initialization only in postOpen(), in order to account for the new log replay mechanism happening post open. This points out that we have a hole in coprocessor lifecycle management which caused the use of region lifecycle hooks (postOpen()) in the first place. Instead of requiring coprocessor authors to hook into region lifecycle methods for initialization, we should provide an explicit lifecycle hook for coprocessor authors to use when region open, log replay (and any future requirements) are complete, say initializeWhenAvailable() (open to better names). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
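A toy sketch of the ordering such a hook would guarantee. The hook name postRegionAvailable() below is a placeholder (the issue suggests initializeWhenAvailable(), with the name open for discussion), and the driver is a simplified stand-in for the real region open path:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed availability hook; not the real
// RegionObserver interface or region open code.
public class LifecycleSketch {
    interface RegionObserverSketch {
        default void postOpen(List<String> log) { log.add("postOpen"); }
        // Proposed hook: fires only once open AND log replay are complete.
        default void postRegionAvailable(List<String> log) { log.add("available"); }
    }

    static List<String> openRegion(RegionObserverSketch obs) {
        List<String> log = new ArrayList<>();
        obs.postOpen(log);            // region opened, but replay may still run
        log.add("logReplay");         // distributed log replay happens post-open
        obs.postRegionAvailable(log); // safe point for coprocessor initialization
        return log;
    }

    public static void main(String[] args) {
        List<String> log = openRegion(new RegionObserverSketch() {});
        // The availability hook always fires last, after replay.
        assert log.get(log.size() - 1).equals("available");
    }
}
```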
[jira] [Created] (HBASE-10721) Parallelize execution of multi operations on RegionServer
Gary Helmling created HBASE-10721: - Summary: Parallelize execution of multi operations on RegionServer Key: HBASE-10721 URL: https://issues.apache.org/jira/browse/HBASE-10721 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Gary Helmling In the context of HBASE-10169, we're adding the ability to batch Coprocessor endpoint calls per regionserver, using the same batching that happens in the RegionServer.multi() calls. However, execution of each of the calls will still happen serially on each RegionServer. For Coprocessor endpoint calls, it might help to parallelize these, since each execution could be of indeterminate length. This raises the question of whether other operations handled in multi() calls should also be parallelized, or whether we should just rely on macro-scale parallelization through the RPC handler threads. -- This message was sent by Atlassian JIRA (v6.2#6252)
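For illustration, a generic sketch of parallelizing per-region calls with a thread pool; this is not the actual RegionServer code, just the shape of the change being discussed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: submit one "endpoint call" per region to a pool
// instead of executing them serially, then gather results in request order.
public class ParallelMultiSketch {
    static List<String> execute(List<Callable<String>> perRegionCalls) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Callable<String> call : perRegionCalls) {
                futures.add(pool.submit(call)); // calls now run concurrently
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    results.add(f.get());       // results stay in request order
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> calls = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            final int region = i;
            calls.add(() -> "region-" + region);
        }
        List<String> results = execute(calls);
        assert results.size() == 3;
        assert results.get(1).equals("region-1");
    }
}
```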
[jira] [Resolved] (HBASE-3726) Allow coprocessor callback RPC calls to be batched at region server level
[ https://issues.apache.org/jira/browse/HBASE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling resolved HBASE-3726. -- Resolution: Duplicate Closing as a duplicate of HBASE-10169 Allow coprocessor callback RPC calls to be batched at region server level - Key: HBASE-3726 URL: https://issues.apache.org/jira/browse/HBASE-3726 Project: HBase Issue Type: Improvement Components: Coprocessors Reporter: Ted Yu Currently the Callback.update() method is called for each Call.call() return value obtained from each region. Each Call.call() invocation is a separate RPC, so there is currently one RPC per region. So there's no place at the moment for the region server to be involved in any aggregation across regions. There is some preliminary support in HConnectionManager.HConnectionImplementation.processBatch() that would allow doing 1 RPC per region server, same as we do for multi-get and multi-put. We should provide the ability to batch callback RPC calls. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11292) Add an undelete operation
Gary Helmling created HBASE-11292: - Summary: Add an undelete operation Key: HBASE-11292 URL: https://issues.apache.org/jira/browse/HBASE-11292 Project: HBase Issue Type: New Feature Components: Deletes Reporter: Gary Helmling While column families can be configured to keep deleted cells (allowing time range queries to still retrieve those cells), deletes are still somewhat unique in that they are irreversible operations. Once a delete has been issued on a cell, the only way to undelete it is to rewrite the data with a timestamp newer than the delete. The idea here is to add an undelete operation that would make it possible to cancel a previous delete. An undelete operation will be similar to a delete, in that it will be written as a marker (tombstone doesn't seem like the right word). The undelete marker, however, will sort prior to a delete marker, canceling the effect of any following delete. In the absence of a column family configured to KEEP_DELETED_CELLS, we can't be sure if a prior delete marker and the affected cells have already been garbage collected. In this case (column family not configured with KEEP_DELETED_CELLS) it may be necessary for the server to reject undelete operations, to avoid creating the appearance of a client contract for undeletes that can't reliably be honored. I think there are additional subtleties of the implementation to be worked out, but I'm also interested in a broader discussion of interest in this capability. -- This message was sent by Atlassian JIRA (v6.2#6252)
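A toy model of the proposed marker ordering, under the assumption that marker types carry sort codes and that a lower code sorts first; the real KeyValue sort order is considerably more involved, so this only illustrates how an undelete marker sorting before a delete marker would cancel it:

```java
import java.util.Arrays;

// Hypothetical model only: type codes and visibility logic here are invented
// for illustration and do not match HBase's actual KeyValue type codes.
public class UndeleteSketch {
    // Lower code sorts first at the same cell coordinates; giving UNDELETE a
    // lower code than DELETE makes it sort prior to the delete it cancels.
    static final int UNDELETE = 1;
    static final int DELETE = 2;
    static final int PUT = 4;

    // A cell is visible if the first marker encountered in sort order is not
    // a delete (an earlier-sorting undelete cancels the delete).
    static boolean isVisible(int[] markersForCell) {
        int[] sorted = markersForCell.clone();
        Arrays.sort(sorted);
        return sorted[0] != DELETE;
    }

    public static void main(String[] args) {
        assert !isVisible(new int[]{DELETE, PUT});          // delete wins: hidden
        assert isVisible(new int[]{UNDELETE, DELETE, PUT}); // undelete cancels it
    }
}
```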
[jira] [Created] (HBASE-11653) RegionObserver coprocessor cannot override KeyValue values in prePut()
Gary Helmling created HBASE-11653: - Summary: RegionObserver coprocessor cannot override KeyValue values in prePut() Key: HBASE-11653 URL: https://issues.apache.org/jira/browse/HBASE-11653 Project: HBase Issue Type: Bug Components: Coprocessors Affects Versions: 0.94.21 Reporter: Gary Helmling Assignee: Gary Helmling Priority: Critical Due to a bug in {{HRegion.internalPut()}}, any modifications that a {{RegionObserver}} makes to a Put's family map in the {{prePut()}} hook are lost. This prevents coprocessors from modifying the values written by a {{Put}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11800) Coprocessor service methods in HTableInterface should be annotated public
Gary Helmling created HBASE-11800: - Summary: Coprocessor service methods in HTableInterface should be annotated public Key: HBASE-11800 URL: https://issues.apache.org/jira/browse/HBASE-11800 Project: HBase Issue Type: Task Components: Client Affects Versions: 0.96.0, 0.98.0 Reporter: Gary Helmling The {{HTableInterface.coprocessorService(...)}} and {{HTableInterface.batchCoprocessorService(...)}} methods were made private in HBASE-9529, when the coprocessor APIs were seen as unstable and evolving. However, these methods represent a standard way for clients to use custom APIs exposed via coprocessors. In that sense, they are targeted at general HBase users (who may run but not develop coprocessors), as opposed to coprocessor developers who want to extend HBase. The coprocessor endpoint API has also remained much more stable than the coprocessor Observer interfaces, which tend to change along with HBase internals. So there should not be much difficulty in supporting these methods as part of the public API. I think we should drop the {{@InterfaceAudience.Private}} annotation on these methods and support them as part of the public {{HTableInterface}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-12578) Change TokenProvider to a SingletonCoprocessorService
Gary Helmling created HBASE-12578: - Summary: Change TokenProvider to a SingletonCoprocessorService Key: HBASE-12578 URL: https://issues.apache.org/jira/browse/HBASE-12578 Project: HBase Issue Type: Improvement Components: security Reporter: Gary Helmling The {{TokenProvider}} coprocessor service, which is responsible for issuing HBase delegation tokens, currently runs as a region endpoint. In the security documentation, we recommend configuring this coprocessor for all table regions; however, we only ever address delegation token requests to the META region. When {{TokenProvider}} was first added, region coprocessors were the only way of adding endpoints. But, since then, we've added support for endpoints for regionserver and master coprocessors. This makes loading {{TokenProvider}} on all table regions unnecessarily wasteful. We can reduce the overhead for {{TokenProvider}} and greatly improve its scalability by doing the following:
# Convert {{TokenProvider}} to a {{SingletonCoprocessorService}} that is configured to run on all regionservers. This will ensure a single instance per regionserver instead of one per region.
# Direct delegation token requests to a random running regionserver so that we don't hotspot any single instance with requests.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
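A sketch of point 2, picking a random live regionserver for token requests; the server names and the selection helper are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: instead of always addressing token requests to the
// META region's host, pick a random live regionserver so no single
// TokenProvider instance is hotspotted.
public class TokenProviderRouting {
    static String pickTokenServer(List<String> liveServers, Random rng) {
        if (liveServers.isEmpty()) {
            throw new IllegalStateException("no live regionservers");
        }
        return liveServers.get(rng.nextInt(liveServers.size()));
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("rs1:16020", "rs2:16020", "rs3:16020");
        String chosen = pickTokenServer(servers, new Random());
        // Whatever the random pick, it is always one of the live servers.
        assert servers.contains(chosen);
    }
}
```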
[jira] [Created] (HBASE-12579) Move obtainAuthTokenForJob() methods out of User
Gary Helmling created HBASE-12579: - Summary: Move obtainAuthTokenForJob() methods out of User Key: HBASE-12579 URL: https://issues.apache.org/jira/browse/HBASE-12579 Project: HBase Issue Type: Improvement Components: security Reporter: Gary Helmling The {{User}} class currently contains some utility methods to obtain HBase authentication tokens for the given user. However, these methods initiate an RPC to the {{TokenProvider}} coprocessor endpoint, an action which should not be part of the User class' responsibilities. This leads to a couple of problems: # The way the methods are currently structured, it is impossible to integrate them with normal connection management for the cluster (the TokenUtil class constructs its own HTable instance internally). # The User class is logically part of the hbase-common module, but uses the TokenUtil class (part of hbase-server, though it should probably be moved to hbase-client) through reflection, leading to a hidden dependency. The {{obtainAuthTokenForJob()}} methods should be deprecated and the process of obtaining authentication tokens should be moved to use the normal connection lifecycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14821) CopyTable should allow overriding more config properties for peer cluster
Gary Helmling created HBASE-14821: - Summary: CopyTable should allow overriding more config properties for peer cluster Key: HBASE-14821 URL: https://issues.apache.org/jira/browse/HBASE-14821 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: Gary Helmling Assignee: Gary Helmling When using CopyTable across two separate clusters, you can specify the ZK quorum for the destination cluster, but not much else in configuration overrides. This can be a problem when the cluster configurations differ, such as when using security with different configurations for server principals. We should provide a general way to override configuration properties for the peer / destination cluster. One option would be to allow use of a prefix for command line properties ("peer.property."). Properties matching this prefix will be stripped and merged to the peer configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
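A sketch of the prefix-stripping idea, modeling the configurations as plain maps (the real code would work against Configuration objects); the prefix name "peer.property." is the one proposed above:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: strip the "peer.property." prefix from matching
// command-line properties and merge the remainder into the peer cluster's
// configuration. Maps stand in for HBase Configuration objects.
public class PeerConfigOverrides {
    static final String PEER_PREFIX = "peer.property.";

    static Map<String, String> applyPeerOverrides(Map<String, String> cmdLine,
                                                  Map<String, String> peerConf) {
        Map<String, String> merged = new HashMap<>(peerConf);
        for (Map.Entry<String, String> e : cmdLine.entrySet()) {
            if (e.getKey().startsWith(PEER_PREFIX)) {
                // Strip the prefix; the rest is a normal config key for the peer.
                merged.put(e.getKey().substring(PEER_PREFIX.length()), e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> cmd = new HashMap<>();
        cmd.put("peer.property.hbase.master.kerberos.principal", "hbase/peer@REALM2");
        cmd.put("hbase.zookeeper.quorum", "local-zk"); // no prefix: source only
        Map<String, String> merged = applyPeerOverrides(cmd, new HashMap<>());
        assert merged.get("hbase.master.kerberos.principal").equals("hbase/peer@REALM2");
        assert !merged.containsKey("hbase.zookeeper.quorum");
    }
}
```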
[jira] [Created] (HBASE-14775) Replication can't authenticate with peer Zookeeper with different server principal
Gary Helmling created HBASE-14775: - Summary: Replication can't authenticate with peer Zookeeper with different server principal Key: HBASE-14775 URL: https://issues.apache.org/jira/browse/HBASE-14775 Project: HBase Issue Type: Bug Reporter: Gary Helmling Assignee: Gary Helmling When replication is setup with security, where the local ZK cluster and peer ZK cluster use different server principals, the source HBase cluster is unable to authenticate with the peer ZK cluster. When ZK is configured for SASL authentication and a server principal other than the default ("zookeeper") is used, the correct server principal must be specified on the client as a system property -- the confusingly named {{zookeeper.sasl.client.username}}. However, since this is given as a system property, authentication with the peer cluster breaks when it uses a different ZK server principal than the local cluster. We need a way of tying this setting to the replication peer config and then setting the property when the peer's ZooKeeperWatcher is created. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15025) Allow clients configured with insecure fallback to attempt SIMPLE auth when KRB fails
Gary Helmling created HBASE-15025: - Summary: Allow clients configured with insecure fallback to attempt SIMPLE auth when KRB fails Key: HBASE-15025 URL: https://issues.apache.org/jira/browse/HBASE-15025 Project: HBase Issue Type: Improvement Components: security Reporter: Gary Helmling Assignee: Gary Helmling We have separate configurations for both client and server allowing a "permissive" mode where connections to insecure servers and clients (respectively) are allowed. However, if both client and server are configured for Kerberos authentication for a given cluster, and Kerberos authentication fails, the connection will still fail, even if the fallback configurations are set to true. If the client is configured to allow insecure fallback, and Kerberos authentication fails, we could instead have the client retry with SIMPLE auth. If the server is also configured to allow insecure fallback, this would allow the connection to succeed in the case of transient problems with Kerberos infrastructure, for example. There is of course a danger that this would allow misconfigurations of security to be silently ignored, but we can add some loud logging on the client side when fallback to SIMPLE auth occurs, plus we have metrics and logging on the server side for fallbacks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15038) ExportSnapshot should support separate configurations for source and destination clusters
Gary Helmling created HBASE-15038: - Summary: ExportSnapshot should support separate configurations for source and destination clusters Key: HBASE-15038 URL: https://issues.apache.org/jira/browse/HBASE-15038 Project: HBase Issue Type: Improvement Components: mapreduce, snapshots Reporter: Gary Helmling Assignee: Gary Helmling Currently ExportSnapshot uses a single Configuration instance for both the source and destination FileSystem instances to use. It should allow overriding properties for each filesystem connection separately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14886) ReplicationAdmin does not use full peer configuration
Gary Helmling created HBASE-14886: - Summary: ReplicationAdmin does not use full peer configuration Key: HBASE-14886 URL: https://issues.apache.org/jira/browse/HBASE-14886 Project: HBase Issue Type: Bug Components: Replication Reporter: Gary Helmling Assignee: Gary Helmling Priority: Critical Fix For: 2.0.0, 1.2.0, 1.3.0 In {{listValidReplicationPeers()}}, we're creating the peer {{Configuration}} based on the source connection configuration and simply applying the peer ZK cluster key. This causes any additional properties present in the {{ReplicationPeerConfig}} configuration to not be applied. We should instead be using the configuration returned by {{ReplicationPeers.getPeerConf()}}, which we already call in that method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14866) VerifyReplication should use peer configuration in peer connection
Gary Helmling created HBASE-14866: - Summary: VerifyReplication should use peer configuration in peer connection Key: HBASE-14866 URL: https://issues.apache.org/jira/browse/HBASE-14866 Project: HBase Issue Type: Improvement Components: Replication Reporter: Gary Helmling Fix For: 2.0.0, 1.2.0, 1.3.0 VerifyReplication uses the replication peer's configuration to construct the ZooKeeper quorum address for the peer connection. However, other configuration properties in the peer's configuration are dropped. It should merge all configuration properties from the {{ReplicationPeerConfig}} when creating the peer connection and obtaining credentials for the peer cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16025) Cache table state to reduce load on META
Gary Helmling created HBASE-16025: - Summary: Cache table state to reduce load on META Key: HBASE-16025 URL: https://issues.apache.org/jira/browse/HBASE-16025 Project: HBase Issue Type: Improvement Components: Client Reporter: Gary Helmling Priority: Critical Fix For: 2.0.0 HBASE-12035 moved keeping table enabled/disabled state from ZooKeeper into hbase:meta. When we retry operations on the client, we check table state in order to return a specific message if the table is disabled. This means that in master we will be going back to meta for every retry, even if a region's location has not changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16097) Flushes and compactions fail on getting split point
Gary Helmling created HBASE-16097: - Summary: Flushes and compactions fail on getting split point Key: HBASE-16097 URL: https://issues.apache.org/jira/browse/HBASE-16097 Project: HBase Issue Type: Bug Components: Compaction Affects Versions: 1.2.1 Reporter: Gary Helmling Assignee: Gary Helmling We've seen a number of cases where flushes and compactions run completely through, then throw an IndexOutOfBoundsException while getting the split point to check whether a split is needed. For flushes, the stack trace looks something like:
{noformat}
ERROR regionserver.MemStoreFlusher: Cache flusher failed for entry [flush region ]
java.lang.IndexOutOfBoundsException: 131148
at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
at org.apache.hadoop.hbase.util.ByteBufferUtils.toBytes(ByteBufferUtils.java:491)
at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:351)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:520)
at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1510)
at org.apache.hadoop.hbase.regionserver.StoreFile.getFileSplitPoint(StoreFile.java:726)
at org.apache.hadoop.hbase.regionserver.DefaultStoreFileManager.getSplitPoint(DefaultStoreFileManager.java:127)
at org.apache.hadoop.hbase.regionserver.HStore.getSplitPoint(HStore.java:2036)
at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:82)
at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:7885)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:513)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
at java.lang.Thread.run(Thread.java:745)
{noformat}
For compactions, the exception
occurs in the same spot:
{noformat}
ERROR regionserver.CompactSplitThread: Compaction failed Request = regionName=X, storeName=X, fileCount=XX, fileSize=XXX M, priority=1, time=
java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkIndex(Buffer.java:540)
at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
at org.apache.hadoop.hbase.util.ByteBufferUtils.toBytes(ByteBufferUtils.java:491)
at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:351)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:520)
at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1510)
at org.apache.hadoop.hbase.regionserver.StoreFile.getFileSplitPoint(StoreFile.java:726)
at org.apache.hadoop.hbase.regionserver.DefaultStoreFileManager.getSplitPoint(DefaultStoreFileManager.java:127)
at org.apache.hadoop.hbase.regionserver.HStore.getSplitPoint(HStore.java:2036)
at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:82)
at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:7885)
at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestSplit(CompactSplitThread.java:241)
at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:540)
at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:566)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}
This continues until a compaction runs through and rewrites whatever file is causing the problem, at which point a split can proceed successfully.
While compactions and flushes are successfully completing up until this point (it occurs after new store files have been moved into place), the exception thrown on flush causes us to exit prior to checking if a compaction is needed. So normal compactions wind up not being triggered and the affected regions accumulate a large number of store files. No root cause yet, so I'm parking this info here for investigation. Seems like we're either mis-writing part of the index or making some bad assumptions about the index blocks that we've read. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15111) "hbase version" should write to stdout
Gary Helmling created HBASE-15111: - Summary: "hbase version" should write to stdout Key: HBASE-15111 URL: https://issues.apache.org/jira/browse/HBASE-15111 Project: HBase Issue Type: Improvement Components: util Reporter: Gary Helmling Assignee: Gary Helmling Priority: Trivial Calling {{hbase version}} currently outputs the version info by writing to {{LOG.info}}. This means, if you change the default log level settings, you may get no output at all on the command line. Since {{VersionInfo.main()}} is being called, it should really just output straight to stdout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15234) ReplicationLogCleaner can abort due to transient ZK issues
Gary Helmling created HBASE-15234: - Summary: ReplicationLogCleaner can abort due to transient ZK issues Key: HBASE-15234 URL: https://issues.apache.org/jira/browse/HBASE-15234 Project: HBase Issue Type: Bug Components: master, Replication Reporter: Gary Helmling Assignee: Gary Helmling Priority: Critical The ReplicationLogCleaner delegate for the LogCleaner chore can abort due to transient errors reading the replication znodes, leaving the log cleaner chore stopped, but the master still running. This causes logs to build up in the oldWALs directory, which can even hit storage or file count limits in HDFS, causing problems. We've seen this happen in a couple of clusters when a rolling restart was performed on the ZK peers (only one restarted at a time). The full stack trace when the log cleaner aborts is:
{noformat}
16/02/02 15:22:39 WARN zookeeper.ZKUtil: replicationLogCleaner-0x1522c8b93c2fbae, quorum=, baseZNode=/hbase Unable to get data of znode /hbase/replication/rs
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/replication/rs
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:713)
at org.apache.hadoop.hbase.replication.ReplicationQueuesClientZKImpl.getQueuesZNodeCversion(ReplicationQueuesClientZKImpl.java:80)
at org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner.loadWALsFromQueues(ReplicationLogCleaner.java:99)
at org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner.getDeletableFiles(ReplicationLogCleaner.java:70)
at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:233)
at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:157)
at org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:185)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:110)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/02/02 15:22:39 ERROR zookeeper.ZooKeeperWatcher: replicationLogCleaner-0x1522c8b93c2fbae, quorum=, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/replication/rs
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:713)
at org.apache.hadoop.hbase.replication.ReplicationQueuesClientZKImpl.getQueuesZNodeCversion(ReplicationQueuesClientZKImpl.java:80)
at org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner.loadWALsFromQueues(ReplicationLogCleaner.java:99)
at org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner.getDeletableFiles(ReplicationLogCleaner.java:70)
at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:233)
at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:157)
at org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:185)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset
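One possible shape for making the cleaner resilient to the failure mode above: retry the ZK read on transient errors and, if it still fails, skip the current chore run (deleting nothing) instead of aborting. This is a generic sketch, not the actual fix:

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: retry transient ZK reads instead of aborting the
// cleaner chore. The Callable stands in for the znode read.
public class TransientZkRetry {
    static <T> T readWithRetries(Callable<T> zkRead, int attempts, T fallback) {
        for (int i = 0; i < attempts; i++) {
            try {
                return zkRead.call();
            } catch (Exception transientError) {
                // e.g. ConnectionLoss during a rolling ZK restart; retry
            }
        }
        // Conservative fallback: treat the read as unanswered, so the caller
        // skips this chore run (deletes nothing) rather than stopping forever.
        return fallback;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        String result = readWithRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("ConnectionLoss");
            return "queues-cversion";
        }, 5, null);
        assert "queues-cversion".equals(result); // succeeded on the 3rd attempt
        assert calls[0] == 3;
    }
}
```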
[jira] [Created] (HBASE-15363) Add client side metrics for SASL connection failures
Gary Helmling created HBASE-15363: - Summary: Add client side metrics for SASL connection failures Key: HBASE-15363 URL: https://issues.apache.org/jira/browse/HBASE-15363 Project: HBase Issue Type: Improvement Components: Client, metrics, security Reporter: Gary Helmling Assignee: Gary Helmling There are a number of cases where we can get SASL connection failures before getting to the server, like errors talking to the KDC/TGS and misconfiguration of kerberos principals. Hence these will not show up in the server-side authentication_failures metric. We should add client side metrics on SASL connection failures to capture these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15294) Document advanced replication configurations with security
Gary Helmling created HBASE-15294: - Summary: Document advanced replication configurations with security Key: HBASE-15294 URL: https://issues.apache.org/jira/browse/HBASE-15294 Project: HBase Issue Type: Task Components: documentation Reporter: Gary Helmling Assignee: Gary Helmling HBASE-14866 fixed handling of source and cluster replication configs for some replication tools, which is needed, for example, for correct handling of some cross-realm trust security configurations. We need to document some examples in the reference guide. One example, to configure a replication peer with different server principals: {noformat} add_peer '1', CLUSTER_KEY => "server1.cie.com:2181:/hbase", CONFIG => { 'hbase.master.kerberos.principal' => 'hbase/instan...@realm2.com', 'hbase.regionserver.kerberos.principal' => 'hbase/instan...@realm2.com', } {noformat} Additional arguments to VerifyReplication should also be documented in the usage output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15641) Shell "alter" should do a single modifyTable operation
Gary Helmling created HBASE-15641: - Summary: Shell "alter" should do a single modifyTable operation Key: HBASE-15641 URL: https://issues.apache.org/jira/browse/HBASE-15641 Project: HBase Issue Type: Improvement Components: shell Reporter: Gary Helmling When performing an "alter" on multiple column families in a table, the shell will perform a separate {{Admin.modifyColumn()}} call for each column family being modified, with all of the table regions being bulk-reopened each time. It would be much better to apply all of the changes to the table descriptor, then do a single call to {{Admin.modifyTable()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-15573) Indefinite pause while trying to cleanup data
[ https://issues.apache.org/jira/browse/HBASE-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling resolved HBASE-15573. --- Resolution: Invalid This JIRA instance is used for tracking development issues and bugs. Please send an email to the u...@hbase.apache.org mailing list to ask any questions. Sounds like a configuration issue with your client application. > Indefinite pause while trying to cleanup data > - > > Key: HBASE-15573 > URL: https://issues.apache.org/jira/browse/HBASE-15573 > Project: HBase > Issue Type: Bug >Affects Versions: 1.1.2, 1.1.4 >Reporter: Jorge Figueira >Priority: Blocker > Attachments: hbase-hadoop-master-HBASE.log, > hbase-hadoop-regionserver-HBASE.log, hbase-hadoop-zookeeper-HBASE.log > > > Can't retrieve any information with hbase rpc java client. > With hbase shell its possible to scan data and retrieve all the information > normally. > But with any rpc client region server don't retrieve data, all data come with > null values. > Region Server log: > DEBUG [RpcServer.reader=2,bindAddress=HBASE,port=16020] ipc.RpcServer: > RpcServer.listener,port=16020: DISCONNECTING client SERVER:37088 because read > count=-1 > DEBUG [RpcServer.reader=2,bindAddress=HBASE,port=16020] ipc.RpcServer: > RpcServer.listener,port=16020: DISCONNECTING client SERVER2:36997 because > read count=-1 > Master log: > 2016-03-31 18:16:27,998 DEBUG [ProcedureExecutorTimeout] > procedure2.ProcedureExecutor$CompletedProcedureCleaner: No completed > procedures to cleanup. > 2016-03-31 18:16:57,998 DEBUG [ProcedureExecutorTimeout] > procedure2.ProcedureExecutor$CompletedProcedureCleaner: No completed > procedures to cleanup. > 2016-03-31 18:17:27,998 DEBUG [ProcedureExecutorTimeout] > procedure2.ProcedureExecutor$CompletedProcedureCleaner: No completed > procedures to cleanup -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15741) TokenProvider coprocessor RPC incompatible between 1.2 and 1.3
Gary Helmling created HBASE-15741: - Summary: TokenProvider coprocessor RPC incompatible between 1.2 and 1.3 Key: HBASE-15741 URL: https://issues.apache.org/jira/browse/HBASE-15741 Project: HBase Issue Type: Bug Components: Coprocessors Affects Versions: 1.3.0 Reporter: Gary Helmling Priority: Blocker Attempting to run a map reduce job with a 1.3 client on a secure cluster running 1.2 fails when making the coprocessor rpc to obtain a delegation token: {noformat} Exception in thread "main" org.apache.hadoop.hbase.exceptions.UnknownProtocolException: org.apache.hadoop.hbase.exceptions.UnknownProtocolException: No registered coprocessor service found for name hbase.pb.AuthenticationService in region hbase:meta,,1 at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7741) at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1988) at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1970) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33652) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109) at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:137) at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:112) at java.lang.Thread.run(Thread.java:745) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:332) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1631) at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:104) at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:94) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:137) at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:108) at org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:73) at org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$AuthenticationService$BlockingStub.getAuthenticationToken(AuthenticationProtos.java:4512) at org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:86) at org.apache.hadoop.hbase.security.token.TokenUtil$1.run(TokenUtil.java:111) at org.apache.hadoop.hbase.security.token.TokenUtil$1.run(TokenUtil.java:108) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:340) at org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:108) at org.apache.hadoop.hbase.security.token.TokenUtil.addTokenForJob(TokenUtil.java:329) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initCredentials(TableMapReduceUtil.java:490) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:209) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:162) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:285) at 
org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:86) at org.apache.hadoop.hbase.mapreduce.CellCounter.createSubmittableJob(CellCounter.java:193) at org.apache.hadoop.hbase.mapreduce.CellCounter.main(CellCounter.java:290) Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.exceptions.UnknownProtocolException): org.apache.hadoop.hbase.exceptions.UnknownProtocolException: No registered coprocessor service found for name hbase.pb.AuthenticationService in region hba
[jira] [Created] (HBASE-15856) Cached Connection instances can wind up with addresses never resolved
Gary Helmling created HBASE-15856: - Summary: Cached Connection instances can wind up with addresses never resolved Key: HBASE-15856 URL: https://issues.apache.org/jira/browse/HBASE-15856 Project: HBase Issue Type: Bug Components: Client Reporter: Gary Helmling Assignee: Gary Helmling Priority: Critical During periods where DNS is not working properly, we can wind up caching connections to the master or regionservers where the initial hostname resolution failed, and the resolution is never re-attempted. This means that clients will forever get UnknownHostException for any calls. When constructing a BlockingRpcChannelImplementation, we instantiate the InetSocketAddress to use for the connection. This instance is then used in the rpc client connection, where we check isUnresolved() and throw an UnknownHostException if that returns true. However, at this point the rpc channel is already cached in the HConnectionImplementation map of stubs, so the address will never be resolved. Setting the config for hbase.resolve.hostnames.on.failure masks this issue, since the stub key used is modified to contain the address. However, even in that case, if DNS fails, an rpc channel instance with an unresolved InetSocketAddress will still be cached in the stubs under the hostname-only key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
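The failure mode above follows from plain java.net behavior: an InetSocketAddress is immutable, so once an instance with a failed resolution is cached, every later check sees the same unresolved state. A minimal sketch, assuming nothing about HBase's actual classes (the names below are illustrative only):

```java
import java.net.InetSocketAddress;

public class UnresolvedAddressDemo {
    public static void main(String[] args) {
        // createUnresolved() skips DNS entirely, standing in for a lookup
        // that failed while DNS was down.
        InetSocketAddress cached =
            InetSocketAddress.createUnresolved("rs1.example.com", 16020);

        // A channel holding on to this instance fails the isUnresolved()
        // check forever: InetSocketAddress is immutable, so caching it
        // freezes the failed resolution in place.
        System.out.println(cached.isUnresolved()); // true
        System.out.println(cached.isUnresolved()); // still true; never re-attempted
    }
}
```

The fix direction implied by the issue is to avoid caching the channel (or the address) until resolution has actually succeeded, or to re-create the address on retry.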
[jira] [Created] (HBASE-15773) CellCounter improvements
Gary Helmling created HBASE-15773: - Summary: CellCounter improvements Key: HBASE-15773 URL: https://issues.apache.org/jira/browse/HBASE-15773 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: Gary Helmling Looking at the CellCounter map reduce, it seems like it can be improved in a few areas: * it does not currently support setting scan batching. This is important when we're fetching all versions for columns. Actually, it would be nice to support all of the scan configuration currently provided in TableInputFormat. * generating job counters containing row keys and column qualifiers is guaranteed to blow up on anything but the smallest table. This is not usable and doesn't make any sense when the same counts are in the job output. The row and qualifier specific counters should be dropped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-13707) CellCounter uses too many counters
[ https://issues.apache.org/jira/browse/HBASE-13707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling resolved HBASE-13707. --- Resolution: Duplicate Assignee: Gary Helmling (was: NIDHI GAMBHIR) Fixed in HBASE-15773 > CellCounter uses too many counters > - > > Key: HBASE-13707 > URL: https://issues.apache.org/jira/browse/HBASE-13707 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 1.0.1 >Reporter: Jean-Marc Spaggiari > Assignee: Gary Helmling >Priority: Minor > Labels: beginner > > CellCounter creates a counter per row... So it quickly becomes too many. > We should provide an option to drop the statistics per row and count only > cells overall for the table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15678) Normalize RetryingCallable cache clearing and implementations
Gary Helmling created HBASE-15678: - Summary: Normalize RetryingCallable cache clearing and implementations Key: HBASE-15678 URL: https://issues.apache.org/jira/browse/HBASE-15678 Project: HBase Issue Type: Sub-task Components: Client Reporter: Gary Helmling Assignee: Gary Helmling There is a fair amount of duplication and inconsistency in the meta cache handling of RetryingCallable implementations: * meta cache is often cleared in prepare() when reload=true, in addition to being cleared in throwable() * each RetryingCallable implementation does this slightly differently, leading to inconsistencies and potential bugs * RegionServerCallable and RegionAdminServiceCallable duplicate a lot of code, but with small, seemingly unnecessary inconsistencies. We should clean these up into a common base with subclasses doing only the necessary differentiation. The main goal here is to establish some common handling, to the extent possible, for the meta cache interactions by the different implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16218) Eliminate use of UGI.doAs() in AccessController testing
Gary Helmling created HBASE-16218: - Summary: Eliminate use of UGI.doAs() in AccessController testing Key: HBASE-16218 URL: https://issues.apache.org/jira/browse/HBASE-16218 Project: HBase Issue Type: Sub-task Components: security Reporter: Gary Helmling Assignee: Gary Helmling Many tests for AccessController observer coprocessor hooks make use of UGI.doAs() when the test user could simply be passed through. Eliminate the unnecessary use of doAs(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16217) Identify calling user in ObserverContext
Gary Helmling created HBASE-16217: - Summary: Identify calling user in ObserverContext Key: HBASE-16217 URL: https://issues.apache.org/jira/browse/HBASE-16217 Project: HBase Issue Type: Sub-task Components: Coprocessors, security Reporter: Gary Helmling Assignee: Gary Helmling We already either explicitly pass down the relevant User instance initiating an action through the call path, or it is available through RpcServer.getRequestUser(). We should carry this through in the ObserverContext for coprocessor upcalls and make use of it for permissions checking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16277) Improve CPU efficiency in VisibilityLabelsCache
Gary Helmling created HBASE-16277: - Summary: Improve CPU efficiency in VisibilityLabelsCache Key: HBASE-16277 URL: https://issues.apache.org/jira/browse/HBASE-16277 Project: HBase Issue Type: Improvement Components: security Reporter: Gary Helmling For secure clusters where the VisibilityController coprocessor is loaded, regionservers sometimes degrade into very high CPU utilization, with many of the RPC handler threads stuck in: {noformat} "B.defaultRpcServer.handler=0,queue=0,port=16020" #114 daemon prio=5 os_prio=0 tid=0x7f8a95bb7800 nid=0x382 runnable [0x7f8a3051f000] java.lang.Thread.State: RUNNABLE at java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617) at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499) at java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298) at java.lang.ThreadLocal.remove(ThreadLocal.java:222) at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:426) at java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881) at org.apache.hadoop.hbase.security.visibility.VisibilityLabelsCache.getGroupAuths(VisibilityLabelsCache.java:237) at org.apache.hadoop.hbase.security.visibility.FeedUserAuthScanLabelGenerator.getLabels(FeedUserAuthScanLabelGenerator.java:70) at org.apache.hadoop.hbase.security.visibility.DefaultVisibilityLabelServiceImpl.getVisibilityExpEvaluator(DefaultVisibilityLabelServiceImpl.java:469) at org.apache.hadoop.hbase.security.visibility.VisibilityUtils.createVisibilityLabelFilter(VisibilityUtils.java:284) at org.apache.hadoop.hbase.security.visibility.VisibilityController.preGetOp(VisibilityController.java:684) at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$26.call(RegionCoprocessorHost.java:849) at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673) at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1749) at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1705) at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:845) at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6748) at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6736) at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2029) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33644) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109) at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:137) at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:112) at java.lang.Thread.run(Thread.java:745) {noformat} In this case there are no visibility labels actually in use, so it appears that the locking overhead for the VisibilityLabelsCache can reach a tipping point where it does not degrade gracefully. We should look at alternate approaches to the label caching in place of the current ReentrantReadWriteLock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
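One alternate approach to the ReentrantReadWriteLock, sketched here purely as an illustration (this is an assumption about a possible design, not the fix the project actually shipped): publish an immutable snapshot of the label mappings through a volatile field, so that readers pay only a volatile load and never touch a lock or its ThreadLocal bookkeeping. Writers copy-and-swap the whole map on the rare label update.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a lock-free-read label cache. All names here are
// illustrative; this is not VisibilityLabelsCache's real structure.
public class SnapshotLabelCache {
    // Readers see a fully-built, immutable map via one volatile read.
    private volatile Map<String, Integer> labels = Collections.emptyMap();

    public Integer getLabelOrdinal(String label) {
        return labels.get(label); // no lock, no ThreadLocal churn on the hot path
    }

    // Updates are rare, so a full copy is cheap relative to per-read locking.
    public synchronized void refresh(Map<String, Integer> fresh) {
        labels = Collections.unmodifiableMap(new HashMap<>(fresh)); // atomic publish
    }
}
```

The trade-off is slightly stale reads between a label change and the next refresh, which matches the cache's existing semantics of periodic reload from ZooKeeper.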
[jira] [Created] (HBASE-16231) Integration tests should support client keytab login for secure clusters
Gary Helmling created HBASE-16231: - Summary: Integration tests should support client keytab login for secure clusters Key: HBASE-16231 URL: https://issues.apache.org/jira/browse/HBASE-16231 Project: HBase Issue Type: Improvement Components: integration tests Reporter: Gary Helmling Assignee: Gary Helmling Integration tests currently rely on an external kerberos login for secure clusters. Elsewhere we use AuthUtil to login and refresh the credentials in a background thread. We should do the same here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16141) Unwind use of UserGroupInformation.doAs() to convey requester identity in coprocessor upcalls
Gary Helmling created HBASE-16141: - Summary: Unwind use of UserGroupInformation.doAs() to convey requester identity in coprocessor upcalls Key: HBASE-16141 URL: https://issues.apache.org/jira/browse/HBASE-16141 Project: HBase Issue Type: Improvement Components: Coprocessors, security Reporter: Gary Helmling Assignee: Gary Helmling Fix For: 2.0.0, 1.4.0 In the discussion on HBASE-16115, there is some question of whether UserGroupInformation.doAs() is the right mechanism for propagating the original requester's identity in certain system contexts (splits, compactions, some procedure calls). It has the unfortunate effect of overriding the current user, which makes for very confusing semantics for coprocessor implementors. We should instead find an alternate mechanism for conveying the caller identity, one which does not override the current user context. I think we should look at passing this through as part of the ObserverContext passed to every coprocessor hook. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-16202) Backport metric for CallQueueTooBigException to 1.3
[ https://issues.apache.org/jira/browse/HBASE-16202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling resolved HBASE-16202. --- Resolution: Invalid No backport needed, I just did a simple cherry-pick to branch-1.3. > Backport metric for CallQueueTooBigException to 1.3 > --- > > Key: HBASE-16202 > URL: https://issues.apache.org/jira/browse/HBASE-16202 > Project: HBase > Issue Type: Improvement > Components: IPC/RPC, metrics >Reporter: Gary Helmling > Assignee: Gary Helmling > > HBASE-15353 added a separate metric for tracking the number of > CallQueueTooBigExceptions, but only went in to 1.4+. Since CQTBE is already > in 1.2+, it would be nice to at least get this in the upcoming 1.3.0 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16202) Backport metric for CallQueueTooBigException to 1.3
Gary Helmling created HBASE-16202: - Summary: Backport metric for CallQueueTooBigException to 1.3 Key: HBASE-16202 URL: https://issues.apache.org/jira/browse/HBASE-16202 Project: HBase Issue Type: Improvement Components: IPC/RPC, metrics Reporter: Gary Helmling Assignee: Gary Helmling HBASE-15353 added a separate metric for tracking the number of CallQueueTooBigExceptions, but only went in to 1.4+. Since CQTBE is already in 1.2+, it would be nice to at least get this in the upcoming 1.3.0 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17604) Backport HBASE-15437 (fix request and response size metrics) to branch-1
Gary Helmling created HBASE-17604: - Summary: Backport HBASE-15437 (fix request and response size metrics) to branch-1 Key: HBASE-17604 URL: https://issues.apache.org/jira/browse/HBASE-17604 Project: HBase Issue Type: Bug Components: IPC/RPC, metrics Reporter: Gary Helmling HBASE-15437 fixed request and response size metrics in master. We should apply the same to branch-1 and related release branches. Prior to HBASE-15437, request and response size metrics were only calculated based on the protobuf message serialized size. This isn't correct when the cell scanner payload is in use. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17611) Thrift 2 per-call latency metrics are capped at ~ 2 seconds
Gary Helmling created HBASE-17611: - Summary: Thrift 2 per-call latency metrics are capped at ~ 2 seconds Key: HBASE-17611 URL: https://issues.apache.org/jira/browse/HBASE-17611 Project: HBase Issue Type: Bug Components: metrics, Thrift Reporter: Gary Helmling Assignee: Gary Helmling Fix For: 1.3.1 Thrift 2 latency metrics are measured in nanoseconds. However, the duration used for per-method latencies is cast to an int, meaning the values are capped at 2.147 seconds. Let's use a long instead. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
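The cap falls straight out of integer arithmetic: Integer.MAX_VALUE nanoseconds is about 2.147 seconds, so any longer duration wraps when cast to int. A self-contained sketch of the bug class (the method names below are illustrative, not the actual Thrift server code):

```java
import java.util.concurrent.TimeUnit;

public class LatencyCapDemo {
    // Buggy form described in the issue: the nanosecond duration is cast to int,
    // so anything over Integer.MAX_VALUE ns (~2.147 s) overflows.
    public static int buggyLatencyNanos(long durationNanos) {
        return (int) durationNanos; // wraps for durations > 2.147 s
    }

    // Fix: keep the duration as a long end to end.
    public static long fixedLatencyNanos(long durationNanos) {
        return durationNanos;
    }

    public static void main(String[] args) {
        long threeSeconds = TimeUnit.SECONDS.toNanos(3); // 3_000_000_000 ns
        System.out.println(buggyLatencyNanos(threeSeconds)); // negative garbage after wrap
        System.out.println(fixedLatencyNanos(threeSeconds)); // 3000000000
        System.out.println(Integer.MAX_VALUE / 1e9);         // ~2.147 s, the observed cap
    }
}
```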
[jira] [Created] (HBASE-17578) Thrift per-method metrics should still update in the case of exceptions
Gary Helmling created HBASE-17578: - Summary: Thrift per-method metrics should still update in the case of exceptions Key: HBASE-17578 URL: https://issues.apache.org/jira/browse/HBASE-17578 Project: HBase Issue Type: Bug Components: Thrift Reporter: Gary Helmling Assignee: Gary Helmling Fix For: 1.3.1 Currently, the InvocationHandler used to update per-method metrics in the Thrift server fails to update metrics if an exception occurs. This causes us to miss outliers. We should include exceptional cases in per-method latencies, and also look at adding specific exception rate metrics. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-16540) Scan should do additional validation on start and stop row
Gary Helmling created HBASE-16540: - Summary: Scan should do additional validation on start and stop row Key: HBASE-16540 URL: https://issues.apache.org/jira/browse/HBASE-16540 Project: HBase Issue Type: Bug Components: Client Reporter: Gary Helmling Scan.setStartRow() and setStopRow() should validate the byte[] passed to ensure it meets the criteria for a row key. If the byte[] length is greater than Short.MAX_VALUE, we should throw an IllegalArgumentException in order to fast fail and prevent server-side errors being thrown and retried. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
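The proposed check can be sketched in plain Java (the class and method names here are illustrative, not the actual patch): reject an oversized row key on the client before any RPC is attempted.

```java
public class RowKeyValidation {
    // Fast-fail validation for a row key, per the issue: anything longer
    // than Short.MAX_VALUE (32767) bytes cannot be a valid row key.
    public static void checkRow(byte[] row) {
        if (row == null) {
            throw new IllegalArgumentException("Row key cannot be null");
        }
        if (row.length > Short.MAX_VALUE) {
            throw new IllegalArgumentException(
                "Row length " + row.length + " exceeds max " + Short.MAX_VALUE);
        }
    }

    public static void main(String[] args) {
        checkRow(new byte[] {'r', '1'}); // ok
        try {
            checkRow(new byte[Short.MAX_VALUE + 1]); // 32768 bytes: rejected locally
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing in the setter keeps the error close to the caller's code, instead of surfacing as a retried server-side exception.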
[jira] [Created] (HBASE-16518) Remove old .arcconfig file
Gary Helmling created HBASE-16518: - Summary: Remove old .arcconfig file Key: HBASE-16518 URL: https://issues.apache.org/jira/browse/HBASE-16518 Project: HBase Issue Type: Task Components: tooling Reporter: Gary Helmling Assignee: Gary Helmling Priority: Trivial The project .arcconfig file points to a project that no longer exists on a no longer supported phabricator instance. Since it is no longer used for reviews, let's drop it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()
Gary Helmling created HBASE-16788: - Summary: Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles() Key: HBASE-16788 URL: https://issues.apache.org/jira/browse/HBASE-16788 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.3.0 Reporter: Gary Helmling Assignee: Gary Helmling Priority: Blocker HBASE-13082 changed the way that compacted files are archived from being done inline on compaction completion to an async cleanup by the CompactedHFilesDischarger chore. It looks like the changes to HStore to support this introduced a race condition in the compacted HFile archiving. In the following sequence, we can wind up with two separate threads trying to archive the same HFiles, causing a regionserver abort: # compaction completes normally and the compacted files are added to {{compactedfiles}} in HStore's DefaultStoreFileManager # *threadA*: CompactedHFilesDischargeHandler runs in a RS executor service, calling closeAndArchiveCompactedFiles() ## obtains HStore readlock ## gets a copy of compactedfiles ## releases readlock # *threadB*: calls HStore.close() as part of region close ## obtains HStore writelock ## calls DefaultStoreFileManager.clearCompactedfiles(), getting a copy of same compactedfiles # *threadA*: calls HStore.removeCompactedfiles(compactedfiles) ## archives files in {{compactedfiles}} in HRegionFileSystem.removeStoreFiles() ## calls HStore.clearCompactedFiles() ## waits on write lock # *threadB*: continues with close() ## calls removeCompactedfiles(compactedfiles) ## calls HRegionFileSystem.removeStoreFiles() -> HFileArchiver.archiveStoreFiles() ## receives FileNotFoundException because the files have already been archived by threadA ## throws IOException # RS aborts I think the combination of fetching the compactedfiles list and removing the files needs to be covered by locking. 
Options I see are: * Modify HStore.closeAndArchiveCompactedFiles(): use writelock instead of readlock and move the call to removeCompactedfiles() inside the lock. This means the read operations will be blocked while the files are being archived, which is bad. * Synchronize closeAndArchiveCompactedFiles() and modify close() to call it instead of calling removeCompactedfiles() directly * Add a separate lock for compacted files removal and use in closeAndArchiveCompactedFiles() and close() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
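The third option above can be sketched with a dedicated lock that makes "copy the compacted-file list and clear it" a single atomic step, so whichever of close() and closeAndArchiveCompactedFiles() gets there first drains the list and the other sees it empty. This is a hypothetical illustration of the locking pattern, not HStore's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch: one lock guards both mutation and drain of the
// compacted-file list, so two archivers can never hold the same copy.
public class CompactedFilesHolder {
    private final ReentrantLock archiveLock = new ReentrantLock();
    private final List<String> compactedFiles = new ArrayList<>();

    public void addCompactedFile(String file) {
        archiveLock.lock();
        try {
            compactedFiles.add(file);
        } finally {
            archiveLock.unlock();
        }
    }

    // Atomically take ownership of the current list. A second caller
    // (e.g. close() racing the discharger chore) gets an empty list
    // instead of a stale copy naming already-archived files.
    public List<String> drainForArchiving() {
        archiveLock.lock();
        try {
            List<String> copy = new ArrayList<>(compactedFiles);
            compactedFiles.clear();
            return copy;
        } finally {
            archiveLock.unlock();
        }
    }
}
```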
[jira] [Created] (HBASE-16657) Expose per-region last major compaction timestamp in RegionServer UI
Gary Helmling created HBASE-16657: - Summary: Expose per-region last major compaction timestamp in RegionServer UI Key: HBASE-16657 URL: https://issues.apache.org/jira/browse/HBASE-16657 Project: HBase Issue Type: Improvement Components: regionserver, UI Reporter: Gary Helmling HBASE-12859 added some tracking for the last major compaction completed for each region. However, this is currently only exposed through the cluster status reporting and the Admin API. Since the regionserver is already reporting this information, it would be nice to fold it in somewhere to the region listing in the regionserver UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16661) Add last major compaction age to per-region metrics
Gary Helmling created HBASE-16661: - Summary: Add last major compaction age to per-region metrics Key: HBASE-16661 URL: https://issues.apache.org/jira/browse/HBASE-16661 Project: HBase Issue Type: Improvement Reporter: Gary Helmling Priority: Minor After HBASE-12859, we can now track the last major compaction timestamp for each region. However, this is only exposed through cluster status reporting and the admin API. We have similar per-region metrics around storefile age, but none that filters on major compaction specifically. Let's add a metric for last major compaction age. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16754) Regions failing compaction due to referencing non-existent store file
Gary Helmling created HBASE-16754: - Summary: Regions failing compaction due to referencing non-existent store file Key: HBASE-16754 URL: https://issues.apache.org/jira/browse/HBASE-16754 Project: HBase Issue Type: Bug Reporter: Gary Helmling Priority: Blocker Fix For: 1.3.0

Running a mixed read write workload on a recent build off branch-1.3, we are seeing compactions occasionally fail with errors like the following (actual filenames replaced with placeholders):

{noformat}
16/09/27 16:57:28 ERROR regionserver.CompactSplitThread: Compaction selection failed Store = XXX, pri = 116
java.io.FileNotFoundException: File does not exist: hdfs://.../hbase/data/ns/table/region/cf/XXfilenameXX
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
        at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
        at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
        at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
        at org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:321)
        at org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
        at org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:63)
        at org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
        at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
        at org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1644)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.selectCompaction(CompactSplitThread.java:373)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.access$100(CompactSplitThread.java:59)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:498)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:568)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
16/09/27 17:01:31 ERROR regionserver.CompactSplitThread: Compaction selection failed Store = XXX, pri = 115
java.io.FileNotFoundException: File does not exist: hdfs://.../hbase/data/ns/table/region/cf/XXfilenameXX
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
        at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
        at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
        at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
        at org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:321)
        at org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
        at org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:63)
        at org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
        at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
        at org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1644)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.selectCompaction(CompactSplitThread.java:373
{noformat}
[jira] [Created] (HBASE-16958) Balancer recomputes block distributions every time balanceCluster() runs
Gary Helmling created HBASE-16958: - Summary: Balancer recomputes block distributions every time balanceCluster() runs Key: HBASE-16958 URL: https://issues.apache.org/jira/browse/HBASE-16958 Project: HBase Issue Type: Bug Components: Balancer Reporter: Gary Helmling Assignee: Gary Helmling Fix For: 1.3.0

The change in HBASE-16570 modified the balancer to compute block distributions in parallel with a pool of 5 threads. However, because it does this every time Cluster is instantiated, it effectively bypasses the cache of block locations added in HBASE-14473:

In the LoadBalancer.balanceCluster() implementations (in StochasticLoadBalancer, SimpleLoadBalancer), we create a new Cluster instance. In the Cluster constructor, we call registerRegion() on every HRegionInfo. In registerRegion(), we do the following:

{code}
regionLocationFutures.set(regionIndex,
    regionFinder.asyncGetBlockDistribution(region));
{code}

Then, back in the Cluster constructor, we do a get() on each ListenableFuture in a loop. So while we are doing the calls to get block locations in parallel with 5 threads, we're recomputing them every time balanceCluster() is called and not taking advantage of the cache at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
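The caching behavior being bypassed here can be sketched as a memoized lookup: consult a per-region cache before kicking off the expensive block-location scan, so repeated balanceCluster() passes reuse earlier results. This is a hypothetical illustration, not the actual RegionLocationFinder API; the class and method names below are invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: cache block distributions per region so that repeated
// balancer runs do not recompute them from HDFS every time.
class RegionLocalityCache {
    private final Map<String, double[]> cache = new ConcurrentHashMap<>();
    private int computations = 0;  // exposed for illustration: how often we recomputed

    // Stand-in for the expensive HDFS block-location scan.
    private double[] computeDistribution(String regionName) {
        computations++;
        return new double[] {1.0};  // placeholder per-host locality weights
    }

    // Returns the cached distribution, computing it only on first request.
    double[] getBlockDistribution(String regionName) {
        return cache.computeIfAbsent(regionName, this::computeDistribution);
    }

    int getComputationCount() {
        return computations;
    }
}
```

The point of the design is that instantiating a new Cluster (or calling the balancer again) should hit the cache path, not the compute path; cache invalidation on region movement would be layered on top.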
[jira] [Resolved] (HBASE-16958) Balancer recomputes block distributions every time balanceCluster() runs
[ https://issues.apache.org/jira/browse/HBASE-16958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling resolved HBASE-16958. --- Resolution: Duplicate Assignee: (was: Gary Helmling) Fix Version/s: (was: 1.3.0) I re-opened HBASE-16570 to fix the issue that is described here.

> Balancer recomputes block distributions every time balanceCluster() runs
[jira] [Reopened] (HBASE-16570) Compute region locality in parallel at startup
[ https://issues.apache.org/jira/browse/HBASE-16570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling reopened HBASE-16570: --- I've reverted this from branch-1.3 for the moment, until the issue that I described can be addressed. I don't see where this would impact master startup time. If we need to pre-initialize this on startup, let's do it in a background thread only on startup. We need to make sure that locality is not recomputed on every run and that we use the cache instead.

> Compute region locality in parallel at startup
> Key: HBASE-16570
> URL: https://issues.apache.org/jira/browse/HBASE-16570
> Project: HBase
> Issue Type: Sub-task
> Reporter: binlijin
> Assignee: binlijin
> Fix For: 2.0.0, 1.4.0, 1.3.1
> Attachments: HBASE-16570-master_V1.patch, HBASE-16570-master_V2.patch, HBASE-16570-master_V3.patch, HBASE-16570-master_V4.patch
[jira] [Created] (HBASE-16964) Successfully archived files are not cleared from compacted store file list if archiving of any file fails
Gary Helmling created HBASE-16964: - Summary: Successfully archived files are not cleared from compacted store file list if archiving of any file fails Key: HBASE-16964 URL: https://issues.apache.org/jira/browse/HBASE-16964 Project: HBase Issue Type: Bug Components: regionserver Reporter: Gary Helmling Assignee: Gary Helmling Priority: Blocker Fix For: 1.3.0

In HStore.removeCompactedFiles(), we only clear archived files from StoreFileManager's list of compactedfiles if _all_ files were archived successfully. If we encounter an error archiving any of the files, then any files which were already archived will remain in the list of compactedfiles.

Even worse, this means that all subsequent attempts to archive the list of compacted files will fail (as the previously archived files still in the list will now throw FileNotFoundException), and the list of compactedfiles will never be cleared from that point on.

Finally, when the region closes, we will again throw an exception out of HStore.removeCompactedFiles(), in this case causing a regionserver abort.
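The fix described above amounts to tracking which files were archived successfully and clearing those from the compacted-file list even when a later file fails, so one bad file does not poison every subsequent attempt. This is a hypothetical sketch, not the actual HStore/StoreFileManager API; the names below are invented for the example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch of partial-failure handling when archiving compacted files.
class CompactedFileCleaner {
    interface Archiver {
        void archive(String file) throws IOException;
    }

    // Returns the files that were archived successfully (and therefore should be
    // removed from the store's compacted-file list), even if a later file failed.
    static List<String> archiveCompactedFiles(Collection<String> compactedFiles,
                                              Archiver archiver) {
        List<String> archived = new ArrayList<>();
        for (String file : compactedFiles) {
            try {
                archiver.archive(file);
                archived.add(file);
            } catch (IOException e) {
                // Stop here, but still report the files archived so far so the
                // caller can drop them from its list and retry only the rest.
                break;
            }
        }
        return archived;
    }
}
```

With this shape, a retry after a transient archiving error only re-attempts the files that actually remain, instead of hitting FileNotFoundException on the ones already moved.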
[jira] [Resolved] (HBASE-16146) Counters are expensive...
[ https://issues.apache.org/jira/browse/HBASE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling resolved HBASE-16146. --- Resolution: Fixed Assignee: Gary Helmling Hadoop Flags: Reviewed Fix Version/s: 1.4.0, 1.3.0, 2.0.0

Committed to branch-1.3, branch-1, and master. Counter is no longer used in master, but still present as a deprecated class, so included for consistency. Thanks, [~stack], [~mantonov], and [~enis] for reviews.

> Counters are expensive...
> Key: HBASE-16146
> URL: https://issues.apache.org/jira/browse/HBASE-16146
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Assignee: Gary Helmling
> Fix For: 2.0.0, 1.3.0, 1.4.0
> Attachments: HBASE-16146.001.patch, HBASE-16146.branch-1.001.patch, HBASE-16146.branch-1.3.001.patch, counters.patch, less_and_less_counters.png
>
> Doing workloadc, perf shows 10%+ of CPU being spent on counter#add. If I disable some of the hot ones -- see patch -- I can get 10% more throughput (390k to 440k). Figure something better.
[jira] [Resolved] (HBASE-16337) Removing peers seem to be leaving spare queues
[ https://issues.apache.org/jira/browse/HBASE-16337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling resolved HBASE-16337. --- Resolution: Duplicate Closing as a dupe, thanks for pointing it out.

> Removing peers seem to be leaving spare queues
> Key: HBASE-16337
> URL: https://issues.apache.org/jira/browse/HBASE-16337
> Project: HBase
> Issue Type: Sub-task
> Components: Replication
> Reporter: Joseph
>
> I have been running IntegrationTestReplication repeatedly with the backported Replication Table changes. Every other iteration of the test fails with the error below, but these queues should have been deleted when we removed the peers. I believe this may be related to HBASE-16096, HBASE-16208, or HBASE-16081.
>
> 16/08/02 08:36:07 ERROR util.AbstractHBaseTool: Error running command-line tool
> org.apache.hadoop.hbase.replication.ReplicationException: undeleted queue for peerId: TestPeer, replicator: hbase4124.ash2.facebook.com,16020,1470150251042, queueId: TestPeer
> at org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl.checkQueuesDeleted(ReplicationPeersZKImpl.java:544)
> at org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl.addPeer(ReplicationPeersZKImpl.java:127)
> at org.apache.hadoop.hbase.client.replication.ReplicationAdmin.addPeer(ReplicationAdmin.java:200)
> at org.apache.hadoop.hbase.test.IntegrationTestReplication$VerifyReplicationLoop.setupTablesAndReplication(IntegrationTestReplication.java:239)
> at org.apache.hadoop.hbase.test.IntegrationTestReplication$VerifyReplicationLoop.run(IntegrationTestReplication.java:325)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.hbase.test.IntegrationTestReplication.runTestFromCommandLine(IntegrationTestReplication.java:418)
> at org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:134)
> at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.hbase.test.IntegrationTestReplication.main(IntegrationTestReplication.java:424)
[jira] [Created] (HBASE-17381) ReplicationSourceWorkerThread can die due to unhandled exceptions
Gary Helmling created HBASE-17381: - Summary: ReplicationSourceWorkerThread can die due to unhandled exceptions Key: HBASE-17381 URL: https://issues.apache.org/jira/browse/HBASE-17381 Project: HBase Issue Type: Bug Reporter: Gary Helmling

If a ReplicationSourceWorkerThread encounters an unexpected exception in the run() method (for example, failure to allocate direct memory for the DFS client), the exception will be logged by the UncaughtExceptionHandler, but the thread will also die and the replication queue will back up indefinitely until the regionserver is restarted. We should make sure the worker thread is resilient to all exceptions that it can actually handle. For those that it really can't, it seems better to abort the regionserver rather than just allow replication to stop with minimal signal. Here is a sample exception:

{noformat}
ERROR regionserver.ReplicationSource: Unexpected exception in ReplicationSourceWorkerThread, currentPath=hdfs://.../hbase/WALs/XXXwalfilenameXXX
java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:693)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
        at org.apache.hadoop.crypto.CryptoOutputStream.<init>(CryptoOutputStream.java:96)
        at org.apache.hadoop.crypto.CryptoOutputStream.<init>(CryptoOutputStream.java:113)
        at org.apache.hadoop.crypto.CryptoOutputStream.<init>(CryptoOutputStream.java:108)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.createStreamPair(DataTransferSaslUtil.java:344)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:490)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:391)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:263)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
        at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160)
        at org.apache.hadoop.hdfs.net.TcpPeerServer.peerFromSocketAndKey(TcpPeerServer.java:92)
        at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3444)
        at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:778)
        at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:695)
        at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:356)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:673)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:308)
        at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:276)
        at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:264)
        at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:423)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:830)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.run(ReplicationSource.java:572)
{noformat}
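The hardening suggested above can be sketched as a worker loop that catches everything: recoverable exceptions keep the queue draining, while fatal errors (like the OutOfMemoryError shown) escalate to an abort instead of silently killing the thread. This is an illustrative sketch, not the actual ReplicationSource API; the interface and class names are invented for the example.

```java
// Hypothetical sketch: a replication worker that never dies silently.
class ResilientWorker implements Runnable {
    interface Abortable {
        void abort(String reason, Throwable cause);
    }

    private final Runnable task;       // one iteration of replication work
    private final Abortable server;
    private volatile boolean running = true;

    ResilientWorker(Runnable task, Abortable server) {
        this.task = task;
        this.server = server;
    }

    void stop() { running = false; }

    @Override
    public void run() {
        while (running) {
            try {
                task.run();
            } catch (Exception e) {
                // Plausibly recoverable: log and retry rather than letting the
                // thread die and the queue back up indefinitely.
                continue;
            } catch (Error e) {
                // Unrecoverable (e.g. OutOfMemoryError): abort the regionserver
                // loudly rather than stopping replication with minimal signal.
                server.abort("Replication worker failed", e);
                return;
            }
        }
    }
}
```

The design choice mirrors the ticket: an uncaught-exception handler only observes the death after the fact, whereas handling Throwable inside the loop lets the worker decide between retry and abort.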
[jira] [Created] (HBASE-17827) Client tools relying on AuthUtil.getAuthChore() break credential cache login
Gary Helmling created HBASE-17827: - Summary: Client tools relying on AuthUtil.getAuthChore() break credential cache login Key: HBASE-17827 URL: https://issues.apache.org/jira/browse/HBASE-17827 Project: HBase Issue Type: Bug Components: canary, security Reporter: Gary Helmling Assignee: Gary Helmling

Client tools, such as Canary, which make use of keytab based logins with AuthUtil.getAuthChore() do not allow any way to continue without a keytab-based login when security is enabled. Currently, when security is enabled and the configuration lacks {{hbase.client.keytab.file}}, these tools would fail with:

{noformat}
ERROR hbase.AuthUtil: Error while trying to perform the initial login: Running in secure mode, but config doesn't have a keytab
java.io.IOException: Running in secure mode, but config doesn't have a keytab
        at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:239)
        at org.apache.hadoop.hbase.security.User$SecureHadoopUser.login(User.java:420)
        at org.apache.hadoop.hbase.security.User.login(User.java:258)
        at org.apache.hadoop.hbase.security.UserProvider.login(UserProvider.java:197)
        at org.apache.hadoop.hbase.AuthUtil.getAuthChore(AuthUtil.java:98)
        at org.apache.hadoop.hbase.tool.Canary.run(Canary.java:589)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.hbase.tool.Canary.main(Canary.java:1327)
Exception in thread "main" java.io.IOException: Running in secure mode, but config doesn't have a keytab
        at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:239)
        at org.apache.hadoop.hbase.security.User$SecureHadoopUser.login(User.java:420)
        at org.apache.hadoop.hbase.security.User.login(User.java:258)
        at org.apache.hadoop.hbase.security.UserProvider.login(UserProvider.java:197)
        at org.apache.hadoop.hbase.AuthUtil.getAuthChore(AuthUtil.java:98)
        at org.apache.hadoop.hbase.tool.Canary.run(Canary.java:589)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.hbase.tool.Canary.main(Canary.java:1327)
{noformat}

These tools should still work with the default credential-cache login, at least when a client keytab is not configured.
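The fallback described above boils down to a configuration check: only attempt a keytab login when {{hbase.client.keytab.file}} is actually set, and otherwise proceed with whatever login (e.g. a kinit ticket cache) the process already has. The configuration key is the real HBase key named in the ticket, but the helper class below is an invented sketch, not AuthUtil's actual API.

```java
import java.util.Map;

// Hypothetical sketch: decide which login path a client tool should take.
class ClientAuthSetup {
    static final String CLIENT_KEYTAB_KEY = "hbase.client.keytab.file";

    // Returns "keytab" or "credential-cache" to indicate the chosen login path.
    static String chooseLogin(Map<String, String> conf) {
        String keytab = conf.get(CLIENT_KEYTAB_KEY);
        if (keytab == null || keytab.isEmpty()) {
            // No client keytab configured: fall back to the existing
            // credential-cache login instead of failing outright.
            return "credential-cache";
        }
        return "keytab";
    }
}
```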
[jira] [Resolved] (HBASE-12579) Move obtainAuthTokenForJob() methods out of User
[ https://issues.apache.org/jira/browse/HBASE-12579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling resolved HBASE-12579. --- Resolution: Duplicate

The methods were deprecated and the existing usage was removed as part of HBASE-12493. I guess I left this open for the final removal of the deprecated methods in the next major release. The removal was done as part of HBASE-14208.

> Move obtainAuthTokenForJob() methods out of User
> Key: HBASE-12579
> URL: https://issues.apache.org/jira/browse/HBASE-12579
> Project: HBase
> Issue Type: Improvement
> Components: security
> Reporter: Gary Helmling
>
> The {{User}} class currently contains some utility methods to obtain HBase authentication tokens for the given user. However, these methods initiate an RPC to the {{TokenProvider}} coprocessor endpoint, an action which should not be part of the User class' responsibilities.
> This leads to a couple of problems:
> # The way the methods are currently structured, it is impossible to integrate them with normal connection management for the cluster (the TokenUtil class constructs its own HTable instance internally).
> # The User class is logically part of the hbase-common module, but uses the TokenUtil class (part of hbase-server, though it should probably be moved to hbase-client) through reflection, leading to a hidden dependency.
> The {{obtainAuthTokenForJob()}} methods should be deprecated and the process of obtaining authentication tokens should be moved to use the normal connection lifecycle.
[jira] [Created] (HBASE-17884) Backport HBASE-16217 to branch-1
Gary Helmling created HBASE-17884: - Summary: Backport HBASE-16217 to branch-1 Key: HBASE-17884 URL: https://issues.apache.org/jira/browse/HBASE-17884 Project: HBase Issue Type: Sub-task Reporter: Gary Helmling

The change to add the calling user to ObserverContext in HBASE-16217 should also be applied to branch-1 to avoid use of UserGroupInformation.doAs() for access control checks.
[jira] [Created] (HBASE-18072) Malformed Cell from client causes Regionserver abort on flush
Gary Helmling created HBASE-18072: - Summary: Malformed Cell from client causes Regionserver abort on flush Key: HBASE-18072 URL: https://issues.apache.org/jira/browse/HBASE-18072 Project: HBase Issue Type: Bug Components: regionserver, rpc Affects Versions: 1.3.0 Reporter: Gary Helmling Assignee: Gary Helmling Priority: Critical

When a client writes a mutation with a Cell with a corrupted value length field, it is possible for the corrupt cell to trigger an exception on memstore flush, which will trigger regionserver aborts until the region is manually recovered. This boils down to a lack of validation on the client submitted byte[] backing the cell. Consider the following sequence:

1. Client creates a new Put with a cell with value of byte[16]
2. When the backing KeyValue for the Put is created, we serialize 16 for the value length field in the backing array
3. Client calls Table.put()
4. RpcClientImpl calls KeyValueEncoder.encode() to serialize the Cell to the OutputStream
5. Memory corruption in the backing array changes the serialized contents of the value length field from 16 to 48
6. Regionserver handling the put uses KeyValueDecoder.decode() to create a KeyValue with the byte[] read directly off the InputStream. The overall length of the array is correct, but the integer value serialized at the value length offset has been corrupted from the original value of 16 to 48.
7. The corrupt KeyValue is appended to the WAL and added to the memstore
8. After some time, the memstore flushes. As HFileWriter is writing out the corrupted cell, it reads the serialized int from the value length position in the cell's byte[] to determine the number of bytes to write for the value. Because value offset + 48 is greater than the length of the cell's byte[], we hit an IndexOutOfBoundsException:
{noformat}
java.lang.IndexOutOfBoundsException
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:151)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at org.apache.hadoop.hbase.io.hfile.NoOpDataBlockEncoder.encode(NoOpDataBlockEncoder.java:56)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.write(HFileBlock.java:954)
        at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:284)
        at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
        at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1041)
        at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:138)
        at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
        at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:937)
        at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2413)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2456)
{noformat}
9. Regionserver aborts due to the failed flush
10. The regionserver WAL is split into recovered.edits files, one of these containing the same corrupted cell
11. A new regionserver is assigned the region with the corrupted write
12. The new regionserver replays the recovered.edits entries into memstore and then tries to flush the memstore to an HFile
13. The flush triggers the same IndexOutOfBoundsException, causing us to go back to step #8 and loop on repeat until manual intervention is taken

The corrupted cell basically becomes a poison pill that aborts regionservers one at a time as the region with the problem edit is passed around.
This also means that a malicious client could easily construct requests allowing a denial of service attack against regionservers hosting any tables that the client has write access to. At bare minimum, I think we need to do a sanity check on all the lengths for Cells read off the CellScanner for incoming requests. This would allow us to reject corrupt cells before we append them to the WAL and acknowledge the request; once a corrupt cell has been appended and acknowledged, we are in a position where we cannot recover. This would only detect corruption of the length fields, which is what puts us in a bad state. Whether or not Cells should carry some checksum generated at the time the Cell is created, which could then be validated on the server side, is a separate question. That would allow detection of corruption in other parts of the backing cell byte[], such as within the key fields or the value field. But the compute overhead of this may be too heavyweight to be practical.
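The minimum sanity check proposed above is a pure bounds check on the decoded length fields against the backing array. This is an illustrative sketch of that check, not the actual CellScanner/KeyValue validation code; the class and method names are invented for the example.

```java
// Hypothetical sketch: verify that a cell's decoded value length stays within
// the bounds of the client-supplied backing byte[] before accepting the cell.
class CellSanityChecker {
    // In the KeyValue layout sketched here, the value occupies
    // [valueOffset, valueOffset + valueLength) within the backing array.
    static boolean isValueInBounds(byte[] backing, int valueOffset, int valueLength) {
        if (valueOffset < 0 || valueLength < 0) {
            return false;
        }
        // Use long arithmetic so a corrupted length cannot overflow the check itself.
        return (long) valueOffset + (long) valueLength <= backing.length;
    }
}
```

Run against the scenario in the ticket (a 32-byte region of the backing array whose value length was corrupted from 16 to 48), the check rejects the cell before it reaches the WAL, which is the whole point: fail the RPC, not the flush.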
[jira] [Created] (HBASE-18141) Regionserver fails to shutdown when abort triggered in RegionScannerImpl during RPC call
Gary Helmling created HBASE-18141: - Summary: Regionserver fails to shutdown when abort triggered in RegionScannerImpl during RPC call Key: HBASE-18141 URL: https://issues.apache.org/jira/browse/HBASE-18141 Project: HBase Issue Type: Bug Components: regionserver, security Affects Versions: 1.3.1 Reporter: Gary Helmling Assignee: Gary Helmling Priority: Critical Fix For: 1.3.2

When an abort is triggered within the RPC call path by HRegion.RegionScannerImpl, the AccessController incorrectly applies the RPC caller's identity in the RegionServerObserver.preStopRegionServer() hook, denying the abort. This leaves the regionserver in a non-responsive state, where its regions are not reassigned and it returns exceptions for all requests. When an abort is triggered on the server side, we should not allow a coprocessor to reject the abort at all. Here is a sample stack trace:

{noformat}
17/05/25 06:10:29 FATAL regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.security.token.TokenProvider]
17/05/25 06:10:29 WARN regionserver.HRegionServer: The region server did not stop
org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'rpcuser' (global, action=ADMIN)
        at org.apache.hadoop.hbase.security.access.AccessController.requireGlobalPermission(AccessController.java:548)
        at org.apache.hadoop.hbase.security.access.AccessController.requirePermission(AccessController.java:522)
        at org.apache.hadoop.hbase.security.access.AccessController.preStopRegionServer(AccessController.java:2501)
        at org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost$1.call(RegionServerCoprocessorHost.java:86)
        at org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.execShutdown(RegionServerCoprocessorHost.java:300)
        at org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.preStop(RegionServerCoprocessorHost.java:82)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1905)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2118)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2125)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.abortRegionServer(HRegion.java:6326)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleFileNotFound(HRegion.java:6319)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5941)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6084)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5858)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2649)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2320)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
{noformat}

I haven't yet evaluated which other release branches this might apply to. I have a patch currently in progress, which I will post as soon as I complete a test case.
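The fix direction stated in the ticket, that a server-initiated abort must not be vetoable by a coprocessor, can be sketched as a branch on who requested the stop. This is an invented illustration, not the RegionServerCoprocessorHost API: the names and the boolean flag are assumptions for the example.

```java
// Hypothetical sketch: only consult pre-stop hooks for externally requested
// stops, never for an abort the server decided on itself.
class StopPolicy {
    interface PreStopHook {
        // Throws SecurityException to veto the stop.
        void preStop(String user);
    }

    // Returns true if the server should proceed with shutdown.
    static boolean shouldStop(boolean serverInitiatedAbort, String rpcUser,
                              PreStopHook hook) {
        if (serverInitiatedAbort) {
            // An abort decided by the server itself must not be rejected based on
            // the identity of whichever RPC caller happened to be on the stack.
            return true;
        }
        try {
            hook.preStop(rpcUser);
            return true;
        } catch (SecurityException denied) {
            return false;
        }
    }
}
```

In the failure mode above, the abort ran on an RPC handler thread, so the ACL check saw 'rpcuser' instead of the server's own identity; gating the hook on the abort's origin sidesteps that entirely.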
[jira] [Created] (HBASE-19332) DumpReplicationQueues misreports total WAL size
Gary Helmling created HBASE-19332: - Summary: DumpReplicationQueues misreports total WAL size Key: HBASE-19332 URL: https://issues.apache.org/jira/browse/HBASE-19332 Project: HBase Issue Type: Bug Components: Replication Reporter: Gary Helmling Assignee: Gary Helmling Priority: Trivial

DumpReplicationQueues uses an int to collect the total WAL size for a queue. Predictably, this overflows much of the time. Let's use a long instead.
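The overflow is easy to demonstrate in isolation: summing per-WAL sizes into an int wraps once the total passes Integer.MAX_VALUE (~2 GB), while a long accumulator reports the true total. The class below is a minimal standalone illustration, not the DumpReplicationQueues code itself.

```java
// Minimal illustration of the int-overflow bug and its fix.
class WalSizeTotals {
    static int totalAsInt(long[] walSizes) {
        int total = 0;
        for (long size : walSizes) {
            total += size;  // compound assignment silently narrows; wraps past 2^31 - 1
        }
        return total;
    }

    static long totalAsLong(long[] walSizes) {
        long total = 0;
        for (long size : walSizes) {
            total += size;  // 64-bit accumulator: correct for realistic WAL totals
        }
        return total;
    }
}
```

Two 1.5 GB WALs are enough to trigger it: the int total wraps negative while the long total is the expected 3 GB.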