[jira] [Commented] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893874#comment-15893874
 ] 

Hudson commented on HBASE-17717:


SUCCESS: Integrated in Jenkins build HBase-1.1-JDK7 #1852 (See 
[https://builds.apache.org/job/HBase-1.1-JDK7/1852/])
HBASE-17717 Explicitly use "sasl" ACL scheme for hbase superuser (elserj: rev 
54747ccb285484dbbf823e30d08bace16bbc10bb)
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
* (add) 
hbase-client/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKUtil.java


> Incorrect ZK ACL set for HBase superuser
> 
>
> Key: HBASE-17717
> URL: https://issues.apache.org/jira/browse/HBASE-17717
> Project: HBase
>  Issue Type: Bug
>  Components: security, Zookeeper
>Reporter: Shreya Bhat
>Assignee: Josh Elser
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.1.10, 1.2.6
>
> Attachments: HBASE-17717.001.0.98.patch, 
> HBASE-17717.001.branch-1.1.patch, HBASE-17717.001.patch
>
>
> Shreya was doing some testing of a deploy of HBase, verifying that the ZK 
> ACLs were actually set as we expect (yay, security).
> She noticed that, in some cases, we were seeing multiple ACLs for the same 
> user.
> {noformat}
> 'world,'anyone
> : r
> 'sasl,'hbase
> : cdrwa
> 'sasl,'hbase
> : cdrwa
> {noformat}
> After digging into this (and some insight from the mighty [~enis]), we 
> realized that this was happening because of an overridden value for 
> {{hbase.superuser}}. However, the ACL value doesn't match what we'd expect to 
> see (as hbase.superuser was set to {{cstm-hbase}}).
> After digging into this code, it seems like the {{auth}} ACL scheme in 
> ZooKeeper does not work as we expect.
> {code}
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> In the above, the {{"auth"}} scheme ignores any provided "subject" in the 
> {{Id}} object. It *only* considers the authentication of the current 
> connection. As such, our usage of this never actually sets the ACL for the 
> superuser correctly.
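The scheme semantics can be illustrated with a small sketch. These are minimal stand-in types written only for illustration, not the real org.apache.zookeeper classes; the actual fix switches ZKUtil to the "sasl" scheme so the superuser principal is honored:

```java
// Minimal stand-ins for ZooKeeper's Id, just to illustrate the scheme
// semantics described above. These are NOT the real org.apache.zookeeper
// classes.
final class ZkId {
    final String scheme;
    final String id;
    ZkId(String scheme, String id) { this.scheme = scheme; this.id = id; }
}

final class AclSketch {
    // With the "auth" scheme, ZooKeeper ignores the supplied id and grants
    // the permission to whoever authenticated the *current connection* --
    // so a superuser name passed in the Id is silently dropped. The "sasl"
    // scheme keeps the supplied principal.
    static String effectiveSubject(ZkId id, String connectionPrincipal) {
        if ("auth".equals(id.scheme)) {
            return connectionPrincipal; // supplied id is ignored
        }
        return id.id; // "sasl" carries the named principal
    }
}
```

This is why the connection's own `hbase` principal ended up with two identical ACL entries while `cstm-hbase` got none.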



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17623) Reuse the bytes array when building the hfile block

2017-03-02 Thread CHIA-PING TSAI (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CHIA-PING TSAI updated HBASE-17623:
---
Attachment: HBASE-17623.branch-1.v2.patch

> Reuse the bytes array when building the hfile block
> ---
>
> Key: HBASE-17623
> URL: https://issues.apache.org/jira/browse/HBASE-17623
> Project: HBase
>  Issue Type: Improvement
>Reporter: CHIA-PING TSAI
>Assignee: CHIA-PING TSAI
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: after(snappy_hfilesize=5.04GB).png, 
> after(snappy_hfilesize=755MB).png, before(snappy_hfilesize=5.04GB).png, 
> before(snappy_hfilesize=755MB).png, HBASE-17623.branch-1.v0.patch, 
> HBASE-17623.branch-1.v1.patch, HBASE-17623.branch-1.v2.patch, 
> HBASE-17623.v0.patch, HBASE-17623.v1.patch, HBASE-17623.v1.patch, 
> HBASE-17623.v2.patch, memory allocation measurement.xlsx
>
>
> There are three improvements.
> # The onDiskBlockBytesWithHeader should maintain a bytes array which can be
> reused when building the hfile.
> # The onDiskBlockBytesWithHeader is copied to a new bytes array only when we
> need to cache the block.
> # If no block needs to be cached, the uncompressedBlockBytesWithHeader will
> never be created.
> {code:title=HFileBlock.java|borderStyle=solid}
> private void finishBlock() throws IOException {
>   if (blockType == BlockType.DATA) {
> this.dataBlockEncoder.endBlockEncoding(dataBlockEncodingCtx, 
> userDataStream,
> baosInMemory.getBuffer(), blockType);
> blockType = dataBlockEncodingCtx.getBlockType();
>   }
>   userDataStream.flush();
>   // This does an array copy, so it is safe to cache this byte array when 
> cache-on-write.
>   // Header is still the empty, 'dummy' header that is yet to be filled 
> out.
>   uncompressedBlockBytesWithHeader = baosInMemory.toByteArray();
>   prevOffset = prevOffsetByType[blockType.getId()];
>   // We need to set state before we can package the block up for 
> cache-on-write. In a way, the
>   // block is ready, but not yet encoded or compressed.
>   state = State.BLOCK_READY;
>   if (blockType == BlockType.DATA || blockType == BlockType.ENCODED_DATA) 
> {
> onDiskBlockBytesWithHeader = dataBlockEncodingCtx.
> compressAndEncrypt(uncompressedBlockBytesWithHeader);
>   } else {
> onDiskBlockBytesWithHeader = defaultBlockEncodingCtx.
> compressAndEncrypt(uncompressedBlockBytesWithHeader);
>   }
>   // Calculate how many bytes we need for checksum on the tail of the 
> block.
>   int numBytes = (int) ChecksumUtil.numBytes(
>   onDiskBlockBytesWithHeader.length,
>   fileContext.getBytesPerChecksum());
>   // Put the header for the on disk bytes; header currently is 
> unfilled-out
>   putHeader(onDiskBlockBytesWithHeader, 0,
>   onDiskBlockBytesWithHeader.length + numBytes,
>   uncompressedBlockBytesWithHeader.length, 
> onDiskBlockBytesWithHeader.length);
>   // Set the header for the uncompressed bytes (for cache-on-write) -- 
> IFF different from
>   // onDiskBlockBytesWithHeader array.
>   if (onDiskBlockBytesWithHeader != uncompressedBlockBytesWithHeader) {
> putHeader(uncompressedBlockBytesWithHeader, 0,
>   onDiskBlockBytesWithHeader.length + numBytes,
>   uncompressedBlockBytesWithHeader.length, 
> onDiskBlockBytesWithHeader.length);
>   }
>   if (onDiskChecksum.length != numBytes) {
> onDiskChecksum = new byte[numBytes];
>   }
>   ChecksumUtil.generateChecksums(
>   onDiskBlockBytesWithHeader, 0, onDiskBlockBytesWithHeader.length,
>   onDiskChecksum, 0, fileContext.getChecksumType(), 
> fileContext.getBytesPerChecksum());
> }{code}
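The reuse described in item 1 can be sketched as follows. The names here are hypothetical; the actual patch reworks the HFileBlock.Writer internals, but the core idea is to keep one growable array across blocks instead of allocating a fresh copy via toByteArray() per block:

```java
import java.util.Arrays;

// Hypothetical sketch of the byte-array reuse: one growable buffer is kept
// across blocks, and an owned copy is made only when the block is going to
// be cached (improvement #2 above).
final class ReusableBuffer {
    private byte[] buf = new byte[1024];
    private int len;

    void reset() { len = 0; } // start the next block; keep the allocation

    void write(byte[] src, int off, int n) {
        if (len + n > buf.length) {
            // grow geometrically so repeated blocks settle on one allocation
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, len + n));
        }
        System.arraycopy(src, off, buf, len, n);
        len += n;
    }

    byte[] array() { return buf; } // shared backing array, valid up to size()
    int size() { return len; }

    // Copy only when the caller truly needs an owned array, e.g. for
    // cache-on-write.
    byte[] toNewArray() { return Arrays.copyOf(buf, len); }
}
```

With this shape, the write path touches `array()`/`size()` and only cache-on-write pays for a copy.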





[jira] [Updated] (HBASE-17623) Reuse the bytes array when building the hfile block

2017-03-02 Thread CHIA-PING TSAI (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CHIA-PING TSAI updated HBASE-17623:
---
Status: Patch Available  (was: Open)



[jira] [Updated] (HBASE-17623) Reuse the bytes array when building the hfile block

2017-03-02 Thread CHIA-PING TSAI (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CHIA-PING TSAI updated HBASE-17623:
---
Status: Open  (was: Patch Available)



[jira] [Commented] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893854#comment-15893854
 ] 

Hudson commented on HBASE-17717:


SUCCESS: Integrated in Jenkins build HBase-1.1-JDK8 #1936 (See 
[https://builds.apache.org/job/HBase-1.1-JDK8/1936/])
HBASE-17717 Explicitly use "sasl" ACL scheme for hbase superuser (elserj: rev 
54747ccb285484dbbf823e30d08bace16bbc10bb)
* (add) 
hbase-client/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKUtil.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java




[jira] [Commented] (HBASE-17623) Reuse the bytes array when building the hfile block

2017-03-02 Thread CHIA-PING TSAI (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893842#comment-15893842
 ] 

CHIA-PING TSAI commented on HBASE-17623:


The patch for branch-1 needs to be updated. Coming soon.



[jira] [Commented] (HBASE-17718) Difference between RS's servername and its ephemeral node cause SSH stop working

2017-03-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893823#comment-15893823
 ] 

stack commented on HBASE-17718:
---

Looking at this now [~allan163]. Yes, revert of HBASE-9593. In the old days, 
there was all kinds of mess possible when RS and Master did not agree on naming 
so we just let master be in charge of how servers are named in a cluster. 
HBASE-9593 mangled this. HBASE-13753 was supposed to fix it but was just left
to sit.

But I'd like to put in place an alternate solution for the problem reported by 
HBASE-9593. It is a real possibility. The heartbeat registers the server but we 
need the evaporation of the znode for the server to be removed from online list 
-- and if we fail to write the znode post the heartbeat that reports-for-duty, 
the removal never happens (My 'There is also something odd...' is actually 
incorrect on reexamination).

I'll be back. HBASE-9593 has a nice test that I can reuse.

> Difference between RS's servername and its ephemeral node cause SSH stop 
> working
> 
>
> Key: HBASE-17718
> URL: https://issues.apache.org/jira/browse/HBASE-17718
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.2.4, 1.1.8
>Reporter: Allan Yang
>Assignee: Allan Yang
>
> After HBASE-9593, the RS puts up an ephemeral node in ZK before reporting for
> duty. But if the hosts config (/etc/hosts) differs between the master and the
> RS, the RS's serverName can be different from the one stored in the ephemeral zk
> node. The email mentioned in HBASE-13753
> (http://mail-archives.apache.org/mod_mbox/hbase-user/201505.mbox/%3CCANZDn9ueFEEuZMx=pZdmtLsdGLyZz=rrm1N6EQvLswYc1z-H=g...@mail.gmail.com%3E)
>  is exactly what happened in our production env.
> But what the email didn't point out is that the difference between the serverName
> in the RS and the zk node can cause SSH to stop working, as we can see from the code in
> {{RegionServerTracker}}
> {code}
>   @Override
>   public void nodeDeleted(String path) {
> if (path.startsWith(watcher.rsZNode)) {
>   String serverName = ZKUtil.getNodeName(path);
>   LOG.info("RegionServer ephemeral node deleted, processing expiration [" 
> +
> serverName + "]");
>   ServerName sn = ServerName.parseServerName(serverName);
>   if (!serverManager.isServerOnline(sn)) {
> LOG.warn(serverName.toString() + " is not online or isn't known to 
> the master."+
>  "The latter could be caused by a DNS misconfiguration.");
> return;
>   }
>   remove(sn);
>   this.serverManager.expireServer(sn);
> }
>   }
> {code}
> The server will not be processed by SSH/ServerCrashProcedure. The regions on
> this server will not be assigned again until the master restarts or fails over.
> I know HBASE-9593 was to fix the issue where an RS reports for duty and crashes
> before it can put up a zk node. That is a very rare case (and controllable: just
> fix the bug that makes the RS crash). But the issue I mentioned can happen more
> often (and is uncontrollable; it can't be fixed in HBase, being due to DNS, hosts
> config, etc.) and has more severe consequences.
> So here I offer some solutions to discuss:
> 1. Revert HBASE-9593 from all branches; Andrew Purtell has already reverted it
> in branch-0.98.
> 2. Abort the RS if the master returns a different name; otherwise SSH can't
> work properly.
> 3. The master accepts whatever serverName is reported by the RS and doesn't
> change it.
> 4. Correct the zk node if the master returns another name (idea from Ted Yu).
>  
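Option 2 in the list above could look roughly like this. A hedged sketch: the class and method names are invented for illustration and are not the actual HBase API; the point is only that the name the master hands back at report-for-duty must match the name under which the ephemeral znode was registered, or SSH can never match the deleted znode to an online server:

```java
// Hypothetical sketch of option 2: abort the RS when the master-assigned
// name differs from the znode name. Names here are illustrative only.
final class ServerNameCheck {
    // A ServerName in HBase serializes as host,port,startcode.
    static boolean matches(String masterAssigned, String znodeName) {
        return masterAssigned.equals(znodeName);
    }

    static void verifyOrAbort(String masterAssigned, String znodeName) {
        if (!matches(masterAssigned, znodeName)) {
            throw new IllegalStateException(
                "Master sees this server as " + masterAssigned
                + " but the znode was registered as " + znodeName
                + "; aborting so the cluster is not left with a server"
                + " whose expiration can never be processed");
        }
    }
}
```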





[jira] [Commented] (HBASE-17722) Metrics subsystem stop/start messages add a lot of useless bulk to operational logging

2017-03-02 Thread Lars George (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893808#comment-15893808
 ] 

Lars George commented on HBASE-17722:
-

+1 on setting those lower, they are annoying as heck. We should ideally vet log 
output changes while reviewing patches, so as to not even have to do this 
afterwards, like now? 

> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging
> --
>
> Key: HBASE-17722
> URL: https://issues.apache.org/jira/browse/HBASE-17722
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 1.3.0, 1.2.4
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Trivial
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17722.patch
>
>
> Metrics subsystem stop/start messages add a lot of useless bulk to
> operational logging. Say you are collecting logs from a fleet of thousands of
> servers and want to have them around for a month or longer. It adds up.
> I think these should at least be at DEBUG level and ideally at TRACE. They
> don't offer much utility. Unfortunately they are Hadoop classes, so we can only
> tweak log4j.properties defaults instead. We do this in test resources but not
> in what we ship in conf/.
> {noformat}
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system started
>  INFO  [] impl.MetricsSystemImpl: Stopping HBase metrics 
> system...
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system stopped.
>  INFO  [] impl.MetricsConfig: loaded properties from 
> hadoop-metrics2-hbase.properties
>  INFO  [] impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> {noformat}
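A possible conf/log4j.properties tweak along the lines described, given as a sketch: the logger names below are taken from the class names in the log lines above (`org.apache.hadoop.metrics2.impl`), but they should be verified against the Hadoop version actually in use:

```properties
# Quiet the Hadoop metrics2 start/stop chatter without touching code we
# don't own. Class names taken from the quoted log lines.
log4j.logger.org.apache.hadoop.metrics2.impl.MetricsSystemImpl=WARN
log4j.logger.org.apache.hadoop.metrics2.impl.MetricsConfig=WARN
```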





[jira] [Commented] (HBASE-17338) Treat Cell data size under global memstore heap size only when that Cell can not be copied to MSLAB

2017-03-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893773#comment-15893773
 ] 

stack commented on HBASE-17338:
---

Thanks boss. You think the doc is up-to-date w/ this state of things? 
https://docs.google.com/document/d/1fj5P8JeutQ-Uadb29ChDscMuMaJqaMNRI86C4k5S1rQ/edit#heading=h.x14v1a3zw2q9
 Thanks.

> Treat Cell data size under global memstore heap size only when that Cell can 
> not be copied to MSLAB
> ---
>
> Key: HBASE-17338
> URL: https://issues.apache.org/jira/browse/HBASE-17338
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 2.0.0
>
> Attachments: HBASE-17338.patch, HBASE-17338_V2.patch, 
> HBASE-17338_V2.patch, HBASE-17338_V4.patch, HBASE-17338_V5.patch
>
>
> We have only the data size and the heap overhead being tracked globally. An
> off heap memstore works with an off heap backed MSLAB pool. But a cell, when
> added to the memstore, does not always get copied to MSLAB. Append/Increment
> ops doing an upsert don't use MSLAB. Also, based on the Cell size, we sometimes
> avoid the MSLAB copy. But now we track these cells' data size also under the
> global memstore data size, which indicates the off heap size in the case of an
> off heap memstore. For the global flush checks (against the lower/upper
> watermark levels), we check this size against the max off heap memstore size.
> We do check the heap overhead against the global heap memstore size (defaults
> to 40% of xmx). But for such cells the data size should also be accounted
> under the heap overhead.
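The dual-limit check described above can be sketched as follows. Field and method names are illustrative, not the actual HBase internals; the point is that cell data sits against the off heap limit while heap overhead (plus any cell data that never made it into an off heap MSLAB chunk) counts against the on-heap limit:

```java
// Hypothetical sketch of the global memstore flush decision with an off
// heap memstore: either limit tripping forces a flush with blocking writes.
final class GlobalMemstoreCheck {
    final long offHeapLimit; // from the new off heap memstore size config
    final long onHeapLimit;  // global heap memstore size, ~40% of xmx

    GlobalMemstoreCheck(long offHeapLimit, long onHeapLimit) {
        this.offHeapLimit = offHeapLimit;
        this.onHeapLimit = onHeapLimit;
    }

    boolean shouldForceFlush(long offHeapDataSize, long heapOverhead,
                             long onHeapCellData) {
        // onHeapCellData is the data of cells that skipped the MSLAB copy,
        // which this issue argues must be counted on the heap side.
        return offHeapDataSize >= offHeapLimit
            || heapOverhead + onHeapCellData >= onHeapLimit;
    }
}
```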





[jira] [Commented] (HBASE-17338) Treat Cell data size under global memstore heap size only when that Cell can not be copied to MSLAB

2017-03-02 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893768#comment-15893768
 ] 

Anoop Sam John commented on HBASE-17338:


bq.Thanks. So, datasize vs heapsize, have we written these up? datasize is 
serialized size... whether serialized on rpc or in blockcache and heapsize is 
how large a 'live' Cell in java heap is? If the Cell data is offheap, it is the 
onheap Cell proxy object?
Yes
bq.How does the global threshold work in the case where we are doing offheap 
accounting too. The global check will look at the onheap limit and the offheap 
limit and if we hit the offheap limit before we hit the onheap limit, we'll 
flush?
When an off heap MSLAB is in place, the global memstore size is specified using 
a new config, as you know. So we will have this check against the aggregated 
dataSize. We will also have a max possible on heap global memstore size 
(default 40% of xmx); this check is also done. If either condition is met, we 
will do a forced flush (as we do now) with blocking writes.
bq.Should be easy enough to do?
Should be IMO



[jira] [Commented] (HBASE-15314) Allow more than one backing file in bucketcache

2017-03-02 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893741#comment-15893741
 ] 

ramkrishna.s.vasudevan commented on HBASE-15314:


[~aartokhy], [~zyork]
What do you feel about the latest patch in rb? If you are all fine, then shall 
we go ahead with [~zjushch]'s patch?

> Allow more than one backing file in bucketcache
> ---
>
> Key: HBASE-15314
> URL: https://issues.apache.org/jira/browse/HBASE-15314
> Project: HBase
>  Issue Type: Sub-task
>  Components: BucketCache
>Reporter: stack
>Assignee: Aaron Tokhy
> Attachments: FileIOEngine.java, HBASE-15314.master.001.patch, 
> HBASE-15314.master.001.patch, HBASE-15314.patch, HBASE-15314-v2.patch, 
> HBASE-15314-v3.patch, HBASE-15314-v4.patch, HBASE-15314-v5.patch
>
>
> Allow bucketcache use more than just one backing file: e.g. chassis has more 
> than one SSD in it.
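One way such a multi-file engine can address its space, as a hedged sketch: treat the bucket cache as one flat address space and split each global offset into a (file index, offset within file) pair. The actual FileIOEngine patch on this issue may distribute data differently; the names below are invented for illustration.

```java
// Hypothetical addressing sketch for a bucket cache backed by several
// equal-sized files (e.g. one per SSD).
final class MultiFileAddress {
    final long perFileCapacity;
    final int fileCount;

    MultiFileAddress(long perFileCapacity, int fileCount) {
        this.perFileCapacity = perFileCapacity;
        this.fileCount = fileCount;
    }

    int fileIndex(long globalOffset) {
        long idx = globalOffset / perFileCapacity;
        if (idx >= fileCount) {
            throw new IllegalArgumentException("offset beyond total capacity");
        }
        return (int) idx;
    }

    long offsetInFile(long globalOffset) {
        return globalOffset % perFileCapacity;
    }
}
```

A real engine would additionally have to handle reads and writes that span a file boundary, or size buckets so they never cross one.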





[jira] [Commented] (HBASE-17623) Reuse the bytes array when building the hfile block

2017-03-02 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893739#comment-15893739
 ] 

ramkrishna.s.vasudevan commented on HBASE-17623:


Am +1. [~anoop.hbase] if you are fine - you can commit it?

>   }
>   ChecksumUtil.generateChecksums(
>   onDiskBlockBytesWithHeader, 0, onDiskBlockBytesWithHeader.length,
>   onDiskChecksum, 0, fileContext.getChecksumType(), 
> fileContext.getBytesPerChecksum());
> }{code}
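Improvement #1 above amounts to keeping one growable buffer alive across blocks instead of allocating a fresh array via {{toByteArray()}} each time. A minimal, self-contained sketch of that pattern (illustrative only -- the class and method names below are not HBase code):

```java
import java.util.Arrays;

// Sketch of a reusable block buffer: reset() keeps the backing array so
// successive blocks reuse the same allocation; a copy is made only when
// the caller must retain the bytes (mirroring improvement #2).
public class ReusableBlockBuffer {
    private byte[] buf = new byte[64]; // grown on demand, reused across blocks
    private int len = 0;

    public void reset() { len = 0; } // start a new block, keep the array

    public void write(byte[] data) {
        if (len + data.length > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, len + data.length));
        }
        System.arraycopy(data, 0, buf, len, data.length);
        len += data.length;
    }

    public byte[] backingArray() { return buf; } // no copy
    public int length() { return len; }
    public byte[] copyForCache() { return Arrays.copyOf(buf, len); } // copy only for cache-on-write
}
```

The point of the sketch is that only the cache-on-write path pays for a copy; the common write path reuses one allocation.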





[jira] [Commented] (HBASE-17680) Run mini cluster through JNI in tests

2017-03-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893716#comment-15893716
 ] 

Ted Yu commented on HBASE-17680:


Ran valgrind again.
libjvm.so still showed up but there was no occurrence of MiniCluster.
This was from the last run:
{code}
==21686==by 0x426260: hbase::MiniCluster::CreateVM(JavaVM_**) 
(mini-cluster.cc:73)
==21686==by 0x426B5A: hbase::MiniCluster::Setup() (mini-cluster.cc:115)
{code}

> Run mini cluster through JNI in tests
> -
>
> Key: HBASE-17680
> URL: https://issues.apache.org/jira/browse/HBASE-17680
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 17680.v14.txt, 17680.v17.txt, 17680.v18.txt, 
> 17680.v1.txt, 17680.v20.txt, 17680.v22.txt, 17680.v23.txt, 17680.v26.txt, 
> 17680.v27.txt, 17680.v3.txt, 17680.v8.txt
>
>
> Currently, tests start the local hbase cluster through the hbase shell.
> There is less control over the configuration of the local cluster this way.
> This issue would replace the hbase shell with a JNI interface to the mini 
> cluster.
> We would have full control over the cluster behavior.
> Thanks to [~devaraj] who started this initiative.





[jira] [Commented] (HBASE-17361) Make HTable thread safe

2017-03-02 Thread CHIA-PING TSAI (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893712#comment-15893712
 ] 

CHIA-PING TSAI commented on HBASE-17361:


The BM#flush waits for all PUTs to complete. If we share the HTable between 
threads, HTable#put needs to wait for all PUTs which were created by different 
threads. The logic of BM#flush is good, but the put operation will be slow. 
Maybe we can use the AP (AsyncProcess) rather than BM#flush in HTable#put? If 
yes, it will be easier to make HTable thread-safe.

> Make HTable thread safe
> ---
>
> Key: HBASE-17361
> URL: https://issues.apache.org/jira/browse/HBASE-17361
> Project: HBase
>  Issue Type: Improvement
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Critical
> Attachments: HBASE-17361.patch, HBASE-17361.patch
>
>
> Currently HTable is marked as NOT thread safe, and this JIRA targets 
> improving this to make better use of the thread-safe BufferedMutator.
> Some findings/work done:
> If we try to do puts to the same HTable instance in parallel, there'll be a 
> problem, since now we have {{HTable#getBufferedMutator}} like
> {code}
>BufferedMutator getBufferedMutator() throws IOException {
>  if (mutator == null) {
>   this.mutator = (BufferedMutatorImpl) connection.getBufferedMutator(
>   new BufferedMutatorParams(tableName)
>   .pool(pool)
>   .writeBufferSize(connConfiguration.getWriteBufferSize())
>   .maxKeyValueSize(connConfiguration.getMaxKeyValueSize())
>   );
> }
> mutator.setRpcTimeout(writeRpcTimeout);
> mutator.setOperationTimeout(operationTimeout);
> return mutator;
>   }
> {code}
> And {{HTable#flushCommits}}:
> {code}
>   void flushCommits() throws IOException {
> if (mutator == null) {
>   // nothing to flush if there's no mutator; don't bother creating one.
>   return;
> }
> getBufferedMutator().flush();
>   }
> {code}
> For {{HTable#put}}
> {code}
>   public void put(final Put put) throws IOException {
> getBufferedMutator().mutate(put);
> flushCommits();
>   }
> {code}
> If we launch multiple threads to put in parallel, below sequence might happen 
> because {{HTable#getBufferedMutator}} is not thread safe:
> {noformat}
> 1. ThreadA runs to getBufferedMutator and finds mutator==null
> 2. ThreadB runs to getBufferedMutator and finds mutator==null
> 3. ThreadA initializes mutator to instanceA, then calls mutator#mutate,
> adding one put (putA) into {{writeAsyncBuffer}}
> 4. ThreadB initializes mutator to instanceB
> 5. ThreadA runs to flushCommits; now mutator is instanceB, so it calls
> instanceB's flush method, and putA is lost
> {noformat}
> After fixing this, we will find quite some contention on 
> {{BufferedMutatorImpl#flush}}, so more effort is required to make HTable 
> thread safe while keeping good performance.
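The race described above is the classic unsafe lazy initialization; one way to close it is double-checked locking with a volatile field. A hedged sketch (illustrative only, not the actual HTable fix):

```java
// Sketch: race-free lazy init so two threads can never observe two
// different mutator instances. The Mutator interface is a stand-in for
// BufferedMutator; new Mutator(){} stands in for
// connection.getBufferedMutator(...).
public class SafeLazyInit {
    interface Mutator { }

    private volatile Mutator mutator; // volatile is required for safe DCL

    Mutator getMutator() {
        Mutator m = mutator;
        if (m == null) {
            synchronized (this) {
                m = mutator;       // re-check under the lock
                if (m == null) {
                    m = new Mutator() { };
                    mutator = m;   // publish exactly one instance
                }
            }
        }
        return m;
    }
}
```

With this shape, step 4 in the sequence above cannot happen: the second thread re-reads the field under the lock and sees instanceA.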





[jira] [Updated] (HBASE-17680) Run mini cluster through JNI in tests

2017-03-02 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17680:
---
Attachment: 17680.v27.txt

Patch v27 drops unneeded jobject return type.

RunShellCmd() is also removed since it is no longer used.

All tests pass.

> Run mini cluster through JNI in tests
> -
>
> Key: HBASE-17680
> URL: https://issues.apache.org/jira/browse/HBASE-17680
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 17680.v14.txt, 17680.v17.txt, 17680.v18.txt, 
> 17680.v1.txt, 17680.v20.txt, 17680.v22.txt, 17680.v23.txt, 17680.v26.txt, 
> 17680.v27.txt, 17680.v3.txt, 17680.v8.txt
>
>
> Currently, tests start the local hbase cluster through the hbase shell.
> There is less control over the configuration of the local cluster this way.
> This issue would replace the hbase shell with a JNI interface to the mini 
> cluster.
> We would have full control over the cluster behavior.
> Thanks to [~devaraj] who started this initiative.





[jira] [Comment Edited] (HBASE-17718) Difference between RS's servername and its ephemeral node cause SSH stop working

2017-03-02 Thread Allan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893694#comment-15893694
 ] 

Allan Yang edited comment on HBASE-17718 at 3/3/17 4:08 AM:


So you suggest we revert HBASE-9593, [~stack]? Do you need me to upload a 
patch, or will you just revert it from your side? If so, please go ahead and 
help me resolve this issue. Thank you, sir!


was (Author: allan163):
So you suggest we revert HBASE-9593, [~stack]? If so, please go ahead and help 
me resolve this issue. Thank you, sir!

> Difference between RS's servername and its ephemeral node cause SSH stop 
> working
> 
>
> Key: HBASE-17718
> URL: https://issues.apache.org/jira/browse/HBASE-17718
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.2.4, 1.1.8
>Reporter: Allan Yang
>Assignee: Allan Yang
>
> After HBASE-9593, the RS puts up an ephemeral node in ZK before reporting for 
> duty. But if the hosts config (/etc/hosts) differs between the master and the 
> RS, the RS's serverName can be different from the one stored in the ephemeral 
> zk node. The email mentioned in HBASE-13753 
> (http://mail-archives.apache.org/mod_mbox/hbase-user/201505.mbox/%3CCANZDn9ueFEEuZMx=pZdmtLsdGLyZz=rrm1N6EQvLswYc1z-H=g...@mail.gmail.com%3E)
>  is exactly what happened in our production env. 
> But what the email didn't point out is that the difference between the 
> serverName in the RS and the zk node can cause SSH to stop working, as we can 
> see from the code in {{RegionServerTracker}}:
> {code}
>   @Override
>   public void nodeDeleted(String path) {
> if (path.startsWith(watcher.rsZNode)) {
>   String serverName = ZKUtil.getNodeName(path);
>   LOG.info("RegionServer ephemeral node deleted, processing expiration [" 
> +
> serverName + "]");
>   ServerName sn = ServerName.parseServerName(serverName);
>   if (!serverManager.isServerOnline(sn)) {
> LOG.warn(serverName.toString() + " is not online or isn't known to 
> the master."+
>  "The latter could be caused by a DNS misconfiguration.");
> return;
>   }
>   remove(sn);
>   this.serverManager.expireServer(sn);
> }
>   }
> {code}
> The server will not be processed by SSH/ServerCrashProcedure. The regions on 
> this server will not be assigned again until a master restart or failover.
> I know HBASE-9593 was to fix the issue where an RS reports for duty and 
> crashes before it can put up a zk node. That is a very rare case (and a 
> controllable one: just fix the bug making the rs crash). But the issue I 
> mentioned can happen more often (and is uncontrollable; it can't be fixed in 
> HBase, since it is due to DNS, hosts config, etc.) and has more severe 
> consequences.
> So here I offer some solutions to discuss:
> 1. Revert HBASE-9593 from all branches; Andrew Purtell has already reverted 
> it in branch-0.98.
> 2. Abort the RS if the master returns a different name; otherwise SSH can't 
> work properly.
> 3. The master accepts whatever servername is reported by the RS and doesn't 
> change it.
> 4. Correct the zk node if the master returns another name (idea from Ted Yu).
> 
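The failure mode inside {{nodeDeleted}} reduces to a name-lookup miss: the online-server set is keyed by the master-assigned name, while the deleted znode carries the locally derived name. A simplified sketch (the hostnames and timestamps below are made up for illustration):

```java
// Sketch of the mismatch: mirrors the
// if (!serverManager.isServerOnline(sn)) return; check -- if the znode
// name differs from the registered name, expiration is silently skipped.
public class ZnodeNameMismatch {
    public static boolean wouldExpire(java.util.Set<String> onlineServers,
                                      String znodeName) {
        return onlineServers.contains(znodeName);
    }
}
```

So when /etc/hosts resolves differently on the two sides, the ephemeral node's deletion never triggers expireServer for the registered name.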





[jira] [Commented] (HBASE-17722) Metrics subsystem stop/start messages add a lot of useless bulk to operational logging

2017-03-02 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893698#comment-15893698
 ] 

Andrew Purtell commented on HBASE-17722:


Sure, will do that, then commit 

> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging
> --
>
> Key: HBASE-17722
> URL: https://issues.apache.org/jira/browse/HBASE-17722
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 1.3.0, 1.2.4
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Trivial
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17722.patch
>
>
> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging. Say you are collecting logs from a fleet of thousands of 
> servers and want to keep them around for ~1 month or longer. It adds up. 
> I think these should at least be at DEBUG level, and ideally at TRACE. They 
> don't offer much utility. Unfortunately they are Hadoop classes, so we can 
> only tweak the log4j.properties defaults instead. We do this in test 
> resources but not in what we ship in conf/.
> {noformat}
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system started
>  INFO  [] impl.MetricsSystemImpl: Stopping HBase metrics 
> system...
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system stopped.
>  INFO  [] impl.MetricsConfig: loaded properties from 
> hadoop-metrics2-hbase.properties
>  INFO  [] impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> {noformat}
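Since the noisy classes are Hadoop's, the practical fix is a logger override in the shipped conf/log4j.properties. A minimal fragment along those lines (assuming the stock log4j 1.2 setup; the logger names are the Hadoop metrics2 impl classes quoted above):

```properties
# Quiet the Hadoop metrics2 start/stop chatter: INFO messages from these
# classes are suppressed, WARN and above still get through.
log4j.logger.org.apache.hadoop.metrics2.impl.MetricsSystemImpl=WARN
log4j.logger.org.apache.hadoop.metrics2.impl.MetricsConfig=WARN
log4j.logger.org.apache.hadoop.metrics2.impl.MetricsSinkAdapter=WARN
```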





[jira] [Commented] (HBASE-17718) Difference between RS's servername and its ephemeral node cause SSH stop working

2017-03-02 Thread Allan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893694#comment-15893694
 ] 

Allan Yang commented on HBASE-17718:


So you suggest we revert HBASE-9593, [~stack]? If so, please go ahead and help 
me resolve this issue. Thank you, sir!

> Difference between RS's servername and its ephemeral node cause SSH stop 
> working
> 
>
> Key: HBASE-17718
> URL: https://issues.apache.org/jira/browse/HBASE-17718
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.2.4, 1.1.8
>Reporter: Allan Yang
>Assignee: Allan Yang
>
> After HBASE-9593, the RS puts up an ephemeral node in ZK before reporting for 
> duty. But if the hosts config (/etc/hosts) differs between the master and the 
> RS, the RS's serverName can be different from the one stored in the ephemeral 
> zk node. The email mentioned in HBASE-13753 
> (http://mail-archives.apache.org/mod_mbox/hbase-user/201505.mbox/%3CCANZDn9ueFEEuZMx=pZdmtLsdGLyZz=rrm1N6EQvLswYc1z-H=g...@mail.gmail.com%3E)
>  is exactly what happened in our production env. 
> But what the email didn't point out is that the difference between the 
> serverName in the RS and the zk node can cause SSH to stop working, as we can 
> see from the code in {{RegionServerTracker}}:
> {code}
>   @Override
>   public void nodeDeleted(String path) {
> if (path.startsWith(watcher.rsZNode)) {
>   String serverName = ZKUtil.getNodeName(path);
>   LOG.info("RegionServer ephemeral node deleted, processing expiration [" 
> +
> serverName + "]");
>   ServerName sn = ServerName.parseServerName(serverName);
>   if (!serverManager.isServerOnline(sn)) {
> LOG.warn(serverName.toString() + " is not online or isn't known to 
> the master."+
>  "The latter could be caused by a DNS misconfiguration.");
> return;
>   }
>   remove(sn);
>   this.serverManager.expireServer(sn);
> }
>   }
> {code}
> The server will not be processed by SSH/ServerCrashProcedure. The regions on 
> this server will not be assigned again until a master restart or failover.
> I know HBASE-9593 was to fix the issue where an RS reports for duty and 
> crashes before it can put up a zk node. That is a very rare case (and a 
> controllable one: just fix the bug making the rs crash). But the issue I 
> mentioned can happen more often (and is uncontrollable; it can't be fixed in 
> HBase, since it is due to DNS, hosts config, etc.) and has more severe 
> consequences.
> So here I offer some solutions to discuss:
> 1. Revert HBASE-9593 from all branches; Andrew Purtell has already reverted 
> it in branch-0.98.
> 2. Abort the RS if the master returns a different name; otherwise SSH can't 
> work properly.
> 3. The master accepts whatever servername is reported by the RS and doesn't 
> change it.
> 4. Correct the zk node if the master returns another name (idea from Ted Yu).
> 





[jira] [Commented] (HBASE-17722) Metrics subsystem stop/start messages add a lot of useless bulk to operational logging

2017-03-02 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893688#comment-15893688
 ] 

Jerry He commented on HBASE-17722:
--

I too noticed these messages showing up in recent versions of 1.2.x.

{noformat}
2017-02-27 17:21:00,974 INFO  [HBase-Metrics2-1] impl.MetricsSinkAdapter: Sink 
timeline started
2017-02-27 17:21:00,976 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl: 
Scheduled snapshot period at 10 second(s).
2017-02-27 17:21:00,976 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase 
metrics system started
{noformat}
{noformat}
2017-02-27 17:36:00,449 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl: 
Stopping HBase metrics system...
2017-02-27 17:36:00,450 INFO  [timeline] impl.MetricsSinkAdapter: timeline 
thread interrupted.
2017-02-27 17:36:00,455 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase 
metrics system stopped.
{noformat}

Do you want to add impl.MetricsSinkAdapter? Do you want to put it at 'WARN'?
Otherwise, +1.

> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging
> --
>
> Key: HBASE-17722
> URL: https://issues.apache.org/jira/browse/HBASE-17722
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 1.3.0, 1.2.4
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Trivial
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17722.patch
>
>
> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging. Say you are collecting logs from a fleet of thousands of 
> servers and want to keep them around for ~1 month or longer. It adds up. 
> I think these should at least be at DEBUG level, and ideally at TRACE. They 
> don't offer much utility. Unfortunately they are Hadoop classes, so we can 
> only tweak the log4j.properties defaults instead. We do this in test 
> resources but not in what we ship in conf/.
> {noformat}
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system started
>  INFO  [] impl.MetricsSystemImpl: Stopping HBase metrics 
> system...
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system stopped.
>  INFO  [] impl.MetricsConfig: loaded properties from 
> hadoop-metrics2-hbase.properties
>  INFO  [] impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> {noformat}





[jira] [Commented] (HBASE-17721) Provide streaming APIs with SSL/TLS

2017-03-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893675#comment-15893675
 ] 

stack commented on HBASE-17721:
---

We can't put up grpc on our rpc port. We want hbase-1.x clients to be able to 
talk to hbase-2.0.0.

Guava is a pain, yes, but we have a set of libs of this type -- i.e. critical 
libs that we can't do without and currently can't update beyond the creaky old 
versions that hadoop runs -- but we have to fix that general issue anyway, so I 
wouldn't worry about it. One concern is that grpc needs pb3. HBase is pb2.5 
(though we run a shaded pb3 internally).

Maybe the way to go is just to put up a new port for now so you can play (which 
is sort of what HBASE-8691 did). IIRC cockroachdb tried to be smart and run 
with just one port for both webui and data but had to give up on it (I don't 
have the blog link to hand just now). We'll probably need to do the same.

Yeah, playing w/ HBASE-13467 would be a good approach. The original patch is by 
the grpc fellows. They might show up again if they see progress there.

I love this issue (again). Smile.

> Provide streaming APIs with SSL/TLS
> ---
>
> Key: HBASE-17721
> URL: https://issues.apache.org/jira/browse/HBASE-17721
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Alex Araujo
>Assignee: Alex Araujo
> Fix For: 2.0.0
>
>
> Umbrella to add optional client/server streaming capabilities to HBase.
> This would allow bandwidth to be used more efficiently for certain 
> operations, and allow clients to use SSL/TLS for authentication and 
> encryption.
> Desired client/server scaffolding:
> - HTTP/2 support
> - Protocol negotiation (blocking vs streaming, auth, encryption, etc.)
> - TLS/SSL support
> - Streaming RPC support
> Possibilities (and their tradeoffs):
> - gRPC: Some initial work and discussion on HBASE-13467 (Prototype using GRPC 
> as IPC mechanism)
> -- Has most or all of the desired scaffolding
> -- Adds additional g* dependencies. Compat story for g* dependencies not 
> always ideal
> - Custom HTTP/2 based client/server APIs
> -- More control over compat story
> -- Non-trivial to build scaffolding; might reinvent wheels along the way
> - Others?
> Related Jiras that might be rolled in as sub-tasks (or closed/replaced with 
> new ones):
> HBASE-17708 (Expose config to set two-way auth over TLS in HttpServer and add 
> a test)
> HBASE-8691 (High-Throughput Streaming Scan API)
> HBASE-14899 (Create custom Streaming ReplicationEndpoint)





[jira] [Commented] (HBASE-17716) Formalize Scan Metric names

2017-03-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893661#comment-15893661
 ] 

stack commented on HBASE-17716:
---

bq.  I am guessing the dump in JMX or metrics page is probably just some 
aggregated info so may not be useful for us.

Nah. This is our mechanism for publishing metrics. Phoenix could read the 
MBeans. That'd have it so you were relying on a 'public' Interface rather than 
hbase internals.

We could add rules around metric evolution like those we have on our APIs so 
downstreamers could have something to rely on if that'd help.

bq. We would like to have these metric names as constants to provide users 
capability of looking up metrics of their choice via static metric names.
bq. It could very well be strings defined as public static final . 

This makes sense. We should do this for sure. If it is inconvenient for you to 
get access, we should clean house. Looking at it, metric names are scattered 
all around. We could define the names as static final and then perhaps annotate 
them as metric names so they are easy to find and are demarcated as 'public', 
not for change?

bq. With enums though we can use an EnumMap which is more performant and 
compact as compared to a HashMap.

So, enums. Is the above the only advantage you see to enum'ing all of our 
metrics? It's minor, no? (EnumMap and HashMap both extend AbstractMap -- enums 
have a hashing advantage since the key set is fixed.) Do you have some perf 
stats to go along w/ the above? And would this be just for the way phoenix 
accesses the metrics, or could hbase benefit too?

Thanks.
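The enum-plus-EnumMap shape under discussion can be sketched in a few lines (the metric constants below are examples, not the actual ScanMetrics names):

```java
import java.util.EnumMap;

// Sketch: formal metric names as an enum; counters kept in an EnumMap,
// which is array-backed over the fixed enum ordinal space (the compactness
// advantage mentioned above).
public class ScanMetricNames {
    public enum Metric { RPC_CALLS, REMOTE_RPC_CALLS, BYTES_IN_RESULTS }

    private final EnumMap<Metric, Long> counters = new EnumMap<>(Metric.class);

    public void add(Metric m, long delta) {
        counters.merge(m, delta, Long::sum);
    }

    public long get(Metric m) {
        return counters.getOrDefault(m, 0L);
    }
}
```

Downstream callers would then look metrics up by enum constant instead of hard-coded strings.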





> Formalize Scan Metric names
> ---
>
> Key: HBASE-17716
> URL: https://issues.apache.org/jira/browse/HBASE-17716
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Reporter: Karan Mehta
>Assignee: Karan Mehta
>Priority: Minor
> Attachments: HBASE-17716.patch
>
>
> HBase provides various metrics through the APIs exposed by the ScanMetrics 
> class. 
> The JIRA PHOENIX-3248 requires them to be surfaced through the Phoenix 
> Metrics API. Currently these metrics are referred to via hard-coded strings, 
> which are not formal and can break the Phoenix API. Hence we need to refactor 
> the code to assign enums for these metrics.





[jira] [Commented] (HBASE-17706) TableSkewCostFunction improperly computes max skew

2017-03-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893659#comment-15893659
 ] 

Ted Yu commented on HBASE-17706:


TestStochasticLoadBalancer2 still failed.

Can you rebase the patch (now that we have new TableSkewCostFunction) ?

> TableSkewCostFunction improperly computes max skew
> --
>
> Key: HBASE-17706
> URL: https://issues.apache.org/jira/browse/HBASE-17706
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Priority: Minor
>  Labels: patch
> Attachments: HBASE-17706-01.patch, HBASE-17706-02.patch, 
> HBASE-17706.patch
>
>
> We noticed while running unit tests that the TableSkewCostFunction computed 
> cost did not change as the balancer ran and simulated moves across the 
> cluster. After investigating, we found that this happened in particular when 
> the cluster started out with at least one table very strongly skewed.
> We noticed that the TableSkewCostFunction depends on a field of the 
> BaseLoadBalancer.Cluster class called numMaxRegionsPerTable, but this field 
> is not properly maintained as regionMoves are simulated for the cluster. The 
> field only ever increases as the maximum number of regions per table 
> increases, but it does not decrease as the maximum number per table goes down.
> This patch corrects that behavior so that the field is accurately maintained, 
> and thus the TableSkewCostFunction produces a more correct value as the 
> balancer runs.





[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-02 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17707:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch
>
>
> This patch includes new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robined across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0 and 1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.
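The scoring described above -- per-table minimal moves normalized by the worst case, a weighted average of mean and max, then a square root -- can be sketched as follows (the method and parameter names are illustrative, not the patch's actual code):

```java
// Sketch of the described cost: inputs are per-table move counts already
// normalized to [0,1]; maxWeight controls how strongly a single badly
// skewed table is penalized versus every table being a little skewed.
public class TableSkewScore {
    public static double cost(double[] normalizedMovesPerTable, double maxWeight) {
        double sum = 0, max = 0;
        for (double v : normalizedMovesPerTable) {
            sum += v;
            if (v > max) max = v;
        }
        double avg = sum / normalizedMovesPerTable.length;
        double weighted = maxWeight * max + (1 - maxWeight) * avg;
        return Math.sqrt(weighted); // spreads values more evenly across 0-1
    }
}
```

With equal weights, one fully skewed table among four yields sqrt(0.5*1.0 + 0.5*0.25) ≈ 0.79, so a single bad table dominates the score as intended.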





[jira] [Commented] (HBASE-17718) Difference between RS's servername and its ephemeral node cause SSH stop working

2017-03-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893632#comment-15893632
 ] 

stack commented on HBASE-17718:
---

#1 because HBASE-9593 is wrong (registering in zk with local name rather than 
master-provided name BEFORE we've talked to the master to get what our cluster 
name is supposed to be).

Looking at it now again, there is also something odd about HBASE-9593. A 
regionserver is being 'registered' via a reading of the zk data, not via 
heartbeat, so it must be something like a master that has joined a cluster that 
is already up after a master crash. It looks like an extremely rare case where 
an ephemeral node has not evaporated yet and in the meantime a master crashes 
and then a backup master joins the cluster. Looks like we need a more rigorous 
accounting of cluster servers when a backup master joins a running cluster. It 
can read candidate servers by looking in zk, but it should wait on a heartbeat 
before adding a candidate regionserver to its cluster set. We can open a new 
issue to do this so we prevent a version of HBASE-9593 arising again post 
revert.

> Difference between RS's servername and its ephemeral node cause SSH stop 
> working
> 
>
> Key: HBASE-17718
> URL: https://issues.apache.org/jira/browse/HBASE-17718
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.2.4, 1.1.8
>Reporter: Allan Yang
>Assignee: Allan Yang
>
> After HBASE-9593, the RS puts up an ephemeral node in ZK before reporting for 
> duty. But if the hosts config (/etc/hosts) differs between the master and the 
> RS, the RS's serverName can be different from the one stored in the ephemeral 
> zk node. The email mentioned in HBASE-13753 
> (http://mail-archives.apache.org/mod_mbox/hbase-user/201505.mbox/%3CCANZDn9ueFEEuZMx=pZdmtLsdGLyZz=rrm1N6EQvLswYc1z-H=g...@mail.gmail.com%3E)
>  is exactly what happened in our production env. 
> But what the email didn't point out is that the difference between the 
> serverName in the RS and the zk node can cause SSH to stop working, as we can 
> see from the code in {{RegionServerTracker}}:
> {code}
>   @Override
>   public void nodeDeleted(String path) {
> if (path.startsWith(watcher.rsZNode)) {
>   String serverName = ZKUtil.getNodeName(path);
>   LOG.info("RegionServer ephemeral node deleted, processing expiration [" 
> +
> serverName + "]");
>   ServerName sn = ServerName.parseServerName(serverName);
>   if (!serverManager.isServerOnline(sn)) {
> LOG.warn(serverName.toString() + " is not online or isn't known to 
> the master."+
>  "The latter could be caused by a DNS misconfiguration.");
> return;
>   }
>   remove(sn);
>   this.serverManager.expireServer(sn);
> }
>   }
> {code}
> The server will not be processed by SSH/ServerCrashProcedure. The regions on 
> this server will not be assigned again until a master restart or failover.
> I know HBASE-9593 was to fix the issue where an RS reports for duty and 
> crashes before it can put up a zk node. That is a very rare case (and a 
> controllable one: just fix the bug making the rs crash). But the issue I 
> mentioned can happen more often (and is uncontrollable; it can't be fixed in 
> HBase, since it is due to DNS, hosts config, etc.) and has more severe 
> consequences.
> So here I offer some solutions to discuss:
> 1. Revert HBASE-9593 from all branches; Andrew Purtell has already reverted 
> it in branch-0.98.
> 2. Abort the RS if the master returns a different name; otherwise SSH can't 
> work properly.
> 3. The master accepts whatever servername is reported by the RS and doesn't 
> change it.
> 4. Correct the zk node if the master returns another name (idea from Ted Yu).
> 





[jira] [Updated] (HBASE-17722) Metrics subsystem stop/start messages add a lot of useless bulk to operational logging

2017-03-02 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-17722:
---
Attachment: HBASE-17722.patch

> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging
> --
>
> Key: HBASE-17722
> URL: https://issues.apache.org/jira/browse/HBASE-17722
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 1.3.0, 1.2.4
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Trivial
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17722.patch
>
>
> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging. Say you are collecting logs from a fleet of thousands of 
> servers and want to have them around for ~a month or longer. It adds up. 
> I think these should at least be at DEBUG level and ideally at TRACE. They 
> don't offer much utility. Unfortunately they are Hadoop classes, so we can 
> only tweak the log4j.properties defaults instead. We do this in test resources 
> but not in what we ship in conf/.
> {noformat}
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system started
>  INFO  [] impl.MetricsSystemImpl: Stopping HBase metrics 
> system...
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system stopped.
>  INFO  [] impl.MetricsConfig: loaded properties from 
> hadoop-metrics2-hbase.properties
>  INFO  [] impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> {noformat}





[jira] [Updated] (HBASE-17722) Metrics subsystem stop/start messages add a lot of useless bulk to operational logging

2017-03-02 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-17722:
---
Attachment: (was: HBASE-17722.patch)

> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging
> --
>
> Key: HBASE-17722
> URL: https://issues.apache.org/jira/browse/HBASE-17722
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 1.3.0, 1.2.4
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Trivial
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17722.patch
>
>
> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging. Say you are collecting logs from a fleet of thousands of 
> servers and want to have them around for ~a month or longer. It adds up. 
> I think these should at least be at DEBUG level and ideally at TRACE. They 
> don't offer much utility. Unfortunately they are Hadoop classes, so we can 
> only tweak the log4j.properties defaults instead. We do this in test resources 
> but not in what we ship in conf/.
> {noformat}
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system started
>  INFO  [] impl.MetricsSystemImpl: Stopping HBase metrics 
> system...
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system stopped.
>  INFO  [] impl.MetricsConfig: loaded properties from 
> hadoop-metrics2-hbase.properties
>  INFO  [] impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> {noformat}





[jira] [Commented] (HBASE-17716) Formalize Scan Metric names

2017-03-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893600#comment-15893600
 ] 

Samarth Jain commented on HBASE-17716:
--

In Phoenix, we have a framework where we collect metrics for every SQL 
statement. The idea of PHOENIX-3248 is to include the scan metrics collected 
by HBase, to provide information on how much work the scans are doing for a 
SQL statement. I am guessing the dump in JMX or on the metrics page is 
probably just aggregated info, so it may not be useful for us. 

bq. Is there a history of our randomly changing metric names out from under 
phoenix (other than at say key junctions such as a major release?). 
Well, this is the first time that we are exposing HBase scan metrics via 
Phoenix. We would like to have these metric names as constants, to give users 
the ability to look up metrics of their choice via static metric names.

bq. And if enum'ing has a value, should we do it for all metrics rather than 
just a subset as here?
Enums are just convenient. They could very well be strings defined as public 
static final :). With enums, though, we can use an EnumMap, which is more 
compact and performant than a HashMap.
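As a sketch of the EnumMap point (the metric names here are hypothetical, not the constants actually proposed in the patch):

```java
import java.util.EnumMap;
import java.util.Map;

public class ScanMetricsDemo {
    // Hypothetical metric names -- the actual enum constants in the patch may differ.
    enum ScanMetric { RPC_CALLS, BYTES_IN_RESULTS, REGIONS_SCANNED }

    // An EnumMap stores values in a plain array indexed by the enum ordinal,
    // so lookups skip hashing entirely: more compact and faster than a HashMap.
    static final Map<ScanMetric, Long> counters = new EnumMap<>(ScanMetric.class);

    static void increment(ScanMetric m, long delta) {
        counters.merge(m, delta, Long::sum);
    }

    public static void main(String[] args) {
        increment(ScanMetric.RPC_CALLS, 1);
        increment(ScanMetric.RPC_CALLS, 2);
        System.out.println(counters.get(ScanMetric.RPC_CALLS)); // prints 3
    }
}
```

Static enum constants also give callers a typo-proof lookup key, which is the compatibility point being made above.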



> Formalize Scan Metric names
> ---
>
> Key: HBASE-17716
> URL: https://issues.apache.org/jira/browse/HBASE-17716
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Reporter: Karan Mehta
>Assignee: Karan Mehta
>Priority: Minor
> Attachments: HBASE-17716.patch
>
>
> HBase provides various metrics through the APIs exposed by the ScanMetrics 
> class. 
> The JIRA PHOENIX-3248 requires them to be surfaced through the Phoenix 
> Metrics API. Currently these metrics are referenced via hard-coded strings, 
> which are not formal and can break the Phoenix API. Hence we need to refactor 
> the code to assign enums to these metrics.





[jira] [Comment Edited] (HBASE-17579) Backport HBASE-16302 to 1.3.1

2017-03-02 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893578#comment-15893578
 ] 

Gary Helmling edited comment on HBASE-17579 at 3/3/17 2:20 AM:
---

You also need to change gaugesMap from Map to ConcurrentMap, as 
Map.putIfAbsent() is only present in Java 8 and HBase 1.3 supports both 7 & 8.  
That's the reason for the compilation failure in the 1.7 builds above.


was (Author: ghelmling):
You also need to change gaugesMap from Map to ConcurrentMap, as 
Map.putIfAbsent() is only present in Java 8 and HBase 1.3 supports both 7 & 8.

> Backport HBASE-16302 to 1.3.1
> -
>
> Key: HBASE-17579
> URL: https://issues.apache.org/jira/browse/HBASE-17579
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Fix For: 1.3.1
>
> Attachments: HBASE-17579.branch-1.3.001.patch, 
> HBASE-17579.branch-1.3.002.patch
>
>
> This is a simple enough change to be included in 1.3.1, and replication 
> monitoring essentially breaks without this change.





[jira] [Updated] (HBASE-17722) Metrics subsystem stop/start messages add a lot of useless bulk to operational logging

2017-03-02 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-17722:
---
Fix Version/s: 1.4.0
   2.0.0

> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging
> --
>
> Key: HBASE-17722
> URL: https://issues.apache.org/jira/browse/HBASE-17722
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 1.3.0, 1.2.4
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Trivial
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17722.patch
>
>
> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging. Say you are collecting logs from a fleet of thousands of 
> servers and want to have them around for ~a month or longer. It adds up. 
> I think these should at least be at DEBUG level and ideally at TRACE. They 
> don't offer much utility. Unfortunately they are Hadoop classes, so we can 
> only tweak the log4j.properties defaults instead. We do this in test resources 
> but not in what we ship in conf/.
> {noformat}
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system started
>  INFO  [] impl.MetricsSystemImpl: Stopping HBase metrics 
> system...
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system stopped.
>  INFO  [] impl.MetricsConfig: loaded properties from 
> hadoop-metrics2-hbase.properties
>  INFO  [] impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> {noformat}





[jira] [Assigned] (HBASE-17722) Metrics subsystem stop/start messages add a lot of useless bulk to operational logging

2017-03-02 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell reassigned HBASE-17722:
--

Assignee: Andrew Purtell
Priority: Trivial  (was: Major)

> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging
> --
>
> Key: HBASE-17722
> URL: https://issues.apache.org/jira/browse/HBASE-17722
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 1.3.0, 1.2.4
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Trivial
> Attachments: HBASE-17722.patch
>
>
> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging. Say you are collecting logs from a fleet of thousands of 
> servers and want to have them around for ~a month or longer. It adds up. 
> I think these should at least be at DEBUG level and ideally at TRACE. They 
> don't offer much utility. Unfortunately they are Hadoop classes, so we can 
> only tweak the log4j.properties defaults instead. We do this in test resources 
> but not in what we ship in conf/.
> {noformat}
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system started
>  INFO  [] impl.MetricsSystemImpl: Stopping HBase metrics 
> system...
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system stopped.
>  INFO  [] impl.MetricsConfig: loaded properties from 
> hadoop-metrics2-hbase.properties
>  INFO  [] impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> {noformat}





[jira] [Updated] (HBASE-17722) Metrics subsystem stop/start messages add a lot of useless bulk to operational logging

2017-03-02 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-17722:
---
Status: Patch Available  (was: Open)

> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging
> --
>
> Key: HBASE-17722
> URL: https://issues.apache.org/jira/browse/HBASE-17722
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 1.2.4, 1.3.0
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Trivial
> Attachments: HBASE-17722.patch
>
>
> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging. Say you are collecting logs from a fleet of thousands of 
> servers and want to have them around for ~a month or longer. It adds up. 
> I think these should at least be at DEBUG level and ideally at TRACE. They 
> don't offer much utility. Unfortunately they are Hadoop classes, so we can 
> only tweak the log4j.properties defaults instead. We do this in test resources 
> but not in what we ship in conf/.
> {noformat}
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system started
>  INFO  [] impl.MetricsSystemImpl: Stopping HBase metrics 
> system...
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system stopped.
>  INFO  [] impl.MetricsConfig: loaded properties from 
> hadoop-metrics2-hbase.properties
>  INFO  [] impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> {noformat}





[jira] [Updated] (HBASE-17722) Metrics subsystem stop/start messages add a lot of useless bulk to operational logging

2017-03-02 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-17722:
---
Attachment: HBASE-17722.patch

Trivial patch

> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging
> --
>
> Key: HBASE-17722
> URL: https://issues.apache.org/jira/browse/HBASE-17722
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 1.3.0, 1.2.4
>Reporter: Andrew Purtell
> Attachments: HBASE-17722.patch
>
>
> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging. Say you are collecting logs from a fleet of thousands of 
> servers and want to have them around for ~a month or longer. It adds up. 
> I think these should at least be at DEBUG level and ideally at TRACE. They 
> don't offer much utility. Unfortunately they are Hadoop classes, so we can 
> only tweak the log4j.properties defaults instead. We do this in test resources 
> but not in what we ship in conf/.
> {noformat}
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system started
>  INFO  [] impl.MetricsSystemImpl: Stopping HBase metrics 
> system...
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system stopped.
>  INFO  [] impl.MetricsConfig: loaded properties from 
> hadoop-metrics2-hbase.properties
>  INFO  [] impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> {noformat}





[jira] [Commented] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893586#comment-15893586
 ] 

Hudson commented on HBASE-17717:


SUCCESS: Integrated in Jenkins build HBase-1.4 #655 (See 
[https://builds.apache.org/job/HBase-1.4/655/])
HBASE-17717 Explicitly use "sasl" ACL scheme for hbase superuser (elserj: rev 
85b5d493152f61ba159a7ce61c9eb679804e17c9)
* (edit) 
hbase-client/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKUtil.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java


> Incorrect ZK ACL set for HBase superuser
> 
>
> Key: HBASE-17717
> URL: https://issues.apache.org/jira/browse/HBASE-17717
> Project: HBase
>  Issue Type: Bug
>  Components: security, Zookeeper
>Reporter: Shreya Bhat
>Assignee: Josh Elser
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.1.10, 1.2.6
>
> Attachments: HBASE-17717.001.0.98.patch, 
> HBASE-17717.001.branch-1.1.patch, HBASE-17717.001.patch
>
>
> Shreya was doing some testing of a deploy of HBase, verifying that the ZK 
> ACLs were actually set as we expect (yay, security).
> She noticed that, in some cases, we were seeing multiple ACLs for the same 
> user.
> {noformat}
> 'world,'anyone
> : r
> 'sasl,'hbase
> : cdrwa
> 'sasl,'hbase
> : cdrwa
> {noformat}
> After digging into this (and some insight from the mighty [~enis]), we 
> realized that this was happening because of an overridden value for 
> {{hbase.superuser}}. However, the ACL value doesn't match what we'd expect to 
> see (as hbase.superuser was set to {{cstm-hbase}}).
> After digging into this code, it seems like the {{auth}} ACL scheme in 
> ZooKeeper does not work as we expect.
> {code}
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> In the above, the {{"auth"}} scheme ignores any provided "subject" in the 
> {{Id}} object. It *only* considers the authentication of the current 
> connection. As such, our usage of this never actually sets the ACL for the 
> superuser correctly.
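For contrast, a minimal sketch of what the committed fix does, per the commit message above ("Explicitly use "sasl" ACL scheme for hbase superuser"). The `Id`/`ACL` classes here are simplified stand-ins for ZooKeeper's, not the real API:

```java
import java.util.ArrayList;
import java.util.List;

public class SuperuserAclDemo {
    // Minimal stand-ins for ZooKeeper's Id and ACL classes; illustrative only.
    static class Id {
        final String scheme, id;
        Id(String scheme, String id) { this.scheme = scheme; this.id = id; }
    }
    static class ACL {
        final int perms; final Id id;
        ACL(int perms, Id id) { this.perms = perms; this.id = id; }
    }

    static final int PERMS_ALL = 0x1f; // cdrwa

    // The gist of the fix: use the "sasl" scheme, which honors the principal
    // we pass, instead of "auth", which ignores the Id's subject and applies
    // the ACL to whoever the current connection authenticated as.
    static List<ACL> superuserAcls(String superUser) {
        List<ACL> acls = new ArrayList<>();
        if (superUser != null) {
            acls.add(new ACL(PERMS_ALL, new Id("sasl", superUser)));
        }
        return acls;
    }
}
```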





[jira] [Updated] (HBASE-17722) Metrics subsystem stop/start messages add a lot of useless bulk to operational logging

2017-03-02 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-17722:
---
Description: 
Metrics subsystem stop/start messages add a lot of useless bulk to operational 
logging. Say you are collecting logs from a fleet of thousands of servers and 
want to have them around for ~a month or longer. It adds up. 

I think these should at least be at DEBUG level and ideally at TRACE. They 
don't offer much utility. Unfortunately they are Hadoop classes, so we can only 
tweak the log4j.properties defaults instead. We do this in test resources but 
not in what we ship in conf/.

{noformat}
 INFO  [] impl.MetricsSystemImpl: HBase metrics system started

 INFO  [] impl.MetricsSystemImpl: Stopping HBase metrics system...

 INFO  [] impl.MetricsSystemImpl: HBase metrics system stopped.

 INFO  [] impl.MetricsConfig: loaded properties from 
hadoop-metrics2-hbase.properties

 INFO  [] impl.MetricsSystemImpl: Scheduled snapshot period at 10 
second(s).
{noformat}


  was:
Metrics subsystem stop/start messages add a lot of useless bulk to operational 
logging. Say you are collecting logs from a fleet of thousands of servers and 
want to have them around for ~month or longer. It adds up. 

I think these should at least be at DEBUG level and ideally at TRACE. They 
don't offer much utility.

{noformat}
 INFO  [] impl.MetricsSystemImpl: HBase metrics system started

 INFO  [] impl.MetricsSystemImpl: Stopping HBase metrics system...

 INFO  [] impl.MetricsSystemImpl: HBase metrics system stopped.

 INFO  [] impl.MetricsConfig: loaded properties from 
hadoop-metrics2-hbase.properties

 INFO  [] impl.MetricsSystemImpl: Scheduled snapshot period at 10 
second(s).
{noformat}



> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging
> --
>
> Key: HBASE-17722
> URL: https://issues.apache.org/jira/browse/HBASE-17722
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 1.3.0, 1.2.4
>Reporter: Andrew Purtell
>
> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging. Say you are collecting logs from a fleet of thousands of 
> servers and want to have them around for ~a month or longer. It adds up. 
> I think these should at least be at DEBUG level and ideally at TRACE. They 
> don't offer much utility. Unfortunately they are Hadoop classes, so we can 
> only tweak the log4j.properties defaults instead. We do this in test resources 
> but not in what we ship in conf/.
> {noformat}
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system started
>  INFO  [] impl.MetricsSystemImpl: Stopping HBase metrics 
> system...
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system stopped.
>  INFO  [] impl.MetricsConfig: loaded properties from 
> hadoop-metrics2-hbase.properties
>  INFO  [] impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> {noformat}





[jira] [Commented] (HBASE-17722) Metrics subsystem stop/start messages add a lot of useless bulk to operational logging

2017-03-02 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893581#comment-15893581
 ] 

Andrew Purtell commented on HBASE-17722:


This should just be an update to log4j.properties settings. Just a sec.
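A sketch of the kind of log4j.properties tweak meant here; the logger names follow from the Hadoop metrics2 classes shown in the log excerpt (`impl.MetricsSystemImpl`, `impl.MetricsConfig`), but verify the fully-qualified class names before shipping:

```properties
# Quiet the Hadoop metrics2 start/stop chatter (illustrative; confirm class names)
log4j.logger.org.apache.hadoop.metrics2.impl.MetricsSystemImpl=WARN
log4j.logger.org.apache.hadoop.metrics2.impl.MetricsConfig=WARN
```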

> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging
> --
>
> Key: HBASE-17722
> URL: https://issues.apache.org/jira/browse/HBASE-17722
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 1.3.0, 1.2.4
>Reporter: Andrew Purtell
>
> Metrics subsystem stop/start messages add a lot of useless bulk to 
> operational logging. Say you are collecting logs from a fleet of thousands of 
> servers and want to have them around for ~a month or longer. It adds up. 
> I think these should at least be at DEBUG level and ideally at TRACE. They 
> don't offer much utility.
> {noformat}
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system started
>  INFO  [] impl.MetricsSystemImpl: Stopping HBase metrics 
> system...
>  INFO  [] impl.MetricsSystemImpl: HBase metrics system stopped.
>  INFO  [] impl.MetricsConfig: loaded properties from 
> hadoop-metrics2-hbase.properties
>  INFO  [] impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> {noformat}





[jira] [Commented] (HBASE-17579) Backport HBASE-16302 to 1.3.1

2017-03-02 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893578#comment-15893578
 ] 

Gary Helmling commented on HBASE-17579:
---

You also need to change gaugesMap from Map to ConcurrentMap, as 
Map.putIfAbsent() is only present in Java 8 and HBase 1.3 supports both 7 & 8.

> Backport HBASE-16302 to 1.3.1
> -
>
> Key: HBASE-17579
> URL: https://issues.apache.org/jira/browse/HBASE-17579
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Fix For: 1.3.1
>
> Attachments: HBASE-17579.branch-1.3.001.patch, 
> HBASE-17579.branch-1.3.002.patch
>
>
> This is a simple enough change to be included in 1.3.1, and replication 
> monitoring essentially breaks without this change.





[jira] [Commented] (HBASE-17718) Difference between RS's servername and its ephemeral node cause SSH stop working

2017-03-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893574#comment-15893574
 ] 

stack commented on HBASE-17718:
---

We should not be using the local name. We need to use the name the master tells 
us to use. Writing zk with the local name is an error. Even if it is only there 
for a blink of an eye, the master might notice it and get confused, thinking it 
a legit server.

> Difference between RS's servername and its ephemeral node cause SSH stop 
> working
> 
>
> Key: HBASE-17718
> URL: https://issues.apache.org/jira/browse/HBASE-17718
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.2.4, 1.1.8
>Reporter: Allan Yang
>Assignee: Allan Yang
>
> After HBASE-9593, the RS puts up an ephemeral node in ZK before reporting 
> for duty. But if the hosts config (/etc/hosts) differs between the master 
> and the RS, the RS's serverName can be different from the one stored in the 
> ephemeral zk node. The email mentioned in HBASE-13753 
> (http://mail-archives.apache.org/mod_mbox/hbase-user/201505.mbox/%3CCANZDn9ueFEEuZMx=pZdmtLsdGLyZz=rrm1N6EQvLswYc1z-H=g...@mail.gmail.com%3E)
>  is exactly what happened in our production env. 
> But what the email didn't point out is that the difference between the 
> serverName in the RS and the zk node can cause SSH to stop working, as we 
> can see from the code in {{RegionServerTracker}}:
> {code}
>   @Override
>   public void nodeDeleted(String path) {
> if (path.startsWith(watcher.rsZNode)) {
>   String serverName = ZKUtil.getNodeName(path);
>   LOG.info("RegionServer ephemeral node deleted, processing expiration [" +
> serverName + "]");
>   ServerName sn = ServerName.parseServerName(serverName);
>   if (!serverManager.isServerOnline(sn)) {
> LOG.warn(serverName.toString() + " is not online or isn't known to the master. " +
>   "The latter could be caused by a DNS misconfiguration.");
> return;
>   }
>   remove(sn);
>   this.serverManager.expireServer(sn);
> }
>   }
> {code}
> The server will not be processed by SSH/ServerCrashProcedure. The regions on 
> this server will not be assigned again until a master restart or failover.
> I know HBASE-9593 was to fix the issue where an RS reports for duty and 
> crashes before it can put up a zk node. That is a very rare case (and 
> controllable: just fix the bug making the RS crash). But the issue I mentioned 
> can happen more often (and is uncontrollable and can't be fixed in HBase, 
> since it is due to DNS, hosts config, etc.) and has more severe consequences.
> So here I offer some solutions to discuss:
> 1. Revert HBASE-9593 from all branches; Andrew Purtell has already reverted 
> it in branch-0.98.
> 2. Abort the RS if the master returns a different name; otherwise SSH can't 
> work properly.
> 3. Master accepts whatever servername is reported by the RS and doesn't change it.
> 4. Correct the zk node if the master returns another name (idea from Ted Yu).
>  
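The failure mode described above can be sketched in a self-contained way (the hostnames, port, and startcode are made up, and plain strings stand in for HBase's real ServerName class):

```java
import java.util.HashSet;
import java.util.Set;

public class ServerNameMismatchDemo {
    // Simplified stand-in for HBase's ServerName encoding: hostname,port,startcode.
    static String serverName(String host, int port, long startcode) {
        return host + "," + port + "," + startcode;
    }

    public static void main(String[] args) {
        Set<String> onlineServers = new HashSet<>();
        // The master resolved the RS via its own DNS/hosts config when it checked in...
        onlineServers.add(serverName("rs1.example.com", 16020, 1488500000000L));
        // ...but the RS registered its ephemeral znode under its local short name.
        String znodeName = serverName("rs1", 16020, 1488500000000L);
        // nodeDeleted() parses the znode name, finds no online match, and
        // returns early -- so expireServer() never runs for this RS.
        System.out.println(onlineServers.contains(znodeName)); // prints false
    }
}
```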





[jira] [Created] (HBASE-17722) Metrics subsystem stop/start messages add a lot of useless bulk to operational logging

2017-03-02 Thread Andrew Purtell (JIRA)
Andrew Purtell created HBASE-17722:
--

 Summary: Metrics subsystem stop/start messages add a lot of 
useless bulk to operational logging
 Key: HBASE-17722
 URL: https://issues.apache.org/jira/browse/HBASE-17722
 Project: HBase
  Issue Type: Bug
  Components: metrics
Affects Versions: 1.2.4, 1.3.0
Reporter: Andrew Purtell


Metrics subsystem stop/start messages add a lot of useless bulk to operational 
logging. Say you are collecting logs from a fleet of thousands of servers and 
want to have them around for ~a month or longer. It adds up. 

I think these should at least be at DEBUG level and ideally at TRACE. They 
don't offer much utility.

{noformat}
 INFO  [] impl.MetricsSystemImpl: HBase metrics system started

 INFO  [] impl.MetricsSystemImpl: Stopping HBase metrics system...

 INFO  [] impl.MetricsSystemImpl: HBase metrics system stopped.

 INFO  [] impl.MetricsConfig: loaded properties from 
hadoop-metrics2-hbase.properties

 INFO  [] impl.MetricsSystemImpl: Scheduled snapshot period at 10 
second(s).
{noformat}






[jira] [Commented] (HBASE-15314) Allow more than one backing file in bucketcache

2017-03-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893558#comment-15893558
 ] 

stack commented on HBASE-15314:
---

I started to look at rb but looking at the quality of the comments by 
[~ramkrishna.s.vasude...@gmail.com], [~aartokhy] and [~zyork], I'd only be 
getting in the way.

> Allow more than one backing file in bucketcache
> ---
>
> Key: HBASE-15314
> URL: https://issues.apache.org/jira/browse/HBASE-15314
> Project: HBase
>  Issue Type: Sub-task
>  Components: BucketCache
>Reporter: stack
>Assignee: Aaron Tokhy
> Attachments: FileIOEngine.java, HBASE-15314.master.001.patch, 
> HBASE-15314.master.001.patch, HBASE-15314.patch, HBASE-15314-v2.patch, 
> HBASE-15314-v3.patch, HBASE-15314-v4.patch, HBASE-15314-v5.patch
>
>
> Allow bucketcache use more than just one backing file: e.g. chassis has more 
> than one SSD in it.





[jira] [Commented] (HBASE-17579) Backport HBASE-16302 to 1.3.1

2017-03-02 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893559#comment-15893559
 ] 

Gary Helmling commented on HBASE-17579:
---

I tested this locally again and confirmed that patch v2 does retain the 
existing "source.ageOfLastShippedOp" and "sink.ageOfLastAppliedOp" metric names 
for backwards compatibility.

One comment on the patch: there's a race in CompatibilityRegistry.getGauge(). 
The method should return the value from gaugesMap.putIfAbsent() if it is 
non-null, instead of gauge. Otherwise the gauge reference it returns will not 
be the one referenced in the map.

Otherwise the patch looks good to me.
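The pattern Gary describes can be sketched as follows (the registry and Gauge type here are simplified stand-ins for the patch's CompatibilityRegistry; only the putIfAbsent return-value handling is the point):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class GaugeRegistryDemo {
    static class Gauge {
        final String name;
        Gauge(String name) { this.name = name; }
    }

    // Declared as ConcurrentMap (not Map) so putIfAbsent is available on Java 7 too.
    static final ConcurrentMap<String, Gauge> gaugesMap = new ConcurrentHashMap<>();

    // If two threads race, putIfAbsent returns the gauge the winning thread
    // stored; returning that value (rather than our freshly built one)
    // guarantees every caller holds the same instance that lives in the map.
    static Gauge getGauge(String name) {
        Gauge gauge = new Gauge(name);
        Gauge existing = gaugesMap.putIfAbsent(name, gauge);
        return existing != null ? existing : gauge;
    }
}
```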

> Backport HBASE-16302 to 1.3.1
> -
>
> Key: HBASE-17579
> URL: https://issues.apache.org/jira/browse/HBASE-17579
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Fix For: 1.3.1
>
> Attachments: HBASE-17579.branch-1.3.001.patch, 
> HBASE-17579.branch-1.3.002.patch
>
>
> This is a simple enough change to be included in 1.3.1, and replication 
> monitoring essentially breaks without this change.





[jira] [Commented] (HBASE-17516) Table quota not taking precedence over namespace quota

2017-03-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893548#comment-15893548
 ] 

Ted Yu commented on HBASE-17516:


https://builds.apache.org/job/PreCommit-HBASE-Build/5889/testReport/org.apache.hadoop.hbase.quotas/
 doesn't show modified test(s), such as TestSpaceQuotas

Mind running the quotas tests and posting back here?

> Table quota not taking precedence over namespace quota
> --
>
> Key: HBASE-17516
> URL: https://issues.apache.org/jira/browse/HBASE-17516
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Romil Choksi
>Assignee: Josh Elser
> Fix For: HBASE-16961
>
> Attachments: HBASE-17516.001.patch, HBASE-17516.002.HBASE-16961.patch
>
>
> [~romil.choksi] found a bug in the current patch-set where a more restrictive 
> table quota did not take priority over a less-restrictive namespace quota.
> Turns out some of the logic to handle this case was incorrect.





[jira] [Commented] (HBASE-14375) define public API for spark integration module

2017-03-02 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893544#comment-15893544
 ] 

Jerry He commented on HBASE-14375:
--

Regarding the API docs, is it our goal to convert the scala-doc to java-doc 
and show them together with the rest of the HBase Java API docs?
Hopefully the annotations can be effectively filtered as they are now.  
Otherwise we will have to find other ways to do it, for example going into the 
scala classes to update the class/function modifiers. Hopefully that can be 
separated into another task.

> define public API for spark integration module
> --
>
> Key: HBASE-14375
> URL: https://issues.apache.org/jira/browse/HBASE-14375
> Project: HBase
>  Issue Type: Task
>  Components: spark
>Reporter: Sean Busbey
>Assignee: Jerry He
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14375-v1.patch
>
>
> before we can put the spark integration module into a release, we need to 
> annotate its public api surface.





[jira] [Updated] (HBASE-17516) Table quota not taking precedence over namespace quota

2017-03-02 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17516:
---
Reporter: Romil Choksi  (was: Josh Elser)

> Table quota not taking precedence over namespace quota
> --
>
> Key: HBASE-17516
> URL: https://issues.apache.org/jira/browse/HBASE-17516
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Romil Choksi
>Assignee: Josh Elser
> Fix For: HBASE-16961
>
> Attachments: HBASE-17516.001.patch, HBASE-17516.002.HBASE-16961.patch
>
>
> [~romil.choksi] found a bug in the current patch-set where a more restrictive 
> table quota did not take priority over a less-restrictive namespace quota.
> Turns out some of the logic to handle this case was incorrect.





[jira] [Updated] (HBASE-17680) Run mini cluster through JNI in tests

2017-03-02 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17680:
---
Attachment: 17680.v26.txt

Patch v26 addresses Enis' latest comments.

WriteConf() is kept for core:client-test

> Run mini cluster through JNI in tests
> -
>
> Key: HBASE-17680
> URL: https://issues.apache.org/jira/browse/HBASE-17680
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 17680.v14.txt, 17680.v17.txt, 17680.v18.txt, 
> 17680.v1.txt, 17680.v20.txt, 17680.v22.txt, 17680.v23.txt, 17680.v26.txt, 
> 17680.v3.txt, 17680.v8.txt
>
>
> Currently, tests start a local hbase cluster through the hbase shell.
> There is less control over the configuration of the local cluster this way.
> This issue would replace hbase shell with JNI interface to mini cluster.
> We would have full control over the cluster behavior.
> Thanks to [~devaraj] who started this initiative.





[jira] [Commented] (HBASE-14375) define public API for spark integration module

2017-03-02 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893539#comment-15893539
 ] 

Jerry He commented on HBASE-14375:
--

I can:
-- Use the DataSourceRegister trait.  Define the format name in the public 
HBaseSparkConf.  Still mark the data source implementation class private.
Or
-- Mark it LimitedPrivate with HBaseInterfaceAudience.SPARK

> define public API for spark integration module
> --
>
> Key: HBASE-14375
> URL: https://issues.apache.org/jira/browse/HBASE-14375
> Project: HBase
>  Issue Type: Task
>  Components: spark
>Reporter: Sean Busbey
>Assignee: Jerry He
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14375-v1.patch
>
>
> before we can put the spark integration module into a release, we need to 
> annotate its public api surface.





[jira] [Comment Edited] (HBASE-14375) define public API for spark integration module

2017-03-02 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891531#comment-15891531
 ] 

Jerry He edited comment on HBASE-14375 at 3/3/17 1:25 AM:
--

Do you think we can just mark the DefaultSource private as it is?


was (Author: jinghe):
You don't we can make the DefaultSource private as it is?

> define public API for spark integration module
> --
>
> Key: HBASE-14375
> URL: https://issues.apache.org/jira/browse/HBASE-14375
> Project: HBase
>  Issue Type: Task
>  Components: spark
>Reporter: Sean Busbey
>Assignee: Jerry He
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14375-v1.patch
>
>
> before we can put the spark integration module into a release, we need to 
> annotate its public api surface.





[jira] [Updated] (HBASE-17718) Difference between RS's servername and its ephemeral node cause SSH stop working

2017-03-02 Thread Allan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-17718:
---
Description: 
After HBASE-9593, the RS puts up an ephemeral node in ZK before reporting for 
duty. But if the hosts config (/etc/hosts) differs between the master and the 
RS, the RS's serverName can differ from the one stored in the ephemeral ZK 
node. The email mentioned in HBASE-13753 
(http://mail-archives.apache.org/mod_mbox/hbase-user/201505.mbox/%3CCANZDn9ueFEEuZMx=pZdmtLsdGLyZz=rrm1N6EQvLswYc1z-H=g...@mail.gmail.com%3E)
 is exactly what happened in our production env. 

But what the email didn't point out is that the difference between the 
serverName on the RS and in the ZK node can cause SSH to stop working, as we 
can see from the code in {{RegionServerTracker}}:
{code}
  @Override
  public void nodeDeleted(String path) {
if (path.startsWith(watcher.rsZNode)) {
  String serverName = ZKUtil.getNodeName(path);
  LOG.info("RegionServer ephemeral node deleted, processing expiration [" +
serverName + "]");
  ServerName sn = ServerName.parseServerName(serverName);
  if (!serverManager.isServerOnline(sn)) {
LOG.warn(serverName.toString() + " is not online or isn't known to the 
master."+
 "The latter could be caused by a DNS misconfiguration.");
return;
  }
  remove(sn);
  this.serverManager.expireServer(sn);
}
  }
{code}
The server will not be processed by SSH/ServerCrashProcedure. The regions on 
this server will not be assigned again until a master restart or failover.
I know HBASE-9593 was meant to fix the issue of an RS reporting for duty and 
crashing before it can put up a ZK node. That is a very rare case (and 
controllable: just fix the bug making the RS crash). But the issue I mentioned 
can happen more often (and is uncontrollable; it can't be fixed in HBase since 
it is due to DNS, hosts config, etc.) and has more severe consequences.

So here I offer some solutions to discuss:
1. Revert HBASE-9593 from all branches; Andrew Purtell has already reverted it 
in branch-0.98.
2. Abort the RS if the master returns a different name; otherwise SSH can't 
work properly.
3. The master accepts whatever servername the RS reports and doesn't change it.
4. Correct the ZK node if the master returns another name (idea from Ted Yu).
 

  was:
After HBASE-9593, RS put up an ephemeral node in ZK before reporting for duty. 
But if the hosts config (/etc/hosts) is different between master and RS, RS's 
serverName can be different from the one stored the ephemeral zk node. The 
email metioned in HBASE-13753 
(http://mail-archives.apache.org/mod_mbox/hbase-user/201505.mbox/%3CCANZDn9ueFEEuZMx=pZdmtLsdGLyZz=rrm1N6EQvLswYc1z-H=g...@mail.gmail.com%3E)
 is exactly what happened in our production env. 

But what the email didn't point out is that the difference between serverName 
in RS and zk node can cause SSH stop to work. as we can see from the code in 
{{RegionServerTracker}}
{code}
  @Override
  public void nodeDeleted(String path) {
if (path.startsWith(watcher.rsZNode)) {
  String serverName = ZKUtil.getNodeName(path);
  LOG.info("RegionServer ephemeral node deleted, processing expiration [" +
serverName + "]");
  ServerName sn = ServerName.parseServerName(serverName);
  if (!serverManager.isServerOnline(sn)) {
LOG.warn(serverName.toString() + " is not online or isn't known to the 
master."+
 "The latter could be caused by a DNS misconfiguration.");
return;
  }
  remove(sn);
  this.serverManager.expireServer(sn);
}
  }
{code}
The server will not be processed by SSH/ServerCrashProcedure. The regions on 
this server will not been assigned again until master restart or failover.
I know HBASE-9593 was to fix the issue if RS report to duty and crashed before 
it can put up a zk node. It is a very rare case(And controllable, just fix the 
bug making rs to crash). But The issue I metioned can happened more often(and 
uncontrollable, can't be fixed in HBase, due to DNS, hosts config, etc.) and 
have more severe consequence.

So here I offer some solutions to discuss:
1. Revert HBASE-9593 from all branches, Andrew Purtell has reverted it in 
branch-0.98
2. Abort RS if master return a different name, otherwise SSH can't work properly
3. Master accepts whatever servername reported by RS and don't change it.

 


> Difference between RS's servername and its ephemeral node cause SSH stop 
> working
> 
>
> Key: HBASE-17718
> URL: https://issues.apache.org/jira/browse/HBASE-17718
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.2.4, 1.1.8
>Reporter: Allan Yang
>Assignee: Allan Yang
>
> After HBASE-9593, RS put up an ephemeral node in ZK before reporting for 
> duty. But if the hosts config (/etc/hosts) is different 

[jira] [Commented] (HBASE-17718) Difference between RS's servername and its ephemeral node cause SSH stop working

2017-03-02 Thread Allan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893526#comment-15893526
 ] 

Allan Yang commented on HBASE-17718:


{quote}
Another option is for region server to correct the server name in znode.
This can be done via zookeeper multi: drop previous znode and create new znode 
in one call.
{quote}
It is also a good choice, but I'm afraid deleting the old znode will trigger a 
server-shutdown handling event, and the old servername will show as a dead 
server in the master's web UI until restart.
But I will add this as the 4th solution. Let's vote on these solutions so I 
can prepare a patch.
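The "drop previous znode and create new znode in one call" idea quoted above maps to ZooKeeper's multi/transaction API (an `Op.delete` plus an `Op.create` committed atomically). Since no live ensemble is available here, the toy sketch below uses a plain map in place of ZooKeeper, with a synchronized block standing in for the transaction's atomicity; the paths and method name are illustrative, not HBase API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ZnodeRenameSketch {
  // A plain map stands in for ZooKeeper. The real implementation would call
  // ZooKeeper.multi(...) with Op.delete(oldPath) and Op.create(newPath) so
  // both changes commit together, never leaving zero or two nodes visible.
  static void renameZnode(Map<String, byte[]> store, String oldPath, String newPath) {
    synchronized (store) { // atomicity stand-in for the multi transaction
      byte[] data = store.remove(oldPath);
      if (data == null) {
        throw new IllegalStateException("znode not found: " + oldPath);
      }
      store.put(newPath, data);
    }
  }

  public static void main(String[] args) {
    Map<String, byte[]> store = new ConcurrentHashMap<>();
    store.put("/hbase/rs/host-a,16020,1", new byte[] {1});
    renameZnode(store, "/hbase/rs/host-a,16020,1", "/hbase/rs/host-b,16020,1");
    System.out.println(store.keySet());
  }
}
```

As Allan notes, even with multi the delete still fires a `nodeDeleted` watch, so the master side would need to recognize the replacement rather than treat it as a server death.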

> Difference between RS's servername and its ephemeral node cause SSH stop 
> working
> 
>
> Key: HBASE-17718
> URL: https://issues.apache.org/jira/browse/HBASE-17718
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.2.4, 1.1.8
>Reporter: Allan Yang
>Assignee: Allan Yang
>
> After HBASE-9593, the RS puts up an ephemeral node in ZK before reporting for 
> duty. But if the hosts config (/etc/hosts) differs between the master and the 
> RS, the RS's serverName can differ from the one stored in the ephemeral ZK 
> node. The email mentioned in HBASE-13753 
> (http://mail-archives.apache.org/mod_mbox/hbase-user/201505.mbox/%3CCANZDn9ueFEEuZMx=pZdmtLsdGLyZz=rrm1N6EQvLswYc1z-H=g...@mail.gmail.com%3E)
>  is exactly what happened in our production env. 
> But what the email didn't point out is that the difference between the 
> serverName on the RS and in the ZK node can cause SSH to stop working, as we 
> can see from the code in {{RegionServerTracker}}:
> {code}
>   @Override
>   public void nodeDeleted(String path) {
> if (path.startsWith(watcher.rsZNode)) {
>   String serverName = ZKUtil.getNodeName(path);
>   LOG.info("RegionServer ephemeral node deleted, processing expiration [" 
> +
> serverName + "]");
>   ServerName sn = ServerName.parseServerName(serverName);
>   if (!serverManager.isServerOnline(sn)) {
> LOG.warn(serverName.toString() + " is not online or isn't known to 
> the master."+
>  "The latter could be caused by a DNS misconfiguration.");
> return;
>   }
>   remove(sn);
>   this.serverManager.expireServer(sn);
> }
>   }
> {code}
> The server will not be processed by SSH/ServerCrashProcedure. The regions on 
> this server will not be assigned again until a master restart or failover.
> I know HBASE-9593 was meant to fix the issue of an RS reporting for duty and 
> crashing before it can put up a ZK node. That is a very rare case (and 
> controllable: just fix the bug making the RS crash). But the issue I 
> mentioned can happen more often (and is uncontrollable; it can't be fixed in 
> HBase since it is due to DNS, hosts config, etc.) and has more severe 
> consequences.
> So here I offer some solutions to discuss:
> 1. Revert HBASE-9593 from all branches; Andrew Purtell has already reverted 
> it in branch-0.98.
> 2. Abort the RS if the master returns a different name; otherwise SSH can't 
> work properly.
> 3. The master accepts whatever servername the RS reports and doesn't change 
> it.
>  





[jira] [Commented] (HBASE-17680) Run mini cluster through JNI in tests

2017-03-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893389#comment-15893389
 ] 

Ted Yu commented on HBASE-17680:


bq. This behavior coming from the existing code

How about addressing the last comment in another issue (since it is 
pre-existing behavior) ?

> Run mini cluster through JNI in tests
> -
>
> Key: HBASE-17680
> URL: https://issues.apache.org/jira/browse/HBASE-17680
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 17680.v14.txt, 17680.v17.txt, 17680.v18.txt, 
> 17680.v1.txt, 17680.v20.txt, 17680.v22.txt, 17680.v23.txt, 17680.v3.txt, 
> 17680.v8.txt
>
>
> Currently, tests start a local hbase cluster through the hbase shell.
> There is less control over the configuration of the local cluster this way.
> This issue would replace the hbase shell with a JNI interface to the mini 
> cluster, giving full control over the cluster behavior.
> Thanks to [~devaraj] who started this initiative.





[jira] [Updated] (HBASE-17382) Give RegionLocateType a better name

2017-03-02 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-17382:
--
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

Resolving as Won't Fix; not enough impetus/agreement, and we have bigger fish 
to fry. Thanks, [~Jan Hentschel].

> Give RegionLocateType a better name
> ---
>
> Key: HBASE-17382
> URL: https://issues.apache.org/jira/browse/HBASE-17382
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>Assignee: Jan Hentschel
>Priority: Trivial
> Attachments: HBASE-17382.master.001.patch
>
>
> Pointed out by [~tedyu] that 'Locate' is a verb and usually we need a noun 
> here. 'Locating' or 'Location'?
> Suggestions are welcome.





[jira] [Commented] (HBASE-17715) expose a sane API to package a standalone client jar

2017-03-02 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893364#comment-15893364
 ] 

Sergey Shelukhin commented on HBASE-17715:
--

Pre-building it is another alternative... that's what Hive does for MR etc., 
and for JDBC. As long as it actually works, i.e. is actually intended for the 
purpose and not just "in theory" ;)

> expose a sane API to package a standalone client jar
> 
>
> Key: HBASE-17715
> URL: https://issues.apache.org/jira/browse/HBASE-17715
> Project: HBase
>  Issue Type: Task
>Reporter: Sergey Shelukhin
>Assignee: Enis Soztutar
>
> TableMapReduceUtil currently exposes a method that takes some info from job 
> object iirc, and then makes a standalone jar and adds it to classpath.
> It would be nice to have an API that one can call with minimum necessary 
> arguments (not dependent on job stuff, "tmpjars" and all that) that would 
> make a standalone client jar at a given path and let the caller manage it 
> after that.





[jira] [Commented] (HBASE-17680) Run mini cluster through JNI in tests

2017-03-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893355#comment-15893355
 ] 

Ted Yu commented on HBASE-17680:


w.r.t. lint warnings, there are a lot of existing ones.
e.g.
{code}
serde/region-info.h:24:  Found C system header after other header. Should be: 
region-info.h, c system, c++ system, other.  [build/include_order] [4]
{code}
I am trying to get rid of the ones in the new files.


> Run mini cluster through JNI in tests
> -
>
> Key: HBASE-17680
> URL: https://issues.apache.org/jira/browse/HBASE-17680
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 17680.v14.txt, 17680.v17.txt, 17680.v18.txt, 
> 17680.v1.txt, 17680.v20.txt, 17680.v22.txt, 17680.v23.txt, 17680.v3.txt, 
> 17680.v8.txt
>
>
> Currently, tests start a local hbase cluster through the hbase shell.
> There is less control over the configuration of the local cluster this way.
> This issue would replace the hbase shell with a JNI interface to the mini 
> cluster, giving full control over the cluster behavior.
> Thanks to [~devaraj] who started this initiative.





[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893348#comment-15893348
 ] 

Hudson commented on HBASE-17707:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #2601 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2601/])
HBASE-17707 New More Accurate Table Skew cost function/generator (Kahlil 
Oppenheimer) (tedyu: rev 06e984b08689c1ee47f2c94d423357f81d935af1)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancer.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancer2.java


> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch
>
>
> This patch includes new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.
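The scoring described in the issue text can be made concrete with a small, self-contained Java sketch. This is illustrative only: the method name `tableSkewScore` and equal weights are assumptions, not the patch's actual API. It normalizes each table's move count by its worst case, takes a weighted average of the mean and max across tables, and applies a square root.

```java
import java.util.Arrays;

public class TableSkewSketch {
  // Weighted average of mean and max per-table skew, then sqrt to spread
  // values across [0, 1]; mirrors the description above, not the patch itself.
  static double tableSkewScore(int[] movesPerTable, int[] worstCasePerTable,
                               double meanWeight, double maxWeight) {
    double[] normalized = new double[movesPerTable.length];
    for (int i = 0; i < movesPerTable.length; i++) {
      // 0 = perfectly balanced table, 1 = entire table on one server
      normalized[i] = (double) movesPerTable[i] / worstCasePerTable[i];
    }
    double mean = Arrays.stream(normalized).average().orElse(0);
    double max = Arrays.stream(normalized).max().orElse(0);
    double weighted = (meanWeight * mean + maxWeight * max) / (meanWeight + maxWeight);
    return Math.sqrt(weighted);
  }

  public static void main(String[] args) {
    // Two tables: one perfectly balanced, one fully skewed onto one server.
    int[] moves = {0, 8};
    int[] worst = {10, 8};
    System.out.println(tableSkewScore(moves, worst, 1.0, 1.0)); // ~0.866 (sqrt of 0.75)
  }
}
```

Raising `maxWeight` relative to `meanWeight` penalizes one badly skewed table more heavily than many slightly skewed ones, which is the configurability the description mentions.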





[jira] [Commented] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893349#comment-15893349
 ] 

Hudson commented on HBASE-17717:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #2601 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2601/])
HBASE-17717 Explicitly use "sasl" ACL scheme for hbase superuser (elserj: rev 
5645684ec9e77d9af6c75b3378468a167cdc9788)
* (edit) 
hbase-client/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKUtil.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java


> Incorrect ZK ACL set for HBase superuser
> 
>
> Key: HBASE-17717
> URL: https://issues.apache.org/jira/browse/HBASE-17717
> Project: HBase
>  Issue Type: Bug
>  Components: security, Zookeeper
>Reporter: Shreya Bhat
>Assignee: Josh Elser
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.1.10, 1.2.6
>
> Attachments: HBASE-17717.001.0.98.patch, 
> HBASE-17717.001.branch-1.1.patch, HBASE-17717.001.patch
>
>
> Shreya was doing some testing of a deploy of HBase, verifying that the ZK 
> ACLs were actually set as we expect (yay, security).
> She noticed that, in some cases, we were seeing multiple ACLs for the same 
> user.
> {noformat}
> 'world,'anyone
> : r
> 'sasl,'hbase
> : cdrwa
> 'sasl,'hbase
> : cdrwa
> {noformat}
> After digging into this (and some insight from the mighty [~enis]), we 
> realized that this was happening because of an overridden value for 
> {{hbase.superuser}}. However, the ACL value doesn't match what we'd expect to 
> see (as hbase.superuser was set to {{cstm-hbase}}).
> After digging into this code, it seems like the {{auth}} ACL scheme in 
> ZooKeeper does not work as we expect.
> {code}
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> In the above, the {{"auth"}} scheme ignores any provided "subject" in the 
> {{Id}} object. It *only* considers the authentication of the current 
> connection. As such, our usage of this never actually sets the ACL for the 
> superuser correctly.
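The commit message above ("Explicitly use \"sasl\" ACL scheme for hbase superuser") suggests the shape of the fix. The sketch below shows that outline; to stay runnable here without ZooKeeper on the classpath, it uses minimal stand-in `Id`/`ACL` classes rather than the real `org.apache.zookeeper.data` types, and `createSuperuserAcls` is an illustrative name, not the actual ZKUtil method.

```java
import java.util.ArrayList;
import java.util.List;

public class SuperuserAclSketch {
  // Minimal stand-ins for ZooKeeper's Id/ACL so the sketch runs standalone.
  static class Id {
    final String scheme;
    final String id;
    Id(String scheme, String id) { this.scheme = scheme; this.id = id; }
  }
  static class ACL {
    final int perms;
    final Id id;
    ACL(int perms, Id id) { this.perms = perms; this.id = id; }
  }

  static final int PERMS_ALL = 0x1f; // cdrwa

  // The fix in outline: name the superuser explicitly under the "sasl"
  // scheme instead of "auth", whose subject is ignored and which only
  // reflects the identity of the current connection.
  static List<ACL> createSuperuserAcls(String superUser) {
    List<ACL> acls = new ArrayList<>();
    if (superUser != null) {
      acls.add(new ACL(PERMS_ALL, new Id("sasl", superUser)));
    }
    return acls;
  }

  public static void main(String[] args) {
    for (ACL acl : createSuperuserAcls("cstm-hbase")) {
      System.out.println(acl.id.scheme + ":" + acl.id.id + " perms=" + acl.perms);
    }
  }
}
```

With the "sasl" scheme the named principal (here the overridden `hbase.superuser`, `cstm-hbase`) gets the ACL, instead of the authenticated connection's own identity being duplicated as in the bug report.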





[jira] [Commented] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893341#comment-15893341
 ] 

Hadoop QA commented on HBASE-17717:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 41s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
32s {color} | {color:green} 0.98 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} 0.98 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
40s {color} | {color:green} 0.98 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} 0.98 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 51s 
{color} | {color:red} hbase-client in 0.98 has 16 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} 0.98 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 
31s {color} | {color:green} The patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 15s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
11s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 49s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:ef91163 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12855725/HBASE-17717.001.0.98.patch
 |
| JIRA Issue | HBASE-17717 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 2da96f055c22 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/hbase.sh |
| git revision | 0.98 / f30a654 |
| Default Java | 1.7.0_80 |
| findbugs | v2.0.1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HBASE-Build/5923/artifact/patchprocess/branch-findbugs-hbase-client-warnings.html
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/5923/testReport/ |
| modules | C: hbase-client U: hbase-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/5923/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Incorrect ZK ACL set for HBase superuser
> 
>
> Key: HBASE-17717
> URL: https://issues.apache.org/jira/browse/HBASE-17717
> Project: HBase
>  Issue Type: Bug
>  Components: security, Zookeeper
>Reporter: Shreya Bhat
>Assignee: Josh Elser
> Fix For: 2.0.0, 1.4.0, 1.3.1, 

[jira] [Commented] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893260#comment-15893260
 ] 

Andrew Purtell commented on HBASE-17717:


Thanks [~elserj]. Your brain was in the right place. Feel free to commit the 
0.98 patch, but it's unlikely we will have another 0.98 release. I mean, anyone 
can do it, but I have no plans to make another one.

> Incorrect ZK ACL set for HBase superuser
> 
>
> Key: HBASE-17717
> URL: https://issues.apache.org/jira/browse/HBASE-17717
> Project: HBase
>  Issue Type: Bug
>  Components: security, Zookeeper
>Reporter: Shreya Bhat
>Assignee: Josh Elser
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.1.10, 1.2.6
>
> Attachments: HBASE-17717.001.0.98.patch, 
> HBASE-17717.001.branch-1.1.patch, HBASE-17717.001.patch
>
>
> Shreya was doing some testing of a deploy of HBase, verifying that the ZK 
> ACLs were actually set as we expect (yay, security).
> She noticed that, in some cases, we were seeing multiple ACLs for the same 
> user.
> {noformat}
> 'world,'anyone
> : r
> 'sasl,'hbase
> : cdrwa
> 'sasl,'hbase
> : cdrwa
> {noformat}
> After digging into this (and some insight from the mighty [~enis]), we 
> realized that this was happening because of an overridden value for 
> {{hbase.superuser}}. However, the ACL value doesn't match what we'd expect to 
> see (as hbase.superuser was set to {{cstm-hbase}}).
> After digging into this code, it seems like the {{auth}} ACL scheme in 
> ZooKeeper does not work as we expect.
> {code}
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> In the above, the {{"auth"}} scheme ignores any provided "subject" in the 
> {{Id}} object. It *only* considers the authentication of the current 
> connection. As such, our usage of this never actually sets the ACL for the 
> superuser correctly.





[jira] [Updated] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-17717:
---
Attachment: HBASE-17717.001.0.98.patch

> Incorrect ZK ACL set for HBase superuser
> 
>
> Key: HBASE-17717
> URL: https://issues.apache.org/jira/browse/HBASE-17717
> Project: HBase
>  Issue Type: Bug
>  Components: security, Zookeeper
>Reporter: Shreya Bhat
>Assignee: Josh Elser
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.1.10, 1.2.6
>
> Attachments: HBASE-17717.001.0.98.patch, 
> HBASE-17717.001.branch-1.1.patch, HBASE-17717.001.patch
>
>
> Shreya was doing some testing of a deploy of HBase, verifying that the ZK 
> ACLs were actually set as we expect (yay, security).
> She noticed that, in some cases, we were seeing multiple ACLs for the same 
> user.
> {noformat}
> 'world,'anyone
> : r
> 'sasl,'hbase
> : cdrwa
> 'sasl,'hbase
> : cdrwa
> {noformat}
> After digging into this (and some insight from the mighty [~enis]), we 
> realized that this was happening because of an overridden value for 
> {{hbase.superuser}}. However, the ACL value doesn't match what we'd expect to 
> see (as hbase.superuser was set to {{cstm-hbase}}).
> After digging into this code, it seems like the {{auth}} ACL scheme in 
> ZooKeeper does not work as we expect.
> {code}
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> In the above, the {{"auth"}} scheme ignores any provided "subject" in the 
> {{Id}} object. It *only* considers the authentication of the current 
> connection. As such, our usage of this never actually sets the ACL for the 
> superuser correctly.





[jira] [Commented] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893242#comment-15893242
 ] 

Josh Elser commented on HBASE-17717:


Oh duh, where is my brain. Let me get something up for 0.98 too :).

Thanks for looking, Andrew!

> Incorrect ZK ACL set for HBase superuser
> 
>
> Key: HBASE-17717
> URL: https://issues.apache.org/jira/browse/HBASE-17717
> Project: HBase
>  Issue Type: Bug
>  Components: security, Zookeeper
>Reporter: Shreya Bhat
>Assignee: Josh Elser
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.1.10, 1.2.6
>
> Attachments: HBASE-17717.001.branch-1.1.patch, HBASE-17717.001.patch
>
>
> Shreya was doing some testing of a deploy of HBase, verifying that the ZK 
> ACLs were actually set as we expect (yay, security).
> She noticed that, in some cases, we were seeing multiple ACLs for the same 
> user.
> {noformat}
> 'world,'anyone
> : r
> 'sasl,'hbase
> : cdrwa
> 'sasl,'hbase
> : cdrwa
> {noformat}
> After digging into this (and some insight from the mighty [~enis]), we 
> realized that this was happening because of an overridden value for 
> {{hbase.superuser}}. However, the ACL value doesn't match what we'd expect to 
> see (as hbase.superuser was set to {{cstm-hbase}}).
> After digging into this code, it seems like the {{auth}} ACL scheme in 
> ZooKeeper does not work as we expect.
> {code}
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> In the above, the {{"auth"}} scheme ignores any provided "subject" in the 
> {{Id}} object. It *only* considers the authentication of the current 
> connection. As such, our usage of this never actually sets the ACL for the 
> superuser correctly.





[jira] [Commented] (HBASE-17516) Table quota not taking precedence over namespace quota

2017-03-02 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893237#comment-15893237
 ] 

Josh Elser commented on HBASE-17516:


[~tedyu], when you have a moment to review :)

> Table quota not taking precedence over namespace quota
> --
>
> Key: HBASE-17516
> URL: https://issues.apache.org/jira/browse/HBASE-17516
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: HBASE-16961
>
> Attachments: HBASE-17516.001.patch, HBASE-17516.002.HBASE-16961.patch
>
>
> [~romil.choksi] found a bug in the current patch-set where a more restrictive 
> table quota did not take priority over a less-restrictive namespace quota.
> Turns out some of the logic to handle this case was incorrect.





[jira] [Commented] (HBASE-17680) Run mini cluster through JNI in tests

2017-03-02 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893228#comment-15893228
 ] 

Enis Soztutar commented on HBASE-17680:
---

I've checked the v23 patch. You did not address these previous comments: 
- There is still a very large number of these kind of warnings: 
{code}
core/filter-test.cc:65:39: warning: ISO C++11 does not allow conversion from 
string literal to 'char *' [-Wwritable-strings]
  FilterTest::test_util_->CreateTable("t", "d");
{code}
You should use std::string everywhere, except when you are passing it to the 
JVM object. So, method signatures like: 
{code}
jobject MiniCluster::CreateTable(char *tblNam, char *familyName) {
{code} 
should always take std::string, or better {{const std::string&}}. We want 
{{char *}} usage confined to the JVM layer only. You can 
use string::c_str(). 
- Also this warning: 
{code}
test-util/mini-cluster.cc:260:1: warning: control reaches end of non-void 
function [-Wreturn-type]
{code}
- Some of the fields in MiniCluster do not end with {{_}}. Examples are 
{{cluster}}, {{jvm}}, etc. 
- As per above, you should not need the WriteConf() method at all. Please construct 
a Configuration object and populate it by calling {{Configuration::Set()}} 
from the java-level Configuration object. You seem to be doing that in 
TestUtil, so please remove WriteConf and the path setting, etc. Also return 
std::string in GetConf(), etc. 
- We use {{make lint}} to run the cpplint script. Can you please address these 
new warnings as well: 
{code}
test-util/test-util.cc:65:  Add #include <memory> for make_unique<>  
[build/include_what_you_use] [4]
test-util/test-util.cc:77:  Could not find a newline character at the end of 
the file.  [whitespace/ending_newline] [5]
Done processing test-util/test-util.cc
test-util/test-util.h:84:  Add #include <memory> for shared_ptr<>  
[build/include_what_you_use] [4]
Done processing test-util/test-util.h
test-util/mini-cluster.cc:22:  "glog/logging.h" already included at 
test-util/mini-cluster.cc:21  [build/include] [4]
test-util/mini-cluster.cc:25:  Found C system header after C++ system header. 
Should be: mini-cluster.h, c system, c++ system, other.  [build/include_order] 
[4]
test-util/mini-cluster.cc:26:  Found C system header after C++ system header. 
Should be: mini-cluster.h, c system, c++ system, other.  [build/include_order] 
[4]
test-util/mini-cluster.cc:70:  Missing space before {  [whitespace/braces] [5]
test-util/mini-cluster.cc:75:  Using C-style cast.  Use reinterpret_cast(...) instead  [readability/casting] [4]
test-util/mini-cluster.cc:227:  Using C-style cast.  Use reinterpret_cast(...) instead  [readability/casting] [4]
test-util/mini-cluster.cc:313:  Add #include <string> for string  
[build/include_what_you_use] [4]
Done processing test-util/mini-cluster.cc
test-util/mini-cluster.h:28:  Include the directory when naming .h files  
[build/include] [4]
test-util/mini-cluster.h:35:  Add #include <string> for string  
[build/include_what_you_use] [4]
{code}
- GetConf() is overloaded. For the one returning a configuration value, you 
should name it GetConfValue(). 
- Should the jobject cluster be a field in MiniCluster? No need to pass it to 
methods like GetConf(), I think. In StartCluster(), you can save it in the 
field. 
- This behavior comes from the existing code: 
{code}
Creating a TestUtil will spin up a cluster with numRegionServers region servers.
{code}
But I think it is wrong. Can we change it so that the TestUtil ctor does not start 
the cluster, but you have to call StartCluster manually? I think it is better 
this way, because in theory you should be able to use the test util without the 
cluster. 
Other than these, the patch looks good. 


> Run mini cluster through JNI in tests
> -
>
> Key: HBASE-17680
> URL: https://issues.apache.org/jira/browse/HBASE-17680
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 17680.v14.txt, 17680.v17.txt, 17680.v18.txt, 
> 17680.v1.txt, 17680.v20.txt, 17680.v22.txt, 17680.v23.txt, 17680.v3.txt, 
> 17680.v8.txt
>
>
> Currently tests start local hbase cluster through hbase shell.
> There is less control over the configuration of the local cluster this way.
> This issue would replace hbase shell with JNI interface to mini cluster.
> We would have full control over the cluster behavior.
> Thanks to [~devaraj] who started this initiative.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17721) Provide streaming APIs with SSL/TLS

2017-03-02 Thread Alex Araujo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893192#comment-15893192
 ] 

Alex Araujo commented on HBASE-17721:
-

gRPC has come a long way since HBASE-13467 was last touched, and 2.x seems to 
have the right dependency versions now, apart from Guava. Perhaps a good starting 
point would be to provide a new patch there.
Good call on avoiding an additional port if possible.

> Provide streaming APIs with SSL/TLS
> ---
>
> Key: HBASE-17721
> URL: https://issues.apache.org/jira/browse/HBASE-17721
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Alex Araujo
>Assignee: Alex Araujo
> Fix For: 2.0.0
>
>
> Umbrella to add optional client/server streaming capabilities to HBase.
> This would allow bandwidth to be used more efficiently for certain 
> operations, and allow clients to use SSL/TLS for authentication and 
> encryption.
> Desired client/server scaffolding:
> - HTTP/2 support
> - Protocol negotiation (blocking vs streaming, auth, encryption, etc.)
> - TLS/SSL support
> - Streaming RPC support
> Possibilities (and their tradeoffs):
> - gRPC: Some initial work and discussion on HBASE-13467 (Prototype using GRPC 
> as IPC mechanism)
> -- Has most or all of the desired scaffolding
> -- Adds additional g* dependencies. Compat story for g* dependencies not 
> always ideal
> - Custom HTTP/2 based client/server APIs
> -- More control over compat story
> -- Non-trivial to build scaffolding; might reinvent wheels along the way
> - Others?
> Related Jiras that might be rolled in as sub-tasks (or closed/replaced with 
> new ones):
> HBASE-17708 (Expose config to set two-way auth over TLS in HttpServer and add 
> a test)
> HBASE-8691 (High-Throughput Streaming Scan API)
> HBASE-14899 (Create custom Streaming ReplicationEndpoint)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17338) Treat Cell data size under global memstore heap size only when that Cell can not be copied to MSLAB

2017-03-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893152#comment-15893152
 ] 

stack commented on HBASE-17338:
---

bq. Not sure whether used dataSize/cellSize here and there. Its data size only 
which is the way as u said above. There is no cellSize as such. There is only 
cell heap size which is the heap size (total) occupied by the cell impl object

Thanks. So, datasize vs heapsize: have we written these up? datasize is 
serialized size... whether serialized on rpc or in blockcache, and heapsize is 
how large a 'live' Cell in the java heap is? If the Cell data is offheap, is it 
the onheap Cell proxy object?

bq. Ya there is one global threshold in both cases. Moreover in case of off 
heap, there is a heap global threshold also. ie. By def 40% of xmx above which 
we will force flush. In case of offheap, we have to check this extra thing also 
or else there is possibility of global memstore size getting oversized and GC 
impacts/OOME.

How does the global threshold work in the case where we are doing offheap 
accounting too? The global check will look at the onheap limit and the offheap 
limit, and if we hit the offheap limit before we hit the onheap limit, we'll 
flush?
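The dual-threshold check being discussed can be sketched like this. The method and parameter names are illustrative, not HBase's actual internals; it only captures the rule described above, that a flush is forced when either the off-heap data size or the on-heap overhead hits its respective limit, whichever happens first.

```java
public class FlushThresholdSketch {
  // Illustrative dual-threshold check: force flush when the off-heap data size
  // reaches the off-heap memstore limit, OR the on-heap overhead reaches the
  // global heap limit (per the discussion, by default 40% of -Xmx).
  static boolean shouldForceFlush(long offHeapDataSize, long offHeapLimit,
                                  long heapOverhead, long globalHeapLimit) {
    return offHeapDataSize >= offHeapLimit || heapOverhead >= globalHeapLimit;
  }

  public static void main(String[] args) {
    // Off-heap limit reached first: flush even though the heap side is fine.
    System.out.println(shouldForceFlush(100, 100, 10, 1000)); // true
    // Neither limit reached: no forced flush.
    System.out.println(shouldForceFlush(50, 100, 10, 1000));  // false
  }
}
```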

bq. Say in RS MSLAB is ON. But the regions from this particular table wont use 
MSLAB at all.

Should be easy enough to do?

Thanks [~anoop.hbase]


> Treat Cell data size under global memstore heap size only when that Cell can 
> not be copied to MSLAB
> ---
>
> Key: HBASE-17338
> URL: https://issues.apache.org/jira/browse/HBASE-17338
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 2.0.0
>
> Attachments: HBASE-17338.patch, HBASE-17338_V2.patch, 
> HBASE-17338_V2.patch, HBASE-17338_V4.patch, HBASE-17338_V5.patch
>
>
> We have only data size and heap overhead being tracked globally.  The off-heap 
> memstore works with an off-heap-backed MSLAB pool.  But a cell, when added to the 
> memstore, is not always copied to MSLAB.  Append/Increment ops doing an 
> upsert don't use MSLAB.  Also, based on the Cell size, we sometimes avoid the 
> MSLAB copy.  But now we track these cells' data size also under the global 
> memstore data size, which indicates off-heap size in the case of an off-heap 
> memstore.  For global checks for flushes (against lower/upper watermark 
> levels), we check this size against the max off-heap memstore size.  We do check 
> heap overhead against the global heap memstore size (defaults to 40% of xmx).  But 
> for such cells the data size should also be accounted under the heap overhead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893147#comment-15893147
 ] 

Andrew Purtell commented on HBASE-17717:


+1, thanks for finding this

> Incorrect ZK ACL set for HBase superuser
> 
>
> Key: HBASE-17717
> URL: https://issues.apache.org/jira/browse/HBASE-17717
> Project: HBase
>  Issue Type: Bug
>  Components: security, Zookeeper
>Reporter: Shreya Bhat
>Assignee: Josh Elser
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.1.10, 1.2.6
>
> Attachments: HBASE-17717.001.branch-1.1.patch, HBASE-17717.001.patch
>
>
> Shreya was doing some testing of a deploy of HBase, verifying that the ZK 
> ACLs were actually set as we expect (yay, security).
> She noticed that, in some cases, we were seeing multiple ACLs for the same 
> user.
> {noformat}
> 'world,'anyone
> : r
> 'sasl,'hbase
> : cdrwa
> 'sasl,'hbase
> : cdrwa
> {noformat}
> After digging into this (and some insight from the mighty [~enis]), we 
> realized that this was happening because of an overridden value for 
> {{hbase.superuser}}. However, the ACL value doesn't match what we'd expect to 
> see (as hbase.superuser was set to {{cstm-hbase}}).
> After digging into this code, it seems like the {{auth}} ACL scheme in 
> ZooKeeper does not work as we expect.
> {code}
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> In the above, the {{"auth"}} scheme ignores any provided "subject" in the 
> {{Id}} object. It *only* considers the authentication of the current 
> connection. As such, our usage of this never actually sets the ACL for the 
> superuser correctly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17721) Provide streaming APIs with SSL/TLS

2017-03-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893129#comment-15893129
 ] 

stack commented on HBASE-17721:
---

grpc or go home I'd say (unless perf is abysmal -- haven't tried). Would be 
sweet if our web port could be taught to do grpc so we didn't have to put up 
another port (https://groups.google.com/forum/#!topic/grpc-io/JnjCYGPMUms 
https://github.com/grpc/grpc/issues/8043). This one is related too I'd say: 
HBASE-8691 (love this issue).

> Provide streaming APIs with SSL/TLS
> ---
>
> Key: HBASE-17721
> URL: https://issues.apache.org/jira/browse/HBASE-17721
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Alex Araujo
>Assignee: Alex Araujo
> Fix For: 2.0.0
>
>
> Umbrella to add optional client/server streaming capabilities to HBase.
> This would allow bandwidth to be used more efficiently for certain 
> operations, and allow clients to use SSL/TLS for authentication and 
> encryption.
> Desired client/server scaffolding:
> - HTTP/2 support
> - Protocol negotiation (blocking vs streaming, auth, encryption, etc.)
> - TLS/SSL support
> - Streaming RPC support
> Possibilities (and their tradeoffs):
> - gRPC: Some initial work and discussion on HBASE-13467 (Prototype using GRPC 
> as IPC mechanism)
> -- Has most or all of the desired scaffolding
> -- Adds additional g* dependencies. Compat story for g* dependencies not 
> always ideal
> - Custom HTTP/2 based client/server APIs
> -- More control over compat story
> -- Non-trivial to build scaffolding; might reinvent wheels along the way
> - Others?
> Related Jiras that might be rolled in as sub-tasks (or closed/replaced with 
> new ones):
> HBASE-17708 (Expose config to set two-way auth over TLS in HttpServer and add 
> a test)
> HBASE-8691 (High-Throughput Streaming Scan API)
> HBASE-14899 (Create custom Streaming ReplicationEndpoint)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893121#comment-15893121
 ] 

Josh Elser commented on HBASE-17717:


Branch-1.1 seems to be -1 due to extant javadoc issues... Will address later.

> Incorrect ZK ACL set for HBase superuser
> 
>
> Key: HBASE-17717
> URL: https://issues.apache.org/jira/browse/HBASE-17717
> Project: HBase
>  Issue Type: Bug
>  Components: security, Zookeeper
>Reporter: Shreya Bhat
>Assignee: Josh Elser
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.1.10, 1.2.6
>
> Attachments: HBASE-17717.001.branch-1.1.patch, HBASE-17717.001.patch
>
>
> Shreya was doing some testing of a deploy of HBase, verifying that the ZK 
> ACLs were actually set as we expect (yay, security).
> She noticed that, in some cases, we were seeing multiple ACLs for the same 
> user.
> {noformat}
> 'world,'anyone
> : r
> 'sasl,'hbase
> : cdrwa
> 'sasl,'hbase
> : cdrwa
> {noformat}
> After digging into this (and some insight from the mighty [~enis]), we 
> realized that this was happening because of an overridden value for 
> {{hbase.superuser}}. However, the ACL value doesn't match what we'd expect to 
> see (as hbase.superuser was set to {{cstm-hbase}}).
> After digging into this code, it seems like the {{auth}} ACL scheme in 
> ZooKeeper does not work as we expect.
> {code}
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> In the above, the {{"auth"}} scheme ignores any provided "subject" in the 
> {{Id}} object. It *only* considers the authentication of the current 
> connection. As such, our usage of this never actually sets the ACL for the 
> superuser correctly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17710) HBase in standalone mode creates directories with 777 permission

2017-03-02 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17710:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the reviews.

> HBase in standalone mode creates directories with 777 permission
> 
>
> Key: HBASE-17710
> URL: https://issues.apache.org/jira/browse/HBASE-17710
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.1.2
> Environment: HDP-2.5.3
>Reporter: Toshihiro Suzuki
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 17710.branch-1.v1.txt, 17710.branch-1.v2.txt, 
> 17710.branch-1.v2.txt, 17710.branch-1.v3.txt, 17710.v1.txt, 17710.v2.txt, 
> 17710.v3.txt, 17710.v4.txt, 17710.v5.txt
>
>
> HBase in standalone mode creates directories with 777 permission in 
> hbase.rootdir.
> Ambari metrics collector defaults to standalone mode.
> {code}
> # find /var/lib/ambari-metrics-collector/hbase -perm 777 -type d -exec ls -ld 
> {} \;
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/hbase/namespace/d0cca53847904f4b4add1caa0ce3a9af/info
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/hbase/backup/cbceb8fccd968b4b4583365d4dc6e377/meta
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/hbase/backup/cbceb8fccd968b4b4583365d4dc6e377/session
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/SYSTEM.CATALOG/2f4ce2294cd21cecb58fd1aca5646144/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/SYSTEM.SEQUENCE/0eb67274ece8a4a26cfeeef2c6d4cd37/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/SYSTEM.SEQUENCE/aef86710a4005f98e2dc90675f2eb325/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/SYSTEM.STATS/5b1d955e255e55979621214a7e4083b8/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/SYSTEM.FUNCTION/32c033735cf144bac5637de23f7f7dd0/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRICS_METADATA/e420dfa799742fe4516ad1e4deefb793/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/HOSTED_APPS_METADATA/110be63e2a9994121fc5b48d663daf2c/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/CONTAINER_METRICS/a103719f87e8430635abf51a7fe98637/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD/cdb1d032beb90e350ce309e5d383c78e/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD/294deab47187494e845a5199702b4d04/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD_MINUTE/1a263b4fe068ef2db5ba1c3e45553354/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD_MINUTE/48f94dfb0161d8a28f645d2e1a473235/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD_HOURLY/6d096ac3e70e54dd4a8612e17cfc4b11/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD_DAILY/e81850d62da64c8d1c67be309f136e23/0
> drwxrwxrwx. 2 ams hadoop 45 Mar  1 02:21 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE/b43ff796de887197834ad62fdb612b59/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:21 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE/b43ff796de887197834ad62fdb612b59/.tmp
> drwxrwxrwx. 2 ams hadoop 45 Mar  1 02:21 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE/c8eadeb7dead8fda9729b8e9b10c4929/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:21 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE/c8eadeb7dead8fda9729b8e9b10c4929/.tmp
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE_MINUTE/ca9f9754ae9ae4cdc3e1b0523eecc390/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:18 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE_HOURLY/8412e8a8aec5d6307943fac78ce14c7a/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:18 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE_DAILY/7c3358aba91ea0d76ddd8bc3ceb2d578/0
> {code}
> My analysis is as follows:
> FileSystem.mkdirs(Path f) method creates a directory with permission 777. 
> Because HFileSystem which inherits FileSystem 
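The fix direction implied by this analysis, creating directories with an explicit permission instead of relying on the filesystem default, can be demonstrated with JDK APIs alone. The actual patch works through Hadoop's FileSystem/FsPermission classes, not java.nio; this is only a hedged sketch of the same idea.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class MkdirPermsSketch {
  public static void main(String[] args) throws IOException {
    Path parent = Files.createTempDirectory("perm-demo");
    // Pass an explicit 700 mode instead of relying on the default (777 minus umask).
    Set<PosixFilePermission> perms = PosixFilePermissions.fromString("rwx------");
    Path child = Files.createDirectory(parent.resolve("data"),
        PosixFilePermissions.asFileAttribute(perms));
    // On POSIX systems this prints rwx------ (a umask cannot widen a 700 mode).
    System.out.println(PosixFilePermissions.toString(
        Files.getPosixFilePermissions(child)));
  }
}
```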

[jira] [Commented] (HBASE-17710) HBase in standalone mode creates directories with 777 permission

2017-03-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893106#comment-15893106
 ] 

Hudson commented on HBASE-17710:


FAILURE: Integrated in Jenkins build HBase-1.4 #654 (See 
[https://builds.apache.org/job/HBase-1.4/654/])
HBASE-17710 HBase in standalone mode creates directories with 777 (tedyu: rev 
88f909cf1f9300c2e2b5d99b7300d74c0b0d7916)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionFileSystem.java


> HBase in standalone mode creates directories with 777 permission
> 
>
> Key: HBASE-17710
> URL: https://issues.apache.org/jira/browse/HBASE-17710
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.1.2
> Environment: HDP-2.5.3
>Reporter: Toshihiro Suzuki
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 17710.branch-1.v1.txt, 17710.branch-1.v2.txt, 
> 17710.branch-1.v2.txt, 17710.branch-1.v3.txt, 17710.v1.txt, 17710.v2.txt, 
> 17710.v3.txt, 17710.v4.txt, 17710.v5.txt
>
>
> HBase in standalone mode creates directories with 777 permission in 
> hbase.rootdir.
> Ambari metrics collector defaults to standalone mode.
> {code}
> # find /var/lib/ambari-metrics-collector/hbase -perm 777 -type d -exec ls -ld 
> {} \;
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/hbase/namespace/d0cca53847904f4b4add1caa0ce3a9af/info
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/hbase/backup/cbceb8fccd968b4b4583365d4dc6e377/meta
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/hbase/backup/cbceb8fccd968b4b4583365d4dc6e377/session
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/SYSTEM.CATALOG/2f4ce2294cd21cecb58fd1aca5646144/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/SYSTEM.SEQUENCE/0eb67274ece8a4a26cfeeef2c6d4cd37/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/SYSTEM.SEQUENCE/aef86710a4005f98e2dc90675f2eb325/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/SYSTEM.STATS/5b1d955e255e55979621214a7e4083b8/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/SYSTEM.FUNCTION/32c033735cf144bac5637de23f7f7dd0/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRICS_METADATA/e420dfa799742fe4516ad1e4deefb793/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/HOSTED_APPS_METADATA/110be63e2a9994121fc5b48d663daf2c/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/CONTAINER_METRICS/a103719f87e8430635abf51a7fe98637/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD/cdb1d032beb90e350ce309e5d383c78e/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD/294deab47187494e845a5199702b4d04/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD_MINUTE/1a263b4fe068ef2db5ba1c3e45553354/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD_MINUTE/48f94dfb0161d8a28f645d2e1a473235/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD_HOURLY/6d096ac3e70e54dd4a8612e17cfc4b11/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD_DAILY/e81850d62da64c8d1c67be309f136e23/0
> drwxrwxrwx. 2 ams hadoop 45 Mar  1 02:21 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE/b43ff796de887197834ad62fdb612b59/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:21 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE/b43ff796de887197834ad62fdb612b59/.tmp
> drwxrwxrwx. 2 ams hadoop 45 Mar  1 02:21 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE/c8eadeb7dead8fda9729b8e9b10c4929/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:21 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE/c8eadeb7dead8fda9729b8e9b10c4929/.tmp
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE_MINUTE/ca9f9754ae9ae4cdc3e1b0523eecc390/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 

[jira] [Commented] (HBASE-17712) Remove/Simplify the logic of RegionScannerImpl.handleFileNotFound

2017-03-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893086#comment-15893086
 ] 

stack commented on HBASE-17712:
---

bq. This not Ted Yu's fault.

Not much interested in 'blame'; asking for help/insight... if any to be had. 
dropMemstoreContents was added by HBASE-16304, made of parts taken from elsewhere 
in HRegion (which called through to the old dropMemstoreContentsForSeqId).

How do you suggest we exploit your findings? A test to prove no FNFE anymore? And 
if this is so, undo all the protections and guards against FNFE, with their 
reopening of files? How can I help, sir [~Apache9]?

> Remove/Simplify the logic of RegionScannerImpl.handleFileNotFound
> -
>
> Key: HBASE-17712
> URL: https://issues.apache.org/jira/browse/HBASE-17712
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
>
> It was introduced in HBASE-13651, and the logic became much more complicated 
> after HBASE-16304 due to a deadlock issue. It is really tough, as sequence id 
> is involved, and the method we call was originally meant to serve secondary 
> replicas, which do not handle writes.
> In fact, in the 1.x releases, the problem described in HBASE-13651 is gone. Now we 
> write a compaction marker to the WAL before deleting the compacted files. We 
> can only consider a RS as dead after its WAL files are all closed, so if the 
> region has already been reassigned the compaction will fail, as we can not 
> write out the compaction marker.
> So theoretically, if we still hit a FileNotFound exception, it should be a 
> critical bug, which means we may lose data. I do not think it is a good idea 
> to just eat the exception and refresh store files. Or even if we want to do 
> this, we can just refresh store files without dropping memstore contents. 
> This will also simplify the logic a lot.
> Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-17721) Provide streaming APIs with SSL/TLS

2017-03-02 Thread Alex Araujo (JIRA)
Alex Araujo created HBASE-17721:
---

 Summary: Provide streaming APIs with SSL/TLS
 Key: HBASE-17721
 URL: https://issues.apache.org/jira/browse/HBASE-17721
 Project: HBase
  Issue Type: Umbrella
Reporter: Alex Araujo
Assignee: Alex Araujo
 Fix For: 2.0.0


Umbrella to add optional client/server streaming capabilities to HBase.
This would allow bandwidth to be used more efficiently for certain operations, 
and allow clients to use SSL/TLS for authentication and encryption.

Desired client/server scaffolding:
- HTTP/2 support
- Protocol negotiation (blocking vs streaming, auth, encryption, etc.)
- TLS/SSL support
- Streaming RPC support

Possibilities (and their tradeoffs):
- gRPC: Some initial work and discussion on HBASE-13467 (Prototype using GRPC 
as IPC mechanism)
-- Has most or all of the desired scaffolding
-- Adds additional g* dependencies. Compat story for g* dependencies not always 
ideal
- Custom HTTP/2 based client/server APIs
-- More control over compat story
-- Non-trivial to build scaffolding; might reinvent wheels along the way
- Others?

Related Jiras that might be rolled in as sub-tasks (or closed/replaced with new 
ones):
HBASE-17708 (Expose config to set two-way auth over TLS in HttpServer and add a 
test)
HBASE-8691 (High-Throughput Streaming Scan API)
HBASE-14899 (Create custom Streaming ReplicationEndpoint)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893067#comment-15893067
 ] 

Hadoop QA commented on HBASE-17717:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
1s {color} | {color:green} branch-1.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} branch-1.1 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} branch-1.1 passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
23s {color} | {color:green} branch-1.1 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} branch-1.1 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 1s 
{color} | {color:red} hbase-client in branch-1.1 has 15 extant Findbugs 
warnings. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 18s 
{color} | {color:red} hbase-client in branch-1.1 failed with JDK v1.8.0_121. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} branch-1.1 passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
12m 2s {color} | {color:green} The patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
12s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 15s 
{color} | {color:red} hbase-client in the patch failed with JDK v1.8.0_121. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 38s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
14s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 1s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8012383 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12855693/HBASE-17717.001.branch-1.1.patch
 |
| JIRA Issue | HBASE-17717 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 7700266db533 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/hbase.sh |
| git revision | branch-1.1 / e4ba586 |
| Default Java | 1.7.0_80 |
| Multi-JDK 

[jira] [Commented] (HBASE-16304) HRegion#RegionScannerImpl#handleFileNotFoundException may lead to deadlock when trying to obtain write lock on updatesLock

2017-03-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893038#comment-15893038
 ] 

stack commented on HBASE-16304:
---

The comments don't answer his question, which was:

bq. Why this is called only in doDelta ... dropMemstoreContents();

> HRegion#RegionScannerImpl#handleFileNotFoundException may lead to deadlock 
> when trying to obtain write lock on updatesLock
> --
>
> Key: HBASE-16304
> URL: https://issues.apache.org/jira/browse/HBASE-16304
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.2
>Reporter: mingmin xu
>Assignee: Ted Yu
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.3
>
> Attachments: 16304.branch-1.2.v4.txt, 16304.branch-1.2.v5.txt, 
> 16304.branch-1.2.v5.txt, 16304.branch-1.v1.txt, 16304.v1.txt, 16304.v3.txt, 
> 16304.v4.txt, 16304.v4.txt, 16304.v5.txt, 16304.v6.txt, 16304.v7.txt
>
>
> here is my jvm stack:
> {code}
> 2016-07-29 16:36:56
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.72-b04 mixed mode):
> "Timer for 'HBase' metrics system" daemon prio=10 tid=0x7f205cf38000 
> nid=0xafa5 in Object.wait() [0x7f203b353000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.util.TimerThread.mainLoop(Timer.java:552)
>   - locked <0x00063503c790> (a java.util.TaskQueue)
>   at java.util.TimerThread.run(Timer.java:505)
> "Attach Listener" daemon prio=10 tid=0x7f205d017800 nid=0x1300 waiting on 
> condition [0x]
>java.lang.Thread.State: RUNNABLE
> "IPC Parameter Sending Thread #2" daemon prio=10 tid=0x7f205c7c4000 
> nid=0x4f1a waiting on condition [0x7f20362e1000]
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00066f996718> (a 
> java.util.concurrent.SynchronousQueue$TransferStack)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
>   at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
>   at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "RS_LOG_REPLAY_OPS-hadoop-datanode-0042:16020-1" prio=10 
> tid=0x7f2054ec8000 nid=0x832d waiting on condition [0x7f2039a18000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00066ffb5950> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "RS_LOG_REPLAY_OPS-hadoop-datanode-0042:16020-0" prio=10 
> tid=0x7f20542ca800 nid=0x5a5d waiting on condition [0x7f2033bba000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00066ffb5950> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "hadoop-datanode-0042.corp.cootek.com,16020,1469690065288_ChoreService_2" 
> daemon prio=10 tid=0x7f205d0d4000 

[jira] [Updated] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-17717:
---
Attachment: HBASE-17717.001.branch-1.1.patch

branch-1.1 patch.

> Incorrect ZK ACL set for HBase superuser
> 
>
> Key: HBASE-17717
> URL: https://issues.apache.org/jira/browse/HBASE-17717
> Project: HBase
>  Issue Type: Bug
>  Components: security, Zookeeper
>Reporter: Shreya Bhat
>Assignee: Josh Elser
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.1.10, 1.2.6
>
> Attachments: HBASE-17717.001.branch-1.1.patch, HBASE-17717.001.patch
>
>
> Shreya was doing some testing of a deploy of HBase, verifying that the ZK 
> ACLs were actually set as we expect (yay, security).
> She noticed that, in some cases, we were seeing multiple ACLs for the same 
> user.
> {noformat}
> 'world,'anyone
> : r
> 'sasl,'hbase
> : cdrwa
> 'sasl,'hbase
> : cdrwa
> {noformat}
> After digging into this (and some insight from the mighty [~enis]), we 
> realized that this was happening because of an overridden value for 
> {{hbase.superuser}}. However, the ACL value doesn't match what we'd expect to 
> see (as hbase.superuser was set to {{cstm-hbase}}).
> After digging into this code, it seems like the {{auth}} ACL scheme in 
> ZooKeeper does not work as we expect.
> {code}
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> In the above, the {{"auth"}} scheme ignores any provided "subject" in the 
> {{Id}} object. It *only* considers the authentication of the current 
> connection. As such, our usage of this never actually sets the ACL for the 
> superuser correctly.
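The "auth"-vs-"sasl" behavior described above can be sketched with hypothetical stand-in types; the names mirror ZooKeeper's Id/ACL resolution but this is NOT the real ZooKeeper API:

```java
// Hypothetical stand-in illustrating how ZooKeeper's ACL schemes resolve an
// identity; this mirrors the behavior described in the issue but is NOT the
// real org.apache.zookeeper API.
public class ZkAclSchemeDemo {

    // Identity ZooKeeper would effectively grant the ACL to, for a given
    // scheme/subject pair on a connection authenticated as connectionPrincipal.
    static String effectiveId(String scheme, String subject, String connectionPrincipal) {
        if ("auth".equals(scheme)) {
            // The "auth" scheme ignores the supplied subject entirely and
            // resolves to whoever is authenticated on the current connection.
            return connectionPrincipal;
        }
        // The "sasl" scheme honors the subject it was given.
        return subject;
    }

    public static void main(String[] args) {
        // With "auth", a custom hbase.superuser such as cstm-hbase is silently
        // replaced by the server's own principal:
        System.out.println(effectiveId("auth", "cstm-hbase", "hbase")); // hbase
        // With "sasl", the ACL is pinned to the intended superuser:
        System.out.println(effectiveId("sasl", "cstm-hbase", "hbase")); // cstm-hbase
    }
}
```

With the "sasl" scheme (the fix in this issue), the ACL stays attached to the configured superuser instead of silently resolving to the server's own principal.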



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-17717:
---
Fix Version/s: 1.4.0

> Incorrect ZK ACL set for HBase superuser
> 
>
> Key: HBASE-17717
> URL: https://issues.apache.org/jira/browse/HBASE-17717
> Project: HBase
>  Issue Type: Bug
>  Components: security, Zookeeper
>Reporter: Shreya Bhat
>Assignee: Josh Elser
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.1.10, 1.2.6
>
> Attachments: HBASE-17717.001.patch
>
>
> Shreya was doing some testing of a deploy of HBase, verifying that the ZK 
> ACLs were actually set as we expect (yay, security).
> She noticed that, in some cases, we were seeing multiple ACLs for the same 
> user.
> {noformat}
> 'world,'anyone
> : r
> 'sasl,'hbase
> : cdrwa
> 'sasl,'hbase
> : cdrwa
> {noformat}
> After digging into this (and some insight from the mighty [~enis]), we 
> realized that this was happening because of an overridden value for 
> {{hbase.superuser}}. However, the ACL value doesn't match what we'd expect to 
> see (as hbase.superuser was set to {{cstm-hbase}}).
> After digging into this code, it seems like the {{auth}} ACL scheme in 
> ZooKeeper does not work as we expect.
> {code}
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> In the above, the {{"auth"}} scheme ignores any provided "subject" in the 
> {{Id}} object. It *only* considers the authentication of the current 
> connection. As such, our usage of this never actually sets the ACL for the 
> superuser correctly.





[jira] [Resolved] (HBASE-17720) Possible bug in FlushSnapshotSubprocedure

2017-03-02 Thread Ben Lau (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Lau resolved HBASE-17720.
-
Resolution: Duplicate

> Possible bug in FlushSnapshotSubprocedure
> -
>
> Key: HBASE-17720
> URL: https://issues.apache.org/jira/browse/HBASE-17720
> Project: HBase
>  Issue Type: Bug
>  Components: dataloss, snapshots
>Reporter: Ben Lau
>
> I noticed that FlushSnapshotSubProcedure differs from MemstoreFlusher in that 
> it does not appear to explicitly handle a DroppedSnapshotException.  In the 
> primary codepath when flushing memstores, (see 
> MemStoreFlusher.flushRegion()), there is a try/catch for 
> DroppedSnapshotException that will abort the regionserver to replay WALs to 
> avoid data loss.  I don't see this in FlushSnapshotSubProcedure.  Is this an 
> accidental omission or is there a reason this isn't present?  
> I'm not too familiar with procedure V1 or V2.  I assume it is the case that 
> if a participant dies that all other participants will terminate any 
> outstanding operations for the procedure?  If so and if this lack of 
> RS.abort() for DroppedSnapshotException is a bug, then it can't be fixed 
> naively; otherwise, I assume a failed flush on 1 region server could cause a 
> cascade of RS abortions on the cluster.





[jira] [Commented] (HBASE-17704) Regions stuck in FAILED_OPEN when HDFS blocks are missing

2017-03-02 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892976#comment-15892976
 ] 

Andrew Purtell commented on HBASE-17704:


I agree. I didn't know about HBASE-16209. With an exponential backoff policy 
and a cap on max wait time (I see that patch has it) there's no reason not to 
keep retrying indefinitely. Even prior to that, the old default of 10 attempts 
is too small; it wouldn't ride over some transient issues. At some point 
operator intervention is necessary anyway, but we can get paged by a 
region-in-transition-too-long alert to deal with it and there's no harm in 
having the AM retry until we tell it not to with unassign_region or similar. 
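The capped exponential backoff being discussed might look roughly like this; the constants and method names are illustrative, not HBase's actual configuration:

```java
// Illustrative sketch of capped exponential backoff for region-open retries,
// in the spirit of HBASE-16209; BASE_MS/MAX_MS are made-up constants, not
// HBase configuration values.
public class OpenRetryBackoff {
    static final long BASE_MS = 1_000;  // delay before the first retry
    static final long MAX_MS = 60_000;  // cap on the wait time

    // Delay before the given (0-based) retry attempt: base * 2^attempt, capped.
    static long delayMs(int attempt) {
        long uncapped = BASE_MS << Math.min(attempt, 30); // bound the shift to avoid overflow
        return Math.min(uncapped, MAX_MS);
    }

    public static void main(String[] args) {
        // Delays grow 1s, 2s, 4s, ... then flatten at the 60s cap, so retrying
        // "indefinitely" never hammers the cluster harder than once per minute.
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("attempt " + attempt + " -> wait " + delayMs(attempt) + " ms");
        }
    }
}
```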

> Regions stuck in FAILED_OPEN when HDFS blocks are missing
> -
>
> Key: HBASE-17704
> URL: https://issues.apache.org/jira/browse/HBASE-17704
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.1.8
>Reporter: Mathias Herberts
>
> We recently experienced the loss of a whole rack (6 DNs + RS) in a 120 node 
> cluster. This led to the regions hosted on the 6 RSs, which had become 
> unavailable, being reassigned to live RSs. When attempting to open some of the 
> reassigned regions, some RS encountered missing blocks and issued "No live 
> nodes contain current block Block locations" putting the regions in state 
> FAILED_OPEN.
> Once the disappeared DNs went back online, the regions were left in 
> FAILED_OPEN, needing a restart of all the affected RSs to solve the problem.





[jira] [Commented] (HBASE-17720) Possible bug in FlushSnapshotSubprocedure

2017-03-02 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892973#comment-15892973
 ] 

Ben Lau commented on HBASE-17720:
-

Ah, thanks Jerry.  It looks like the fix was added in HBASE-13877 which was 
committed after the version of 0.98 we use.  The scenario (failure in ITBLL) is 
exactly what I ran into.  I had checked master to make sure this wasn't fixed, 
but I checked FlushSnapshotSubprocedure for the missing try/catch, not 
RegionServerSnapshotManager.  So this is a dup of HBASE-13877.  Will close, 
thanks Jerry.

> Possible bug in FlushSnapshotSubprocedure
> -
>
> Key: HBASE-17720
> URL: https://issues.apache.org/jira/browse/HBASE-17720
> Project: HBase
>  Issue Type: Bug
>  Components: dataloss, snapshots
>Reporter: Ben Lau
>
> I noticed that FlushSnapshotSubProcedure differs from MemstoreFlusher in that 
> it does not appear to explicitly handle a DroppedSnapshotException.  In the 
> primary codepath when flushing memstores, (see 
> MemStoreFlusher.flushRegion()), there is a try/catch for 
> DroppedSnapshotException that will abort the regionserver to replay WALs to 
> avoid data loss.  I don't see this in FlushSnapshotSubProcedure.  Is this an 
> accidental omission or is there a reason this isn't present?  
> I'm not too familiar with procedure V1 or V2.  I assume it is the case that 
> if a participant dies that all other participants will terminate any 
> outstanding operations for the procedure?  If so and if this lack of 
> RS.abort() for DroppedSnapshotException is a bug, then it can't be fixed 
> naively; otherwise, I assume a failed flush on 1 region server could cause a 
> cascade of RS abortions on the cluster.





[jira] [Updated] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-17717:
---
Release Note: In previous versions of HBase, the system intended to set a 
ZooKeeper ACL on all "sensitive" ZNodes for the user specified in the 
hbase.superuser configuration property. Unfortunately, the ACL was malformed, 
which resulted in the hbase.superuser being unable to access the sensitive 
ZNodes that HBase creates. This JIRA issue fixes this bug. HBase will 
automatically correct the ACLs on start so users do not need to manually 
correct the ACLs.

> Incorrect ZK ACL set for HBase superuser
> 
>
> Key: HBASE-17717
> URL: https://issues.apache.org/jira/browse/HBASE-17717
> Project: HBase
>  Issue Type: Bug
>  Components: security, Zookeeper
>Reporter: Shreya Bhat
>Assignee: Josh Elser
> Fix For: 2.0.0, 1.3.1, 1.1.10, 1.2.6
>
> Attachments: HBASE-17717.001.patch
>
>
> Shreya was doing some testing of a deploy of HBase, verifying that the ZK 
> ACLs were actually set as we expect (yay, security).
> She noticed that, in some cases, we were seeing multiple ACLs for the same 
> user.
> {noformat}
> 'world,'anyone
> : r
> 'sasl,'hbase
> : cdrwa
> 'sasl,'hbase
> : cdrwa
> {noformat}
> After digging into this (and some insight from the mighty [~enis]), we 
> realized that this was happening because of an overridden value for 
> {{hbase.superuser}}. However, the ACL value doesn't match what we'd expect to 
> see (as hbase.superuser was set to {{cstm-hbase}}).
> After digging into this code, it seems like the {{auth}} ACL scheme in 
> ZooKeeper does not work as we expect.
> {code}
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> In the above, the {{"auth"}} scheme ignores any provided "subject" in the 
> {{Id}} object. It *only* considers the authentication of the current 
> connection. As such, our usage of this never actually sets the ACL for the 
> superuser correctly.





[jira] [Commented] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892953#comment-15892953
 ] 

Josh Elser commented on HBASE-17717:


[~mantonov], [~busbey] for your respective upcoming release lines, this one 
might be good to include. tl;dr is that the ZK ACL we were setting for the 
{{hbase.superuser}} was incorrect (resulting in the hbase.superuser not 
actually getting access as intended).

> Incorrect ZK ACL set for HBase superuser
> 
>
> Key: HBASE-17717
> URL: https://issues.apache.org/jira/browse/HBASE-17717
> Project: HBase
>  Issue Type: Bug
>  Components: security, Zookeeper
>Reporter: Shreya Bhat
>Assignee: Josh Elser
> Fix For: 2.0.0, 1.3.1, 1.1.10, 1.2.6
>
> Attachments: HBASE-17717.001.patch
>
>
> Shreya was doing some testing of a deploy of HBase, verifying that the ZK 
> ACLs were actually set as we expect (yay, security).
> She noticed that, in some cases, we were seeing multiple ACLs for the same 
> user.
> {noformat}
> 'world,'anyone
> : r
> 'sasl,'hbase
> : cdrwa
> 'sasl,'hbase
> : cdrwa
> {noformat}
> After digging into this (and some insight from the mighty [~enis]), we 
> realized that this was happening because of an overridden value for 
> {{hbase.superuser}}. However, the ACL value doesn't match what we'd expect to 
> see (as hbase.superuser was set to {{cstm-hbase}}).
> After digging into this code, it seems like the {{auth}} ACL scheme in 
> ZooKeeper does not work as we expect.
> {code}
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> In the above, the {{"auth"}} scheme ignores any provided "subject" in the 
> {{Id}} object. It *only* considers the authentication of the current 
> connection. As such, our usage of this never actually sets the ACL for the 
> superuser correctly.





[jira] [Commented] (HBASE-17716) Formalize Scan Metric names

2017-03-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892929#comment-15892929
 ] 

stack commented on HBASE-17716:
---

Thanks for chiming in [~samarthjain]

bq. The metric keys aren't exposed today unless someone goes and looks at the 
HBase source code.

Aren't they exported as JMX and in our metrics dump page? They are for all 
intents and purposes 'public'. We do not have the same rigor regarding change as 
we do w/ our public APIs but there'd be pushback if a contributor tried a random 
rename.

Regards our not exporting a list other than in jsp or in jmx, that's a failing 
on our part. In the past, a thought was to put up an hbase instance as part of 
the build, crawl its exported metrics with descriptions and then dump out a 
page we could incorporate into our refguide. Doing this would shine a light on 
how poor our descriptions currently are, or at a minimum how much work they 
need, I'd think.

I'm trying to understand the value of the enum indirection. Is there a history 
of our randomly changing metric names out from under phoenix (other than at, 
say, key junctures such as a major release)? And if enum'ing has value, should 
we do it for all metrics rather than just a subset as here?

bq. On a side note, I am not sure of the motivation behind exposing the 
setCounter() like methods in the ServerSideScanMetrics class.

Taking a quick look, it seems to be for serialization -- sending ScanMetrics 
over RPC. We should mark such methods as not for external use... our fault.
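A minimal sketch of what enum-keyed metric names could look like; ScanMetric and its backing keys here are hypothetical, not HBase's actual ScanMetrics API:

```java
// Hedged sketch of "formalizing" scan metric names behind an enum; ScanMetric
// and its backing string keys are hypothetical, not HBase's actual API.
public class ScanMetricNames {

    enum ScanMetric {
        RPC_CALLS("RPC_CALLS_METRIC_NAME"),
        BYTES_IN_RESULTS("BYTES_IN_RESULTS_METRIC_NAME");

        private final String key;

        ScanMetric(String key) {
            this.key = key;
        }

        // The raw string key stays an implementation detail behind the enum.
        String key() {
            return key;
        }
    }

    public static void main(String[] args) {
        // A downstream consumer (e.g. Phoenix) references the enum constant, so
        // renaming the backing string is one visible change rather than a
        // silent break of a hard-coded lookup.
        for (ScanMetric metric : ScanMetric.values()) {
            System.out.println(metric + " -> " + metric.key());
        }
    }
}
```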

> Formalize Scan Metric names
> ---
>
> Key: HBASE-17716
> URL: https://issues.apache.org/jira/browse/HBASE-17716
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Reporter: Karan Mehta
>Assignee: Karan Mehta
>Priority: Minor
> Attachments: HBASE-17716.patch
>
>
> HBase provides various metrics through the API's exposed by ScanMetrics 
> class. 
> The JIRA PHOENIX-3248 requires them to be surfaced through the Phoenix 
> Metrics API. Currently these metrics are referred via hard-coded strings, 
> which are not formal and can break the Phoenix API. Hence we need to refactor 
> the code to assign enums for these metrics.





[jira] [Commented] (HBASE-17710) HBase in standalone mode creates directories with 777 permission

2017-03-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892891#comment-15892891
 ] 

Hudson commented on HBASE-17710:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #2600 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2600/])
HBASE-17710 HBase in standalone mode creates directories with 777 (tedyu: rev 
0b3ecc5ee7b990ef2149b388ed4f2eef4c5b139a)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileWriter.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionFileSystem.java


> HBase in standalone mode creates directories with 777 permission
> 
>
> Key: HBASE-17710
> URL: https://issues.apache.org/jira/browse/HBASE-17710
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.1.2
> Environment: HDP-2.5.3
>Reporter: Toshihiro Suzuki
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 17710.branch-1.v1.txt, 17710.branch-1.v2.txt, 
> 17710.branch-1.v2.txt, 17710.branch-1.v3.txt, 17710.v1.txt, 17710.v2.txt, 
> 17710.v3.txt, 17710.v4.txt, 17710.v5.txt
>
>
> HBase in standalone mode creates directories with 777 permission in 
> hbase.rootdir.
> Ambari metrics collector defaults to standalone mode.
> {code}
> # find /var/lib/ambari-metrics-collector/hbase -perm 777 -type d -exec ls -ld 
> {} \;
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/hbase/namespace/d0cca53847904f4b4add1caa0ce3a9af/info
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/hbase/backup/cbceb8fccd968b4b4583365d4dc6e377/meta
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/hbase/backup/cbceb8fccd968b4b4583365d4dc6e377/session
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/SYSTEM.CATALOG/2f4ce2294cd21cecb58fd1aca5646144/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/SYSTEM.SEQUENCE/0eb67274ece8a4a26cfeeef2c6d4cd37/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/SYSTEM.SEQUENCE/aef86710a4005f98e2dc90675f2eb325/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/SYSTEM.STATS/5b1d955e255e55979621214a7e4083b8/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/SYSTEM.FUNCTION/32c033735cf144bac5637de23f7f7dd0/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRICS_METADATA/e420dfa799742fe4516ad1e4deefb793/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/HOSTED_APPS_METADATA/110be63e2a9994121fc5b48d663daf2c/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/CONTAINER_METRICS/a103719f87e8430635abf51a7fe98637/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD/cdb1d032beb90e350ce309e5d383c78e/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD/294deab47187494e845a5199702b4d04/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD_MINUTE/1a263b4fe068ef2db5ba1c3e45553354/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD_MINUTE/48f94dfb0161d8a28f645d2e1a473235/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD_HOURLY/6d096ac3e70e54dd4a8612e17cfc4b11/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_RECORD_DAILY/e81850d62da64c8d1c67be309f136e23/0
> drwxrwxrwx. 2 ams hadoop 45 Mar  1 02:21 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE/b43ff796de887197834ad62fdb612b59/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:21 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE/b43ff796de887197834ad62fdb612b59/.tmp
> drwxrwxrwx. 2 ams hadoop 45 Mar  1 02:21 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE/c8eadeb7dead8fda9729b8e9b10c4929/0
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:21 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE/c8eadeb7dead8fda9729b8e9b10c4929/.tmp
> drwxrwxrwx. 2 ams hadoop 6 Mar  1 02:17 
> /var/lib/ambari-metrics-collector/hbase/data/default/METRIC_AGGREGATE_MINUTE/ca9f9754ae9ae4cdc3e1b0523eecc390/0
> 

[jira] [Commented] (HBASE-17720) Possible bug in FlushSnapshotSubprocedure

2017-03-02 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892888#comment-15892888
 ] 

Jerry He commented on HBASE-17720:
--

It is handled in the upper level RegionServerSnapshotManager.
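The abort-on-DroppedSnapshotException pattern under discussion, sketched with stand-in types (per HBASE-13877 the real handling lives at the RegionServerSnapshotManager level, not in the subprocedure itself):

```java
// Stand-in sketch of the abort-on-DroppedSnapshotException pattern; the types
// here are hypothetical simplifications, not HBase's actual classes.
public class SnapshotFlushDemo {

    static class DroppedSnapshotException extends RuntimeException {}

    interface Server {
        void abort(String why, Throwable cause);
    }

    // Returns true if the flush succeeded. On a dropped snapshot the memstore
    // is inconsistent with the WAL, so the server must abort and let WAL
    // replay on restart recover the data (avoiding silent data loss).
    static boolean flushRegion(Runnable flush, Server server) {
        try {
            flush.run();
            return true;
        } catch (DroppedSnapshotException e) {
            server.abort("Replay of WAL required, forcing server shutdown", e);
            return false;
        }
    }

    public static void main(String[] args) {
        final boolean[] aborted = {false};
        Server server = (why, cause) -> aborted[0] = true;
        flushRegion(() -> { throw new DroppedSnapshotException(); }, server);
        System.out.println("aborted=" + aborted[0]); // aborted=true
    }
}
```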

> Possible bug in FlushSnapshotSubprocedure
> -
>
> Key: HBASE-17720
> URL: https://issues.apache.org/jira/browse/HBASE-17720
> Project: HBase
>  Issue Type: Bug
>  Components: dataloss, snapshots
>Reporter: Ben Lau
>
> I noticed that FlushSnapshotSubProcedure differs from MemstoreFlusher in that 
> it does not appear to explicitly handle a DroppedSnapshotException.  In the 
> primary codepath when flushing memstores, (see 
> MemStoreFlusher.flushRegion()), there is a try/catch for 
> DroppedSnapshotException that will abort the regionserver to replay WALs to 
> avoid data loss.  I don't see this in FlushSnapshotSubProcedure.  Is this an 
> accidental omission or is there a reason this isn't present?  
> I'm not too familiar with procedure V1 or V2.  I assume it is the case that 
> if a participant dies that all other participants will terminate any 
> outstanding operations for the procedure?  If so and if this lack of 
> RS.abort() for DroppedSnapshotException is a bug, then it can't be fixed 
> naively; otherwise, I assume a failed flush on 1 region server could cause a 
> cascade of RS abortions on the cluster.





[jira] [Commented] (HBASE-17704) Regions stuck in FAILED_OPEN when HDFS blocks are missing

2017-03-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892883#comment-15892883
 ] 

stack commented on HBASE-17704:
---

Should we change the default in a subissue based off the [~ghelmling] 
reasoning? (Hello [~herberts] -- smile)

> Regions stuck in FAILED_OPEN when HDFS blocks are missing
> -
>
> Key: HBASE-17704
> URL: https://issues.apache.org/jira/browse/HBASE-17704
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.1.8
>Reporter: Mathias Herberts
>
> We recently experienced the loss of a whole rack (6 DNs + RS) in a 120 node 
> cluster. This led to the regions hosted on the 6 RSs, which had become 
> unavailable, being reassigned to live RSs. When attempting to open some of the 
> reassigned regions, some RS encountered missing blocks and issued "No live 
> nodes contain current block Block locations" putting the regions in state 
> FAILED_OPEN.
> Once the disappeared DNs went back online, the regions were left in 
> FAILED_OPEN, needing a restart of all the affected RSs to solve the problem.





[jira] [Commented] (HBASE-16755) Honor flush policy under global memstore pressure

2017-03-02 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892875#comment-15892875
 ] 

Enis Soztutar commented on HBASE-16755:
---

bq. This requires changing the LimitedPrivate Region api, and may not make 
sense to do for a point release.
This seems like an improvement, and not a bug. And the risk is non-trivial as 
per above. So, I would say it should not go to 1.3.1 anyway. It can be 1.4, and 
2.0. 

> Honor flush policy under global memstore pressure
> -
>
> Key: HBASE-16755
> URL: https://issues.apache.org/jira/browse/HBASE-16755
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Fix For: 1.3.1
>
> Attachments: HBASE-16755.v0.patch
>
>
> When global memstore reaches the low water mark, we pick the best flushable 
> region and flush all column families for it. This is a suboptimal approach in 
> the  sense that it leads to an unnecessarily high file creation rate and IO 
> amplification due to compactions. We should still try to honor the underlying 
> FlushPolicy.





[jira] [Commented] (HBASE-17704) Regions stuck in FAILED_OPEN when HDFS blocks are missing

2017-03-02 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892873#comment-15892873
 ] 

Gary Helmling commented on HBASE-17704:
---

Just to be clear, I'd also be in favor of changing the default for this config 
to Integer.MAX_VALUE for 1.4.0 and 2.0.0.  The current situation of having 
FAILED_OPEN be a terminal state requiring operator intervention is pretty bad 
and seems unnecessary.

It could be that I'm missing something else that's necessary, but that seems 
like an appropriate fix for this issue.

> Regions stuck in FAILED_OPEN when HDFS blocks are missing
> -
>
> Key: HBASE-17704
> URL: https://issues.apache.org/jira/browse/HBASE-17704
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.1.8
>Reporter: Mathias Herberts
>
> We recently experienced the loss of a whole rack (6 DNs + RS) in a 120 node 
> cluster. This led to the regions hosted on the 6 RSs, which had become 
> unavailable, being reassigned to live RSs. When attempting to open some of the 
> reassigned regions, some RS encountered missing blocks and issued "No live 
> nodes contain current block Block locations" putting the regions in state 
> FAILED_OPEN.
> Once the disappeared DNs went back online, the regions were left in 
> FAILED_OPEN, needing a restart of all the affected RSs to solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892871#comment-15892871
 ] 

Enis Soztutar commented on HBASE-17717:
---

Needs a release note section, and maybe an incompatible flag just to raise 
attention. 

> Incorrect ZK ACL set for HBase superuser
> 
>
> Key: HBASE-17717
> URL: https://issues.apache.org/jira/browse/HBASE-17717
> Project: HBase
>  Issue Type: Bug
>  Components: security, Zookeeper
>Reporter: Shreya Bhat
>Assignee: Josh Elser
> Fix For: 2.0.0, 1.3.1, 1.1.10, 1.2.6
>
> Attachments: HBASE-17717.001.patch
>
>
> Shreya was doing some testing of a deploy of HBase, verifying that the ZK 
> ACLs were actually set as we expect (yay, security).
> She noticed that, in some cases, we were seeing multiple ACLs for the same 
> user.
> {noformat}
> 'world,'anyone
> : r
> 'sasl,'hbase
> : cdrwa
> 'sasl,'hbase
> : cdrwa
> {noformat}
> After digging into this (and some insight from the mighty [~enis]), we 
> realized that this was happening because of an overridden value for 
> {{hbase.superuser}}. However, the ACL value doesn't match what we'd expect to 
> see (as hbase.superuser was set to {{cstm-hbase}}).
> After digging into this code, it seems like the {{auth}} ACL scheme in 
> ZooKeeper does not work as we expect.
> {code}
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> In the above, the {{"auth"}} scheme ignores any provided "subject" in the 
> {{Id}} object. It *only* considers the authentication of the current 
> connection. As such, our usage of this never actually sets the ACL for the 
> superuser correctly.
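The committed fix ("Explicitly use 'sasl' ACL scheme for hbase superuser") builds the superuser ACL with the "sasl" scheme instead of "auth". A minimal, self-contained sketch of the corrected construction is below; the `Id`/`ACL` classes here are simplified stand-ins mirroring `org.apache.zookeeper.data.Id`/`ACL`, and the method name is illustrative, not the actual ZKUtil signature:

```java
import java.util.ArrayList;
import java.util.List;

class ZkAclSketch {
  static final int PERMS_ALL = 0x1f; // mirrors ZooDefs.Perms.ALL

  // Stand-ins for org.apache.zookeeper.data.Id and ACL so the sketch
  // compiles on its own; the real fix uses the ZooKeeper classes.
  static class Id {
    final String scheme, id;
    Id(String scheme, String id) { this.scheme = scheme; this.id = id; }
  }

  static class ACL {
    final int perms; final Id id;
    ACL(int perms, Id id) { this.perms = perms; this.id = id; }
  }

  // Corrected construction: the "sasl" scheme honors the supplied
  // principal, whereas "auth" ignores it and only considers the
  // authentication of the current connection.
  static List<ACL> createAclsForSuperUser(String superUser) {
    List<ACL> acls = new ArrayList<ACL>();
    if (superUser != null) {
      acls.add(new ACL(PERMS_ALL, new Id("sasl", superUser)));
    }
    return acls;
  }
}
```

With this, a custom {{hbase.superuser}} such as {{cstm-hbase}} actually appears in the znode ACL, rather than silently collapsing to the connection's own identity.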



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17717) Incorrect ZK ACL set for HBase superuser

2017-03-02 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892869#comment-15892869
 ] 

Enis Soztutar commented on HBASE-17717:
---

bq. Just to confirm: given the above, Enis Soztutar, you're still +1?
Indeed. Nice digging! 

> Incorrect ZK ACL set for HBase superuser
> 
>
> Key: HBASE-17717
> URL: https://issues.apache.org/jira/browse/HBASE-17717
> Project: HBase
>  Issue Type: Bug
>  Components: security, Zookeeper
>Reporter: Shreya Bhat
>Assignee: Josh Elser
> Fix For: 2.0.0, 1.3.1, 1.1.10, 1.2.6
>
> Attachments: HBASE-17717.001.patch
>
>
> Shreya was doing some testing of a deploy of HBase, verifying that the ZK 
> ACLs were actually set as we expect (yay, security).
> She noticed that, in some cases, we were seeing multiple ACLs for the same 
> user.
> {noformat}
> 'world,'anyone
> : r
> 'sasl,'hbase
> : cdrwa
> 'sasl,'hbase
> : cdrwa
> {noformat}
> After digging into this (and some insight from the mighty [~enis]), we 
> realized that this was happening because of an overridden value for 
> {{hbase.superuser}}. However, the ACL value doesn't match what we'd expect to 
> see (as hbase.superuser was set to {{cstm-hbase}}).
> After digging into this code, it seems like the {{auth}} ACL scheme in 
> ZooKeeper does not work as we expect.
> {code}
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> In the above, the {{"auth"}} scheme ignores any provided "subject" in the 
> {{Id}} object. It *only* considers the authentication of the current 
> connection. As such, our usage of this never actually sets the ACL for the 
> superuser correctly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17719) PreEmptive Fast Fail does not apply to scanners

2017-03-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892846#comment-15892846
 ] 

Ted Yu commented on HBASE-17719:


Your first patch applies to branch-1.2

You can name it 17719.branch-1.2.v1.patch so that QA applies it on the correct 
branch.

But review starts with master branch patch.

> PreEmptive Fast Fail does not apply to scanners
> ---
>
> Key: HBASE-17719
> URL: https://issues.apache.org/jira/browse/HBASE-17719
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.2.0
>Reporter: James Moore
>Assignee: James Moore
> Attachments: HBASE_17719.patch
>
>
> On CDH 5.9.0, testing revealed that scanners do not leverage Pre-emptive fast 
> fail.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17719) PreEmptive Fast Fail does not apply to scanners

2017-03-02 Thread James Moore (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892820#comment-15892820
 ] 

James Moore commented on HBASE-17719:
-

Will do, [~ted_yu]. For my reference, is there a process to also have patches 
applied to earlier branches? We're likely going to be on branch-1.2 for a while 
to come. Ideally we'd love to have this patch in branch-1.2 as well as master.

Thanks!

--James

> PreEmptive Fast Fail does not apply to scanners
> ---
>
> Key: HBASE-17719
> URL: https://issues.apache.org/jira/browse/HBASE-17719
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.2.0
>Reporter: James Moore
>Assignee: James Moore
> Attachments: HBASE_17719.patch
>
>
> On CDH 5.9.0, testing revealed that scanners do not leverage Pre-emptive fast 
> fail.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17716) Formalize Scan Metric names

2017-03-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892816#comment-15892816
 ] 

Samarth Jain commented on HBASE-17716:
--

[~saint@gmail.com] - the general idea behind this JIRA was to make sure 
that users of scan metrics like Phoenix can guard against the metric name 
getting changed behind the scenes. The metric keys aren't exposed today unless 
someone goes and looks at the HBase source code. Having a metric enum sort of 
formalizes the contract of the API instead of having plain strings.

On a side note, I am not sure of the motivation behind exposing the 
setCounter()-like methods in the ServerSideScanMetrics class. Was it intended 
to be a grab-bag where clients can add and update whatever metrics they would 
like? If not, then we should really get rid of such methods and simply 
initialize the backing map by creating counters for all the constants in the 
Metric enum.
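The approach described above can be sketched as follows. This is an illustrative shape only, assuming a backing EnumMap pre-populated from the enum; the enum constants and class name here are hypothetical, not the actual HBase metric names:

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

class ScanMetricsSketch {
  // Formal metric names: downstream users (e.g. Phoenix) reference
  // these constants instead of hard-coded strings that could change
  // behind the scenes. Constant names here are hypothetical.
  enum Metric { RPC_CALLS, BYTES_IN_RESULTS, REGIONS_SCANNED }

  private final Map<Metric, AtomicLong> counters =
      new EnumMap<Metric, AtomicLong>(Metric.class);

  ScanMetricsSketch() {
    // Pre-create one counter per enum constant, rather than exposing a
    // grab-bag setCounter(String) API that accepts arbitrary keys.
    for (Metric m : Metric.values()) {
      counters.put(m, new AtomicLong());
    }
  }

  void increment(Metric m, long delta) { counters.get(m).addAndGet(delta); }

  long get(Metric m) { return counters.get(m).get(); }
}
```

Because every key is an enum constant, a renamed or removed metric becomes a compile-time error for consumers instead of a silent behavior change.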

> Formalize Scan Metric names
> ---
>
> Key: HBASE-17716
> URL: https://issues.apache.org/jira/browse/HBASE-17716
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Reporter: Karan Mehta
>Assignee: Karan Mehta
>Priority: Minor
> Attachments: HBASE-17716.patch
>
>
> HBase provides various metrics through the API's exposed by ScanMetrics 
> class. 
> The JIRA PHOENIX-3248 requires them to be surfaced through the Phoenix 
> Metrics API. Currently these metrics are referred via hard-coded strings, 
> which are not formal and can break the Phoenix API. Hence we need to refactor 
> the code to assign enums for these metrics.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-17720) Possible bug in FlushSnapshotSubprocedure

2017-03-02 Thread Ben Lau (JIRA)
Ben Lau created HBASE-17720:
---

 Summary: Possible bug in FlushSnapshotSubprocedure
 Key: HBASE-17720
 URL: https://issues.apache.org/jira/browse/HBASE-17720
 Project: HBase
  Issue Type: Bug
  Components: dataloss, snapshots
Reporter: Ben Lau


I noticed that FlushSnapshotSubprocedure differs from MemStoreFlusher in that 
it does not appear to explicitly handle a DroppedSnapshotException. In the 
primary codepath when flushing memstores (see MemStoreFlusher.flushRegion()), 
there is a try/catch for DroppedSnapshotException that will abort the 
regionserver so that WALs are replayed, avoiding data loss. I don't see this in 
FlushSnapshotSubprocedure. Is this an accidental omission, or is there a reason 
it isn't present?
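For reference, the MemStoreFlusher pattern described above boils down to the following. This is a simplified sketch, not the real HBase code: the nested types are stand-ins, and the abort message is illustrative:

```java
class FlushRegionSketch {
  // Stand-in for org.apache.hadoop.hbase.DroppedSnapshotException.
  static class DroppedSnapshotException extends RuntimeException {}

  interface Server { void abort(String why, Throwable cause); }
  interface Region { void flush(); }

  // MemStoreFlusher-style handling: a dropped memstore snapshot means
  // possible data loss unless the WAL is replayed, so the regionserver
  // aborts rather than continuing with an inconsistent memstore.
  static boolean flushRegion(Region region, Server server) {
    try {
      region.flush();
      return true;
    } catch (DroppedSnapshotException e) {
      // Abort so the WAL is replayed on restart, avoiding data loss.
      server.abort("Replay of WAL required. Forcing server shutdown", e);
      return false;
    }
  }
}
```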

I'm not too familiar with procedure V1 or V2.  I assume it is the case that if 
a participant dies that all other participants will terminate any outstanding 
operations for the procedure?  If so and if this lack of RS.abort() for 
DroppedSnapshotException is a bug, then it can't be fixed naively otherwise I 
assume a failed flush on 1 region server could cause a cascade of RS abortions 
on the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16755) Honor flush policy under global memstore pressure

2017-03-02 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892771#comment-15892771
 ] 

Ashu Pachauri commented on HBASE-16755:
---

Just to add to my previous comment, 
bq. we can either pass down the "emergencyFlush" flag to the Region#flush
This requires changing the LimitedPrivate Region API, and may not make sense to 
do for a point release.

> Honor flush policy under global memstore pressure
> -
>
> Key: HBASE-16755
> URL: https://issues.apache.org/jira/browse/HBASE-16755
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Fix For: 1.3.1
>
> Attachments: HBASE-16755.v0.patch
>
>
> When global memstore reaches the low water mark, we pick the best flushable 
> region and flush all column families for it. This is a suboptimal approach in 
> the sense that it leads to an unnecessarily high file creation rate and IO 
> amplification due to compactions. We should still try to honor the underlying 
> FlushPolicy.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16755) Honor flush policy under global memstore pressure

2017-03-02 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892760#comment-15892760
 ] 

Ashu Pachauri commented on HBASE-16755:
---

[~Apache9]
{quote}
And there is another minor reason why we set force to true: we select regions 
based on their total memstore size. If we do not flush all the contents of the 
memstore, then the best candidate may not be the one with the maximum memstore 
size.
Maybe we could introduce new methods in FlushPolicy to find out the flushable 
size and select regions based on that value. Of course this is only a 
nice-to-have; we can do it later if anyone has interest.
{quote}
Makes sense.

bq. On the patch, as now we rely on the FlushPolicy to always return something 
to flush, maybe we should add some checks in the code? We may introduce new 
flush policies in the future, and we need to make sure that they also follow 
the rule.
A flush policy that returns nothing to flush (given that the decision whether a 
region is flushable is made before querying the flush policy) sounds 
impractical. But, to protect against such scenarios, we can pass down the 
"emergencyFlush" flag to Region#flush, which would resort to flushing all 
stores if the flush policy returns none during an emergency flush. Another 
option could be to return FlushResult.Result.CANNOT_FLUSH from 
HRegion#flushCache if the flush policy returns no stores to flush for the 
region, and let the caller (MemStoreFlusher) decide to retry with 
forceFlushAllStores under certain circumstances.
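The CANNOT_FLUSH-and-retry option might look roughly like this. A minimal sketch under stated assumptions: the nested types are simplified stand-ins for HRegion/MemStoreFlusher machinery, not the real classes:

```java
import java.util.Collections;
import java.util.List;

class FlushSketch {
  enum Result { FLUSHED, CANNOT_FLUSH }

  // Stand-in for FlushPolicy: picks which stores (column families) to flush.
  interface FlushPolicy {
    List<String> selectStoresToFlush();
  }

  // HRegion#flushCache analogue: if the policy selects no stores,
  // report CANNOT_FLUSH instead of silently doing no work.
  static Result flushCache(FlushPolicy policy) {
    List<String> stores = policy.selectStoresToFlush();
    if (stores.isEmpty()) {
      return Result.CANNOT_FLUSH;
    }
    // ... flush the selected stores ...
    return Result.FLUSHED;
  }

  // MemStoreFlusher analogue: under global memstore pressure, retry
  // with all stores if the policy-driven flush could not proceed
  // (i.e. the forceFlushAllStores fallback).
  static Result flushRegion(FlushPolicy policy) {
    Result r = flushCache(policy);
    if (r == Result.CANNOT_FLUSH) {
      return flushCache(() -> Collections.singletonList("all-stores"));
    }
    return r;
  }
}
```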

> Honor flush policy under global memstore pressure
> -
>
> Key: HBASE-16755
> URL: https://issues.apache.org/jira/browse/HBASE-16755
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Fix For: 1.3.1
>
> Attachments: HBASE-16755.v0.patch
>
>
> When global memstore reaches the low water mark, we pick the best flushable 
> region and flush all column families for it. This is a suboptimal approach in 
> the sense that it leads to an unnecessarily high file creation rate and IO 
> amplification due to compactions. We should still try to honor the underlying 
> FlushPolicy.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-02 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Release Note: There are now new table skew cost functions and table skew 
candidate generators in the stochastic load balancer to more evenly spread 
tables across the cluster. Table skew cost is computed per table, and the final 
table skew cost is a weighted average of the maximum per-table skew cost and 
the average skew cost across all tables. To configure how much 
weight the maximum skew cost for a single table should get, you can change 
"hbase.master.balancer.stochastic.maxTableSkewWeight" to a float between 0.0 
and 1.0, where 0.0 means the max table skew gets 0% of the weight and 1.0 means 
max table skew gets 100% of the weight. This value is useful if you want to 
strongly penalize any one table being skewed (even if all others are evenly 
balanced). We default this value to 0.0 because this works best for most cases 
in practice.
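The weighted-average score described in the release note can be sketched as follows. This is an illustrative reconstruction, not the actual balancer code; it assumes each table's skew has already been normalized to [0, 1]:

```java
class TableSkewSketch {
  // maxTableSkewWeight mirrors the config
  // "hbase.master.balancer.stochastic.maxTableSkewWeight" (0.0 to 1.0).
  static double skewCost(double[] perTableSkew, double maxTableSkewWeight) {
    double max = 0.0, sum = 0.0;
    for (double s : perTableSkew) {
      max = Math.max(max, s);
      sum += s;
    }
    double avg = sum / perTableSkew.length;
    // Weighted average of the worst table's skew and the mean skew:
    // weight 1.0 penalizes any single skewed table; 0.0 (the default)
    // considers only the average.
    double weighted = maxTableSkewWeight * max
        + (1.0 - maxTableSkewWeight) * avg;
    // The square root spreads scores more evenly across [0, 1].
    return Math.sqrt(weighted);
  }
}
```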

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robined across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-02 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892745#comment-15892745
 ] 

Kahlil Oppenheimer commented on HBASE-17707:


Just updated the release notes. Do these look good to you [~tedyu]?

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robined across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17719) PreEmptive Fast Fail does not apply to scanners

2017-03-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892741#comment-15892741
 ] 

Ted Yu commented on HBASE-17719:


Please generate patch for master branch.

> PreEmptive Fast Fail does not apply to scanners
> ---
>
> Key: HBASE-17719
> URL: https://issues.apache.org/jira/browse/HBASE-17719
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.2.0
>Reporter: James Moore
>Assignee: James Moore
> Attachments: HBASE_17719.patch
>
>
> On CDH 5.9.0, testing revealed that scanners do not leverage Pre-emptive fast 
> fail.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17719) PreEmptive Fast Fail does not apply to scanners

2017-03-02 Thread James Moore (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892702#comment-15892702
 ] 

James Moore commented on HBASE-17719:
-

Overall, the issue is twofold:

1) ScannerCallableWithReplicas creates a new RPCFactory for every call, which 
generates new interceptor factories and prevents the fast fail interceptor 
from building up a map of failures over time. I've modified ClientScanner to 
pass along the rpcFactory to child scanners.

2) ScannerCallableWithReplicas/RetryingRPC does not extend RegionServerCallable 
as ScannerCallable does, and won't pass the appropriate instanceof checks in 
FastFailInterceptor.  I've made the necessary modifications to allow the 
Callables to pass the instanceof checks.

This patch is based off of branch-1.2; do I need to generate additional patch 
files for 1.3/2.0?

--James



> PreEmptive Fast Fail does not apply to scanners
> ---
>
> Key: HBASE-17719
> URL: https://issues.apache.org/jira/browse/HBASE-17719
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.2.0
>Reporter: James Moore
>Assignee: James Moore
> Attachments: HBASE_17719.patch
>
>
> On CDH 5.9.0, testing revealed that scanners do not leverage Pre-emptive fast 
> fail.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17719) PreEmptive Fast Fail does not apply to scanners

2017-03-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892697#comment-15892697
 ] 

Ted Yu commented on HBASE-17719:


The patch was generated based on branch-1.2.

Can you attach a patch for the master branch?

> PreEmptive Fast Fail does not apply to scanners
> ---
>
> Key: HBASE-17719
> URL: https://issues.apache.org/jira/browse/HBASE-17719
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.2.0
>Reporter: James Moore
>Assignee: James Moore
> Attachments: HBASE_17719.patch
>
>
> On CDH 5.9.0, testing revealed that scanners do not leverage Pre-emptive fast 
> fail.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17719) PreEmptive Fast Fail does not apply to scanners

2017-03-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892692#comment-15892692
 ] 

Hadoop QA commented on HBASE-17719:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s {color} 
| {color:red} HBASE-17719 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12855662/HBASE_17719.patch |
| JIRA Issue | HBASE-17719 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/5921/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> PreEmptive Fast Fail does not apply to scanners
> ---
>
> Key: HBASE-17719
> URL: https://issues.apache.org/jira/browse/HBASE-17719
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.2.0
>Reporter: James Moore
>Assignee: James Moore
> Attachments: HBASE_17719.patch
>
>
> On CDH 5.9.0, testing revealed that scanners do not leverage Pre-emptive fast 
> fail.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17719) PreEmptive Fast Fail does not apply to scanners

2017-03-02 Thread James Moore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Moore updated HBASE-17719:

Summary: PreEmptive Fast Fail does not apply to scanners  (was: Pre-Emptive 
Fast Fail does not apply to scanners)

> PreEmptive Fast Fail does not apply to scanners
> ---
>
> Key: HBASE-17719
> URL: https://issues.apache.org/jira/browse/HBASE-17719
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.2.0
>Reporter: James Moore
>Assignee: James Moore
> Attachments: HBASE_17719.patch
>
>
> On CDH 5.9.0, testing revealed that scanners do not leverage Pre-emptive fast 
> fail.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-17719) Pre-Emptive Fast Fail does not apply to scanners

2017-03-02 Thread James Moore (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892688#comment-15892688
 ] 

James Moore edited comment on HBASE-17719 at 3/2/17 5:55 PM:
-

I've attached a patch file that corrects the issue and has scanners use fast 
fail.


was (Author: lumost):
I've attached a patch file that corrects the issue and has scanners obey fast 
fail.

> Pre-Emptive Fast Fail does not apply to scanners
> 
>
> Key: HBASE-17719
> URL: https://issues.apache.org/jira/browse/HBASE-17719
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.2.0
>Reporter: James Moore
>Assignee: James Moore
> Attachments: HBASE_17719.patch
>
>
> On CDH 5.9.0, testing revealed that scanners do not leverage Pre-emptive fast 
> fail.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

