[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility
[ https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869350#comment-13869350 ] Andrew Purtell commented on HBASE-10321:

Another alternative is to keep CellCodec incapable of handling tags but backwards compatible, and add *another* codec which can handle tags. Call it CellCodecV2 or whatever.

CellCodec has broken the 96 client to 98 server compatibility
Key: HBASE-10321
URL: https://issues.apache.org/jira/browse/HBASE-10321
Project: HBase
Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
Fix For: 0.98.0, 0.99.0
Attachments: HBASE-10321.patch

The write/read tags support added in CellCodec has broken the 96 client to 98 server compatibility (and 98 client to 96 server). When a 96 client's CellCodec writes a cell, it won't write the tags part at all. But the server expects a tags part, at least a 0 tag length. Reading this tag length will consume some bytes from the next cell! I suggest we remove the tags part from CellCodec. This codec is not used by default and I don't think someone will change to CellCodec from the default KVCodec now. This makes tags unsupported via CellCodec. Tag support can be added to CellCodec once we have connection negotiation in place (?)

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
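The stream skew described above can be sketched with plain data streams. This is illustrative only: the field layout below is simplified and is not the real CellCodec wire format, but it shows how a reader that expects a tags length after each cell consumes the next cell's bytes when the writer never emitted one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class CodecSkewDemo {

    // 96-style writer: a length-prefixed value and nothing else (no tags section).
    static void writeCell(DataOutputStream out, byte[] value) throws IOException {
        out.writeInt(value.length);
        out.write(value);
    }

    // 98-style reader: after the value it expects a tags length that the
    // 96-style writer never wrote, so it consumes the next cell's bytes.
    static int readCellExpectingTags(DataInputStream in) throws IOException {
        byte[] value = new byte[in.readInt()];
        in.readFully(value);
        return in.readInt(); // "tags length" -- really the next cell's length prefix
    }

    static int demo() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        writeCell(out, new byte[] {1, 2}); // first cell
        writeCell(out, new byte[] {3, 4}); // second cell
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return readCellExpectingTags(in);
    }

    public static void main(String[] args) throws IOException {
        // The reader misreads the second cell's length prefix (2) as a tags
        // length, leaving the stream skewed for every cell after it.
        System.out.println("bogus tags length = " + demo());
    }
}
```

Once the reader is skewed like this, every subsequent cell is parsed from the wrong offset, which is why the incompatibility is a blocker rather than a single bad field.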
[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility
[ https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869349#comment-13869349 ] Andrew Purtell commented on HBASE-10321:

If KVCodec is the default and does not have a backwards compatibility problem, then doesn't that solve the issue?
[jira] [Resolved] (HBASE-10327) remove(K, V) of type PoolMap<K,V> has the same erasure
[ https://issues.apache.org/jira/browse/HBASE-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-10327.
Resolution: Duplicate

remove(K, V) of type PoolMap<K,V> has the same erasure
Key: HBASE-10327
URL: https://issues.apache.org/jira/browse/HBASE-10327
Project: HBase
Issue Type: Bug
Affects Versions: 0.99.0
Reporter: Eric Charles
Attachments: HBASE-10327.patch

I keep getting a red cross in my Eclipse, whatever the JDK (jdk6, jdk7, jdk8): "Name clash: The method remove(K, V) of type PoolMap<K,V> has the same erasure as remove(Object, Object) of type Map<K,V> but does not override it". Maybe related to HBASE-10030. The solution I have is simply removing the deprecated method, and everything is fine. I am not sure about the backwards compatibility here.
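The clash reported above can be sketched without HBase at all. Since JDK 8, java.util.Map declares a default boolean remove(Object, Object); a subclass method remove(K, V) erases to the same (Object, Object) parameter list without overriding it, which is exactly the error Eclipse reports. The snippet below only inspects the inherited method; the clashing declaration itself is shown in a comment because it would not compile:

```java
import java.lang.reflect.Method;
import java.util.Map;

public class ErasureDemo {

    // Since JDK 8, java.util.Map declares:
    //   default boolean remove(Object key, Object value)
    static Method inheritedRemove() throws NoSuchMethodException {
        return Map.class.getMethod("remove", Object.class, Object.class);
    }

    public static void main(String[] args) throws Exception {
        // A generic method like the following in a Map subclass:
        //   public V remove(K key, V value) { ... }
        // erases to remove(Object, Object), matching the inherited method's
        // erasure without overriding it, so javac (and Eclipse's JDT
        // compiler) rejects it with the name-clash error quoted above.
        Method m = inheritedRemove();
        System.out.println(m.getName() + " -> " + m.getReturnType());
    }
}
```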
[jira] [Commented] (HBASE-10322) Strip tags from KV while sending back to client on reads
[ https://issues.apache.org/jira/browse/HBASE-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869356#comment-13869356 ] Andrew Purtell commented on HBASE-10322:

+1

Strip tags from KV while sending back to client on reads
Key: HBASE-10322
URL: https://issues.apache.org/jira/browse/HBASE-10322
Project: HBase
Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
Fix For: 0.98.0, 0.99.0
Attachments: HBASE-10322.patch

Right now we have some inconsistency w.r.t. sending back tags on read. We do this in scan when using the Java client (codec-based cell block encoding), but during a Get operation, or when a pure PB-based Scan comes in, we are not sending back the tags. So we have to apply one of the fixes below:
1. Send back tags in the missing cases also. But sending back the visibility expression / cell ACL is not correct.
2. Don't send back tags in any case. This will be a problem when a tool like ExportTool uses a scan to export the table data: we would miss exporting the cell visibility/ACL.
3. Send back tags based on some condition. It has to be on a per-scan basis. The simplest way is to pass some kind of attribute in Scan which says whether to send back tags or not, but trusting something the scan itself specifies might not be correct IMO. The alternative is checking the user who is doing the scan: only send back tags when an HBase super user is doing the scan. Then for a case like the Export Tool's, the execution should happen as a super user.
So IMO we should go with #3. Patch coming soon.
[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869360#comment-13869360 ] chendihao commented on HBASE-10274:

Thanks [~lhofhansl] for resolving HBASE-10306; please commit this by the way.

MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
Key: HBASE-10274
URL: https://issues.apache.org/jira/browse/HBASE-10274
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chendihao
Assignee: chendihao
Priority: Minor
Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch

HBASE-6820 points out the problem but does not fix it completely. killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() shut down the ZooKeeperServer and need to close the ZKDatabase as well.
[jira] [Updated] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels
[ https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-10326:
Status: Open (was: Patch Available)

Super user should be able scan all the cells irrespective of the visibility labels
Key: HBASE-10326
URL: https://issues.apache.org/jira/browse/HBASE-10326
Project: HBase
Issue Type: Bug
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
Labels: security
Fix For: 0.98.0, 0.99.0
Attachments: HBASE-10326.patch, HBASE-10326_1.patch

This issue ties in with HBASE-10322. In the case of the export tool, when cells with visibility labels are exported by a super user, we should be able to export the data. But with the current implementation, the super user would only be able to view cells that have visibility labels associated with the superuser. The idea of HBASE-10322 is to strip out tags based on the user, and if so, this change is necessary for the export tool to work with Visibility. ACL already has a concept of global admins.
[jira] [Updated] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels
[ https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-10326:
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels
[ https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-10326:
Attachment: HBASE-10326_1.patch
[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility
[ https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869368#comment-13869368 ] Anoop Sam John commented on HBASE-10321:

bq. and add another codec which can handle tags
Looks good to me. Will make a patch which includes CellCodecV2.
[jira] [Commented] (HBASE-10282) We can't assure that the first ZK server is active server in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869370#comment-13869370 ] chendihao commented on HBASE-10282:

The functions {{killCurrentActiveZooKeeperServer()}} and {{killOneBackupZooKeeperServer()}} are misleading because of this. I think [~liyin] treated the first zk server as the leader, but we can't be sure of that. So should we rename {{activeZKServerIndex}} to {{firstZKServerIndex}} and combine these two functions into {{killFirstZooKeeperServer()}} (it's hard to know which one is the actual leader)? Need more people to discuss it. [~enis] [~stack]

We can't assure that the first ZK server is active server in MiniZooKeeperCluster
Key: HBASE-10282
URL: https://issues.apache.org/jira/browse/HBASE-10282
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chendihao
Priority: Minor

Thanks to HBASE-3052, we're able to run multiple zk servers in a minicluster. However, it's confusing to keep the variable activeZKServerIndex at zero and assume the first zk server is always the active one. I think returning the first server's client port is for testing, and it seems we can directly return the first item of the list. Anyway, the concept of 'active' here is not the same as ZooKeeper's. It was confusing when I read the code, so I think we should fix it.
[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility
[ https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869378#comment-13869378 ] ramkrishna.s.vasudevan commented on HBASE-10321:

+1 on CellCodecV2.
[jira] [Updated] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility
[ https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-10321:
Attachment: HBASE-10321_V2.patch
[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility
[ https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869386#comment-13869386 ] ramkrishna.s.vasudevan commented on HBASE-10321:

+1 on patch. LGTM.
[jira] [Commented] (HBASE-10283) Client can't connect with all the running zk servers in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869396#comment-13869396 ] chendihao commented on HBASE-10283:

There are two solutions for this. The first is to allow setting different zk ports in HBase (generic, but contrary to the original design). The second is to add extra code in ZKConfig to support multiple ports for MiniZooKeeperCluster. I prefer the latter, to reduce the amount of code change. MiniZooKeeperCluster can't be used for zk failover tests before this is fixed. Can [~enis] help to review this?

Client can't connect with all the running zk servers in MiniZooKeeperCluster
Key: HBASE-10283
URL: https://issues.apache.org/jira/browse/HBASE-10283
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chendihao

Refer to HBASE-3052: multiple zk servers can run together in a minicluster. The problem is that the client can only connect to the first zk server, and if you kill the first one, it fails to access the cluster even though the other zk servers are serving. It's easy to reproduce. First, `TEST_UTIL.startMiniZKCluster(3)`. Second, call `killCurrentActiveZooKeeperServer` in MiniZooKeeperCluster. Then, when you construct the zk client, it can't connect to the zk cluster in any way. Here is a short log you can refer to:
{noformat}
2014-01-03 12:06:58,625 INFO [main] zookeeper.MiniZooKeeperCluster(194): Started MiniZK Cluster and connect 1 ZK server on client port: 55227
..
2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(264): Kill the current active ZK servers in the cluster on client port: 55227
2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(272): Activate a backup zk server in the cluster on client port: 55228
2014-01-03 12:06:59,366 INFO [main-EventThread] zookeeper.ZooKeeper(434): Initiating client connection, connectString=localhost:55227 sessionTimeout=3000 watcher=com.xiaomi.infra.timestamp.TimestampWatcher@a383118
(then it throws exceptions..)
{noformat}
The log is itself misleading: it always shows "Started MiniZK Cluster and connect 1 ZK server" even though there are actually three zk servers. Looking deeper, we find that the client is still trying to connect to the dead zk server's port. When I print out the zkQuorum it used, only the first zk server's hostport is there, and it does not change whether you kill the server or not. The reason is in ZKConfig, which converts HBase settings into ZooKeeper's. MiniZooKeeperCluster creates three servers with the same host name, localhost, and different ports. But HBase forces the same port for every zk server, so ZKConfig ignores the other two servers, which have the same host name. MiniZooKeeperCluster works improperly until we fix this. The bug was not found because we never test in unit tests whether HBase still works if we kill the active or backup zk servers.
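The failure mode described above can be sketched in a few lines. This is a hedged illustration, not the actual ZKConfig code: if quorum servers are keyed by hostname alone, three localhost servers on different ports collapse into a single quorum entry, so the client never learns the backup servers' ports.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

public class QuorumDedupDemo {

    // Keying quorum members by hostname alone drops every later server that
    // shares a host with an earlier one.
    static Collection<String> dedupByHost(String[] servers) {
        Map<String, String> byHost = new LinkedHashMap<>();
        for (String s : servers) {
            String host = s.split(":")[0];
            byHost.putIfAbsent(host, s); // same-host entries after the first are lost
        }
        return byHost.values();
    }

    public static void main(String[] args) {
        String[] servers = {"localhost:55227", "localhost:55228", "localhost:55229"};
        // Only the first server survives, so after killing it the client
        // keeps retrying the dead port 55227.
        System.out.println(dedupByHost(servers));
    }
}
```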
[jira] [Commented] (HBASE-10030) [JDK8] Erasure of PoolMap#remove(K,V) conflicts with superclass method
[ https://issues.apache.org/jira/browse/HBASE-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869394#comment-13869394 ] Eric Charles commented on HBASE-10030:

It works fine with JDK 8 via the mvn CLI, but gives a compilation issue in Eclipse (not the first time Eclipse disagrees with the mvn CLI). Only me?

[JDK8] Erasure of PoolMap#remove(K,V) conflicts with superclass method
Key: HBASE-10030
URL: https://issues.apache.org/jira/browse/HBASE-10030
Project: HBase
Issue Type: Bug
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Trivial
Fix For: 0.98.0
Attachments: 10030.patch

On JDK 8, the erasure of PoolMap#remove(K,V) conflicts with the superclass method remove(Object,Object).
[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility
[ https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869398#comment-13869398 ] Anoop Sam John commented on HBASE-10321:

Thanks all for the reviews. Will commit tonight IST unless there are objections.
[jira] [Commented] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels
[ https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869404#comment-13869404 ] Hadoop QA commented on HBASE-10326:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12622599/HBASE-10326_1.patch
against trunk revision .
ATTACHMENT ID: 12622599
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8401//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8401//console
This message is automatically generated.
[jira] [Updated] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
[ https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Honghua updated HBASE-10227:
Attachment: HBASE-10227-trunk_v0.patch

The fix is as below:
1. Persist mvcc in the HLog (in WALEdit).
2. Never set a KeyValue's mvcc to 0.
3. Always (not conditionally) include mvcc in the HFile.
4. Reinitialize the region's mvcc after replaying split HLog files, to include the greater values in the new stores resulting from replaying/flushing split HLog files -- that is, to correctly recover the region's mvcc.
Note on step 4: since replaying split HLog files needs to access mvcc, we can't initialize mvcc only after the replay; reinitializing it to the final correct value once replaying is done is OK. An alternative fix is to add and use a new internalFlushcache method for replaying split HLog files which doesn't access mvcc (this is safe since, while replaying split HLog files, there can be no in-progress transaction/write not yet committed to the HLog: nothing writes to the HLog during replay).

When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
Key: HBASE-10227
URL: https://issues.apache.org/jira/browse/HBASE-10227
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
Attachments: HBASE-10227-trunk_v0.patch

When opening a region, all stores are examined to get the max MemstoreTS, which is used as the initial mvcc for the region; then the split hlogs are replayed. In fact, the edits in the split hlogs carry kvs with mvcc greater than any MemstoreTS in the store files, but replaying them doesn't increment the mvcc accordingly at all. From an overall perspective this mvcc recovery is 'logically' incorrect/incomplete. The reason it currently causes no problem is that no active scanners exist, and no new scanners can be created, before the region opening completes, so the mvcc of all kvs in the hfiles resulting from hlog replay can be safely set to zero. They are just treated as kvs put 'earlier' than the ones in HFiles with mvcc greater than zero ('earlier' because they have mvcc less than the non-zero ones, even though they were in fact put 'later'), with no incorrect impact only because no active scanners exist or are created during region opening. This bug is only in the 'logical' sense for the time being, but if later on we need mvcc to survive the region's whole lifecycle (across regionservers) and never set it to zero, this bug needs to be fixed first.
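Step 4 of the fix above amounts to taking the max over both sources of sequence numbers when reinitializing the region's mvcc. The numbers below are purely hypothetical and the method names are illustrative, not the HRegion code:

```java
public class MvccRecoveryDemo {

    // Covering only the store files' max MemstoreTS would lose the larger
    // mvcc values carried by the replayed split-HLog edits; the recovered
    // read point must dominate both.
    static long recoveredMvcc(long maxMemstoreTSInStores, long maxMvccInReplayedEdits) {
        return Math.max(maxMemstoreTSInStores, maxMvccInReplayedEdits);
    }

    public static void main(String[] args) {
        long fromStores = 10;  // hypothetical max MemstoreTS found on region open
        long fromReplay = 17;  // hypothetical max mvcc seen while replaying split HLogs
        System.out.println("recovered mvcc = " + recoveredMvcc(fromStores, fromReplay));
    }
}
```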
[jira] [Commented] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels
[ https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869408#comment-13869408 ] Anoop Sam John commented on HBASE-10326:

Patch looks good, Ram. Please correct the white space introduced after the checkIfScanOrGetFromSuperUser private method.
{code}
+HTable acl = new HTable(conf, AccessControlLists.ACL_TABLE_NAME);
+try {
+  BlockingRpcChannel service = acl.coprocessorService(tableName.getName());
+  AccessControlService.BlockingInterface protocol = AccessControlService
+      .newBlockingStub(service);
+  ProtobufUtil.grant(protocol, NORMAL_USER2.getShortName(), tableName, null, null,
+      Permission.Action.READ);
+} finally {
+  acl.close();
+}
{code}
Instead, can we use AccessControlClient#grant? This code is repeated in the tests. Thanks for the patch.
[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility
[ https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869426#comment-13869426 ] Hadoop QA commented on HBASE-10321: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622602/HBASE-10321_V2.patch against trunk revision . ATTACHMENT ID: 12622602 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8402//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8402//console This message is automatically generated. 
CellCodec has broken the 96 client to 98 server compatibility - Key: HBASE-10321 URL: https://issues.apache.org/jira/browse/HBASE-10321 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch The write/read of tags added in CellCodec has broken the 96 client to 98 server compatibility (and 98 client to 96 server). When a 96 client's CellCodec writes a cell, it won't write the tags part at all. But the server expects a tags part, at least a 0 tag length. This tag length read will consume some bytes from the next cell! I suggest we remove the tags part from CellCodec. This codec is not used by default, and I don't think someone will change to CellCodec from the default KVCodec now... This makes tags unsupported via CellCodec. Tag support can be added to CellCodec once we have connection negotiation in place (?) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10325) Unknown option or illegal argument:-XX:OnOutOfMemoryError=kill -9 %p
[ https://issues.apache.org/jira/browse/HBASE-10325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869445#comment-13869445 ] chillon_m commented on HBASE-10325: --- jrockit Unknown option or illegal argument:-XX:OnOutOfMemoryError=kill -9 %p Key: HBASE-10325 URL: https://issues.apache.org/jira/browse/HBASE-10325 Project: HBase Issue Type: Bug Affects Versions: 0.96.1.1 Reporter: chillon_m Unknown option or illegal argument: -XX:OnOutOfMemoryError=kill -9 %p. Please check for incorrect spelling or review documentation of startup options. Could not create the Java virtual machine. starting master, logging to /home/hadoop/hbase-0.96.1.1-hadoop2/logs/hbase-hadoop-master-namenode0.hadoop.out Unknown option or illegal argument: -XX:OnOutOfMemoryError=kill -9 %p. Please check for incorrect spelling or review documentation of startup options -- This message was sent by Atlassian JIRA (v6.1.5#6160)
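The error above comes from passing the HotSpot-only -XX:OnOutOfMemoryError option to a JRockit JVM, which rejects it at startup. One hedged workaround (function and variable names here are assumptions, not part of HBase's shipped scripts) is to probe the JVM before appending the option to HBASE_OPTS in hbase-env.sh:

```shell
#!/bin/sh
# probe_oome JAVA_BIN: print "yes" if the given JVM accepts the
# HotSpot-specific -XX:OnOutOfMemoryError option, "no" otherwise.
# JRockit exits with "Unknown option or illegal argument" instead.
probe_oome() {
  if "$1" '-XX:OnOutOfMemoryError=kill -9 %p' -version >/dev/null 2>&1; then
    echo yes
  else
    echo no
  fi
}

# In hbase-env.sh one could then guard the flag (paths assumed):
# [ "$(probe_oome "$JAVA_HOME/bin/java")" = yes ] && \
#   HBASE_OPTS="$HBASE_OPTS -XX:OnOutOfMemoryError=\"kill -9 %p\""
```

The probe simply runs `java <flag> -version`: a HotSpot JVM accepts the option and exits 0, while a JVM that rejects it exits non-zero, so the flag is only ever added where it is understood.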
[jira] [Updated] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
[ https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10227: --- Status: Patch Available (was: Open) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay Key: HBASE-10227 URL: https://issues.apache.org/jira/browse/HBASE-10227 Project: HBase Issue Type: Bug Components: regionserver Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10227-trunk_v0.patch When opening a region, all stores are examined to get the max MemstoreTS, which is used as the initial mvcc for the region, and then the split hlogs are replayed. In fact the edits in the split hlogs have kvs with greater mvcc than any MemstoreTS in the store files, but replaying them doesn't increment the mvcc accordingly at all. From an overall perspective this mvcc recovery is 'logically' incorrect/incomplete. The reason it currently causes no problem is that no active scanners exist and no new scanners can be created before the region opening completes, so the mvcc of all kvs in the hfiles resulting from hlog replay can be safely set to zero. They are just treated as kvs put 'earlier' than the ones in HFiles with mvcc greater than zero ('earlier' in the sense that they have smaller mvcc than the ones with non-zero mvcc, even though they were in fact put later), with no incorrect impact only because during region opening no active scanners exist or can be created. For the time being this bug is only a 'logical' one, but if later on we need mvcc to survive across the region's whole lifecycle (across regionservers) and never be set to zero, this bug needs to be fixed first. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
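The recovery rule described above can be sketched as pure logic: the region's starting read point must cover both the store files' max MemstoreTS and the mvcc of every edit replayed from split hlogs, instead of flattening replayed edits to zero. A minimal illustration with hypothetical names (this is not the actual HRegion code):

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the mvcc recovery this issue asks for: start from the max
// MemstoreTS across store files, then advance past every replayed edit's
// mvcc so replayed kvs are not treated as 'earlier' than flushed ones.
public class MvccRecovery {
    public static long recoveredReadPoint(long maxMemstoreTs, List<Long> replayedEditMvccs) {
        long readPoint = maxMemstoreTs;
        for (long mvcc : replayedEditMvccs) {
            readPoint = Math.max(readPoint, mvcc); // replayed edits carry larger mvcc
        }
        return readPoint;
    }

    public static void main(String[] args) {
        // edits in split hlogs have mvcc greater than any MemstoreTS
        System.out.println(recoveredReadPoint(5, Arrays.asList(7L, 9L))); // prints 9
    }
}
```

This only matters once mvcc must survive the region's whole lifecycle; with the current "no scanners during open" invariant, zeroing the replayed mvccs happens to be safe, as the description notes.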
[jira] [Created] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer
Feng Honghua created HBASE-10329: Summary: Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer Key: HBASE-10329 URL: https://issues.apache.org/jira/browse/HBASE-10329 Project: HBase Issue Type: Bug Components: regionserver, wal Reporter: Feng Honghua Assignee: Feng Honghua Last month, after I introduced multiple AsyncSyncer threads to improve the throughput for a lower number of clients, [~stack] encountered an NPE while testing, where a null writer occurs in AsyncSyncer when doing sync. Since we had run the test many times in a cluster to verify the throughput improvement and never encountered such an NPE, it really confused me. (and [~stack] fixed this by adding an 'if (writer != null)' check to protect the sync operation) I always wondered why the writer can be null in AsyncSyncer and whether it's safe to fix it by just adding a null check before doing sync, as [~stack] did. After some digging and analysis, I found the case where AsyncSyncer can encounter a null writer; it is as below: 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with writtenTxid==100 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with writtenTxid==200 3. t3: rollWriter starts; it grabs the updateLock, which prevents further writes from entering pendingWrites, and then waits for all items (up to txid 200) in pendingWrites to be appended and finally synced to hdfs 4. t4: AsyncSyncer 2 finishes; now syncedTillHere==200, and it also helps sync txids <= 100 as a whole 5. t5: rollWriter closes the writer, sets writer=null... 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before rollWriter sets writer to the newly created Writer We can see: 1. the null writer is possible only after there are multiple AsyncSyncer threads; that's why we never encountered it before introducing multiple AsyncSyncer threads. 2.
since rollWriter can set writer=null only after all items in pendingWrites are synced to hdfs, AsyncWriter is in the critical path of this task, and there is only one AsyncWriter thread, AsyncWriter can't encounter a null writer; that's why we never encounter a null writer in AsyncWriter though it uses the writer as well. This is the same reason a null writer never occurs when there is a single AsyncSyncer thread. And we should treat the writer == null case in AsyncSyncer differently: 1. if txidToSync <= syncedTillHere, this means all writes this AsyncSyncer cares about have already been synced by another AsyncSyncer, and we can safely ignore the sync (as [~stack] does here); 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid <= txidToSync to avoid data loss: the user gets a successful write response but can't read the writes back; from the user's perspective this is data loss (according to the above analysis such a case should not occur, but we should still add this defensive treatment to prevent data loss if it ever occurs, e.g. via some bug introduced later) Also fix the bug where isSyncing needs to be reset to false when writer.sync encounters an IOException: AsyncSyncer swallows such an exception by failing all writes with txid <= txidToSync, and since this AsyncSyncer thread is then ready to do later syncs, its isSyncing needs to be reset to false in the IOException handling block -- This message was sent by Atlassian JIRA (v6.1.5#6160)
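The two writer == null cases above reduce to a small decision rule. A toy model with hypothetical names (this is not the real FSHLog code; txidToSync is the highest txid this AsyncSyncer was asked to sync, syncedTillHere the highest txid any syncer has already pushed to hdfs):

```java
// Toy model of the writer == null handling proposed in this issue.
public class NullWriterPolicy {
    public enum Action { IGNORE_ALREADY_SYNCED, FAIL_PENDING_WRITES }

    public static Action onNullWriter(long txidToSync, long syncedTillHere) {
        if (txidToSync <= syncedTillHere) {
            // another AsyncSyncer already covered our txids while syncing
            // a later batch: safe to skip, nothing is lost
            return Action.IGNORE_ALREADY_SYNCED;
        }
        // acking writes that were never synced would be silent data loss:
        // fail them so the client sees an error instead of a false success
        return Action.FAIL_PENDING_WRITES;
    }

    public static void main(String[] args) {
        System.out.println(onNullWriter(100, 200)); // prints IGNORE_ALREADY_SYNCED
        System.out.println(onNullWriter(300, 200)); // prints FAIL_PENDING_WRITES
    }
}
```

The first branch is the benign race in the t1..t6 timeline (AsyncSyncer 1's txid 100 was already covered when AsyncSyncer 2 synced up to 200); the second is the defensive case the patch adds, which should never fire unless a later bug breaks the rollWriter ordering.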
[jira] [Assigned] (HBASE-10283) Client can't connect with all the running zk servers in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao reassigned HBASE-10283: - Assignee: chendihao Client can't connect with all the running zk servers in MiniZooKeeperCluster Key: HBASE-10283 URL: https://issues.apache.org/jira/browse/HBASE-10283 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: chendihao Assignee: chendihao Refer to HBASE-3052: multiple zk servers can run together in the minicluster. The problem is that the client can only connect to the first zk server, and if you kill the first one, it fails to access the cluster even though the other zk servers are serving. It's easy to repro. Firstly, `TEST_UTIL.startMiniZKCluster(3)`. Secondly, call `killCurrentActiveZooKeeperServer` in MiniZooKeeperCluster. Then when you construct the zk client, it can't connect to the zk cluster in any way. Here is a simple log you can refer to. {noformat} 2014-01-03 12:06:58,625 INFO [main] zookeeper.MiniZooKeeperCluster(194): Started MiniZK Cluster and connect 1 ZK server on client port: 55227 .. 2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(264): Kill the current active ZK servers in the cluster on client port: 55227 2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(272): Activate a backup zk server in the cluster on client port: 55228 2014-01-03 12:06:59,366 INFO [main-EventThread] zookeeper.ZooKeeper(434): Initiating client connection, connectString=localhost:55227 sessionTimeout=3000 watcher=com.xiaomi.infra.timestamp.TimestampWatcher@a383118 (then it throws exceptions..) {noformat} The log is kind of problematic because it always shows 'Started MiniZK Cluster and connect 1 ZK server' but actually there are three zk servers. Looking deeper, we find that the client is still trying to connect to the dead zk server's port.
When I print out the zkQuorum it uses, only the first zk server's host:port is there, and it does not change whether you kill the server or not. The reason lies in ZKConfig, which converts HBase settings into ZooKeeper's. MiniZooKeeperCluster creates three servers with the same host name, localhost, and different ports. But HBase itself forces the same port for every zk server, and ZKConfig ignores the other two servers that have the same host name. MiniZooKeeperCluster works improperly until we fix this. The bug was not found because we never test in unit tests whether HBase still works when the active or backup zk servers are killed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
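The collapse described above can be shown with a toy reconstruction (this is not the real ZKConfig code; names are hypothetical): because one shared clientPort is applied to every quorum host, three localhost servers on distinct ports produce identical host:port entries, and all but the first disappear from the connect string.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of the quorum-string building bug: a single shared clientPort
// is stamped onto every quorum host, so servers that differ only by port
// (as MiniZooKeeperCluster creates them) collapse into one entry.
public class QuorumCollapseDemo {
    public static String connectString(String[] quorumHosts, int sharedClientPort) {
        Set<String> entries = new LinkedHashSet<>(); // identical host:port pairs collapse
        for (String host : quorumHosts) {
            entries.add(host + ":" + sharedClientPort);
        }
        return String.join(",", entries);
    }

    public static void main(String[] args) {
        // three MiniZK servers on ports 55227/55228/55229, but only one survives
        String[] hosts = {"localhost", "localhost", "localhost"};
        System.out.println(connectString(hosts, 55227)); // prints localhost:55227
    }
}
```

This is why the client keeps retrying the dead server's port: the other two servers were never in the connect string to begin with. The fix has to carry each server's real port through to the quorum string instead of the shared clientPort.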
[jira] [Updated] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer
[ https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Honghua updated HBASE-10329: - Description: Last month, after I introduced multiple AsyncSyncer threads to improve the throughput for a lower number of client write threads, [~stack] encountered an NPE while testing, where a null writer occurs in AsyncSyncer when doing sync. Since we had run the test many times in a cluster to verify the throughput improvement and never encountered such an NPE, it really confused me. (and [~stack] fixed this by adding an 'if (writer != null)' check to protect the sync operation) I always wondered why the writer can be null in AsyncSyncer and whether it's safe to fix it by just adding a null check before doing sync, as [~stack] did. After some digging and analysis, I found the case where AsyncSyncer can encounter a null writer; it is as below: 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with writtenTxid==100 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with writtenTxid==200 3. t3: rollWriter starts; it grabs the updateLock, which prevents further writes from entering pendingWrites, and then waits for all items (up to txid 200) in pendingWrites to be appended and finally synced to hdfs 4. t4: AsyncSyncer 2 finishes; now syncedTillHere==200, and it also helps sync txids <= 100 as a whole 5. t5: rollWriter closes the writer, sets writer=null... 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before rollWriter sets writer to the newly created Writer We can see: 1. the null writer is possible only after there are multiple AsyncSyncer threads; that's why we never encountered it before introducing multiple AsyncSyncer threads. 2.
since rollWriter can set writer=null only after all items in pendingWrites are synced to hdfs, AsyncWriter is in the critical path of this task, and there is only one AsyncWriter thread, AsyncWriter can't encounter a null writer; that's why we never encounter a null writer in AsyncWriter though it uses the writer as well. This is the same reason a null writer never occurs when there is a single AsyncSyncer thread. And we should treat the writer == null case in AsyncSyncer differently: 1. if txidToSync <= syncedTillHere, this means all writes this AsyncSyncer cares about have already been synced by another AsyncSyncer, and we can safely ignore the sync (as [~stack] does here); 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid <= txidToSync to avoid data loss: the user gets a successful write response but can't read the writes back; from the user's perspective this is data loss (according to the above analysis such a case should not occur, but we should still add this defensive treatment to prevent data loss if it ever occurs, e.g. via some bug introduced later) Also fix the bug where isSyncing needs to be reset to false when writer.sync encounters an IOException: AsyncSyncer swallows such an exception by failing all writes with txid <= txidToSync, and since this AsyncSyncer thread is then ready to do later syncs, its isSyncing needs to be reset to false in the IOException handling block was: Last month, after I introduced multiple AsyncSyncer threads to improve the throughput for a lower number of clients, [~stack] encountered an NPE while testing, where a null writer occurs in AsyncSyncer when doing sync. Since we had run the test many times in a cluster to verify the throughput improvement and never encountered such an NPE, it really confused me. (and [~stack] fixed this by adding an 'if (writer != null)' check to protect the sync operation) I always wondered why the writer can be null in AsyncSyncer and whether it's safe to fix it by just adding a null check before doing sync, as [~stack] did.
After some digging and analysis, I found the case where AsyncSyncer can encounter a null writer; it is as below: 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with writtenTxid==100 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with writtenTxid==200 3. t3: rollWriter starts; it grabs the updateLock, which prevents further writes from entering pendingWrites, and then waits for all items (up to txid 200) in pendingWrites to be appended and finally synced to hdfs 4. t4: AsyncSyncer 2 finishes; now syncedTillHere==200, and it also helps sync txids <= 100 as a whole 5. t5: rollWriter closes the writer, sets writer=null... 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before rollWriter sets writer to the newly created Writer We can see: 1. the null writer is possible only after there are multiple AsyncSyncer threads; that's why we never encountered it before introducing multiple AsyncSyncer threads. 2. since rollWriter can set writer=null only after all items in pendingWrites are synced to hdfs, and AsyncWriter is in the critical path of this task and there is only one AsyncWriter thread, AsyncWriter can't encounter a null writer, though it uses the writer as well.
[jira] [Updated] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer
[ https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Honghua updated HBASE-10329: - Attachment: HBASE-10329-trunk_v0.patch The patch is attached; ping [~stack] :-) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer -- Key: HBASE-10329 URL: https://issues.apache.org/jira/browse/HBASE-10329 Project: HBase Issue Type: Bug Components: regionserver, wal Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10329-trunk_v0.patch Last month, after I introduced multiple AsyncSyncer threads to improve the throughput for a lower number of clients, [~stack] encountered an NPE while testing, where a null writer occurs in AsyncSyncer when doing sync. Since we had run the test many times in a cluster to verify the throughput improvement and never encountered such an NPE, it really confused me. (and [~stack] fixed this by adding an 'if (writer != null)' check to protect the sync operation) I always wondered why the writer can be null in AsyncSyncer and whether it's safe to fix it by just adding a null check before doing sync, as [~stack] did. After some digging and analysis, I found the case where AsyncSyncer can encounter a null writer; it is as below: 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with writtenTxid==100 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with writtenTxid==200 3. t3: rollWriter starts; it grabs the updateLock, which prevents further writes from entering pendingWrites, and then waits for all items (up to txid 200) in pendingWrites to be appended and finally synced to hdfs 4. t4: AsyncSyncer 2 finishes; now syncedTillHere==200, and it also helps sync txids <= 100 as a whole 5. t5: rollWriter closes the writer, sets writer=null... 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before rollWriter sets writer to the newly created Writer We can see: 1.
the null writer is possible only after there are multiple AsyncSyncer threads; that's why we never encountered it before introducing multiple AsyncSyncer threads. 2. since rollWriter can set writer=null only after all items in pendingWrites are synced to hdfs, AsyncWriter is in the critical path of this task, and there is only one AsyncWriter thread, AsyncWriter can't encounter a null writer; that's why we never encounter a null writer in AsyncWriter though it uses the writer as well. This is the same reason a null writer never occurs when there is a single AsyncSyncer thread. And we should treat the writer == null case in AsyncSyncer differently: 1. if txidToSync <= syncedTillHere, this means all writes this AsyncSyncer cares about have already been synced by another AsyncSyncer, and we can safely ignore the sync (as [~stack] does here); 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid <= txidToSync to avoid data loss: the user gets a successful write response but can't read the writes back; from the user's perspective this is data loss (according to the above analysis such a case should not occur, but we should still add this defensive treatment to prevent data loss if it ever occurs, e.g. via some bug introduced later) Also fix the bug where isSyncing needs to be reset to false when writer.sync encounters an IOException: AsyncSyncer swallows such an exception by failing all writes with txid <= txidToSync, and since this AsyncSyncer thread is then ready to do later syncs, its isSyncing needs to be reset to false in the IOException handling block -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer
[ https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Honghua updated HBASE-10329: - Description: Last month, after I introduced multiple AsyncSyncer threads to improve the throughput for a lower number of client write threads, [~stack] encountered an NPE while testing, where a null writer occurs in AsyncSyncer when doing sync. Since we had run the test many times in a cluster to verify the throughput improvement and never encountered such an NPE, it really confused me. (and [~stack] fixed this by adding 'if (writer != null)' to protect the sync operation) These days, from time to time, I wondered why the writer can be null in AsyncSyncer and whether it's safe to fix it by just adding a null check before doing sync, as [~stack] did. After some digging, I found the case where AsyncSyncer can encounter a null writer; it is as below: 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with writtenTxid==100 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with writtenTxid==200 3. t3: rollWriter starts; it grabs the updateLock to prevent further client writes from entering pendingWrites, and then waits for all items (txid <= 200) in pendingWrites to be appended and finally synced to hdfs 4. t4: AsyncSyncer 2 finishes; now syncedTillHere==200 (it also helps sync txids <= 100 as a whole) 5. t5: rollWriter can now close the writer, set writer=null... 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before rollWriter sets writer to the newly rolled Writer We can see: 1. the null writer is possible only after there are multiple AsyncSyncer threads; that's why we never encountered it before introducing multiple AsyncSyncer threads. 2.
since rollWriter can set writer=null only after all items in pendingWrites are synced to hdfs, AsyncWriter is in the critical path of this task, and there is only one single AsyncWriter thread, AsyncWriter can't encounter a null writer; that's why we never encounter a null writer in AsyncWriter though it also uses the writer. This is the same reason a null writer never occurs when there is a single AsyncSyncer thread. And we should treat the writer == null case in AsyncSyncer differently: 1. if txidToSync <= syncedTillHere, this means all writes this AsyncSyncer cares about have already been synced by another AsyncSyncer, and we can safely ignore the sync (as [~stack] does here); 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid <= txidToSync to avoid data loss: the user gets a successful write response but can't read the writes back after getting that response; from the user's perspective this is data loss (according to the above analysis such a case should not occur, but we should still add this defensive treatment to prevent data loss if it ever occurs, e.g. via some bug introduced later) Also fix the bug where isSyncing needs to be reset to false when writer.sync encounters an IOException: AsyncSyncer swallows such an exception by failing all writes with txid <= txidToSync, and since this AsyncSyncer thread is then ready to do later syncs, its isSyncing needs to be reset to false in the IOException handling block, otherwise it can't be selected by AsyncWriter to do sync was: Last month, after I introduced multiple AsyncSyncer threads to improve the throughput for a lower number of client write threads, [~stack] encountered an NPE while testing, where a null writer occurs in AsyncSyncer when doing sync. Since we had run the test many times in a cluster to verify the throughput improvement and never encountered such an NPE, it really confused me.
(and [~stack] fixed this by adding an 'if (writer != null)' check to protect the sync operation) I always wondered why the writer can be null in AsyncSyncer and whether it's safe to fix it by just adding a null check before doing sync, as [~stack] did. After some digging and analysis, I found the case where AsyncSyncer can encounter a null writer; it is as below: 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with writtenTxid==100 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with writtenTxid==200 3. t3: rollWriter starts; it grabs the updateLock, which prevents further writes from entering pendingWrites, and then waits for all items (up to txid 200) in pendingWrites to be appended and finally synced to hdfs 4. t4: AsyncSyncer 2 finishes; now syncedTillHere==200, and it also helps sync txids <= 100 as a whole 5. t5: rollWriter closes the writer, sets writer=null... 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before rollWriter sets writer to the newly created Writer We can see: 1. the null writer is possible only after there are multiple AsyncSyncer threads; that's why we never encountered it before introducing multiple AsyncSyncer threads. 2. since rollWriter can set writer=null only after all items in pendingWrites are synced to hdfs, and AsyncWriter is in the critical path of this task and there is only one AsyncWriter thread, AsyncWriter can't encounter a null writer.
[jira] [Updated] (HBASE-10283) Client can't connect with all the running zk servers in MiniZooKeeperCluster
[ https://issues.apache.org/jira/browse/HBASE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chendihao updated HBASE-10283: -- Attachment: HBASE-10283-0.94-v1.patch Patch for 0.94. Client can't connect with all the running zk servers in MiniZooKeeperCluster Key: HBASE-10283 URL: https://issues.apache.org/jira/browse/HBASE-10283 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: chendihao Assignee: chendihao Attachments: HBASE-10283-0.94-v1.patch Refer to HBASE-3052: multiple zk servers can run together in the minicluster. The problem is that the client can only connect to the first zk server, and if you kill the first one, it fails to access the cluster even though the other zk servers are serving. It's easy to repro. Firstly, `TEST_UTIL.startMiniZKCluster(3)`. Secondly, call `killCurrentActiveZooKeeperServer` in MiniZooKeeperCluster. Then when you construct the zk client, it can't connect to the zk cluster in any way. Here is a simple log you can refer to. {noformat} 2014-01-03 12:06:58,625 INFO [main] zookeeper.MiniZooKeeperCluster(194): Started MiniZK Cluster and connect 1 ZK server on client port: 55227 .. 2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(264): Kill the current active ZK servers in the cluster on client port: 55227 2014-01-03 12:06:59,134 INFO [main] zookeeper.MiniZooKeeperCluster(272): Activate a backup zk server in the cluster on client port: 55228 2014-01-03 12:06:59,366 INFO [main-EventThread] zookeeper.ZooKeeper(434): Initiating client connection, connectString=localhost:55227 sessionTimeout=3000 watcher=com.xiaomi.infra.timestamp.TimestampWatcher@a383118 (then it throws exceptions..) {noformat} The log is kind of problematic because it always shows 'Started MiniZK Cluster and connect 1 ZK server' but actually there are three zk servers. Looking deeper, we find that the client is still trying to connect to the dead zk server's port.
When I print out the zkQuorum it uses, only the first zk server's host:port is there, and it does not change whether you kill the server or not. The reason lies in ZKConfig, which converts HBase settings into ZooKeeper's. MiniZooKeeperCluster creates three servers with the same host name, localhost, and different ports. But HBase itself forces the same port for every zk server, and ZKConfig ignores the other two servers that have the same host name. MiniZooKeeperCluster works improperly until we fix this. The bug was not found because we never test in unit tests whether HBase still works when the active or backup zk servers are killed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer
[ https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10329: --- Status: Patch Available (was: Open) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer -- Key: HBASE-10329 URL: https://issues.apache.org/jira/browse/HBASE-10329 Project: HBase Issue Type: Bug Components: regionserver, wal Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10329-trunk_v0.patch Last month, after I introduced multiple AsyncSyncer threads to improve the throughput for a lower number of client write threads, [~stack] encountered an NPE while testing, where a null writer occurs in AsyncSyncer when doing sync. Since we had run the test many times in a cluster to verify the throughput improvement and never encountered such an NPE, it really confused me. (and [~stack] fixed this by adding 'if (writer != null)' to protect the sync operation) These days, from time to time, I wondered why the writer can be null in AsyncSyncer and whether it's safe to fix it by just adding a null check before doing sync, as [~stack] did. After some digging, I found the case where AsyncSyncer can encounter a null writer; it is as below: 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with writtenTxid==100 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with writtenTxid==200 3. t3: rollWriter starts; it grabs the updateLock to prevent further client writes from entering pendingWrites, and then waits for all items (txid <= 200) in pendingWrites to be appended and finally synced to hdfs 4. t4: AsyncSyncer 2 finishes; now syncedTillHere==200 (it also helps sync txids <= 100 as a whole) 5. t5: rollWriter can now close the writer, set writer=null... 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before rollWriter sets writer to the newly rolled Writer We can see: 1.
the null writer is possible only after there are multiple AsyncSyncer threads; that's why we never encountered it before introducing multiple AsyncSyncer threads. 2. since rollWriter can set writer=null only after all items in pendingWrites are synced to hdfs, AsyncWriter is in the critical path of this task, and there is only one single AsyncWriter thread, AsyncWriter can't encounter a null writer; that's why we never encounter a null writer in AsyncWriter though it also uses the writer. This is the same reason a null writer never occurs when there is a single AsyncSyncer thread. And we should treat the writer == null case in AsyncSyncer differently: 1. if txidToSync <= syncedTillHere, this means all writes this AsyncSyncer cares about have already been synced by another AsyncSyncer, and we can safely ignore the sync (as [~stack] does here); 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid <= txidToSync to avoid data loss: the user gets a successful write response but can't read the writes back after getting that response; from the user's perspective this is data loss (according to the above analysis such a case should not occur, but we should still add this defensive treatment to prevent data loss if it ever occurs, e.g. via some bug introduced later) Also fix the bug where isSyncing needs to be reset to false when writer.sync encounters an IOException: AsyncSyncer swallows such an exception by failing all writes with txid <= txidToSync, and since this AsyncSyncer thread is then ready to do later syncs, its isSyncing needs to be reset to false in the IOException handling block, otherwise it can't be selected by AsyncWriter to do sync -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
[ https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869573#comment-13869573 ] Hadoop QA commented on HBASE-10227: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622605/HBASE-10227-trunk_v0.patch against trunk revision . ATTACHMENT ID: 12622605 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8403//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8403//console This message is automatically generated. 
When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay Key: HBASE-10227 URL: https://issues.apache.org/jira/browse/HBASE-10227 Project: HBase Issue Type: Bug Components: regionserver Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10227-trunk_v0.patch When opening a region, all stores are examined to get the max MemstoreTS, which is used as the initial mvcc for the region, and then the split hlogs are replayed. In fact the edits in the split hlogs have kvs with greater mvcc than all MemstoreTS values in all store files, but replaying them doesn't increment the mvcc accordingly at all. From an overall perspective this mvcc recovery is 'logically' incorrect/incomplete. The reason it currently doesn't cause a problem is that no active scanners exist and no new scanners can be created before the region opening completes, so the mvcc of all kvs in the hfiles resulting from hlog replay can be safely set to zero. They are just treated as kvs put 'earlier' than the ones in HFiles with mvcc greater than zero (say 'earlier' since they have mvcc less than the ones with non-zero mvcc, though in fact they were put 'later'), without any incorrect impact only because during region opening no active scanners exist or are created. This bug is in a 'logical' sense for the time being, but if later on we need mvcc to survive the region's whole logical lifecycle (across regionservers) and never set it to zero, this bug needs to be fixed first. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
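The 'logically' complete recovery described above amounts to taking a max over both sources, the store files and the replayed split hlogs. A minimal sketch with illustrative method names (not the actual HRegion API):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: shows the max-of-both-sources rule the report argues for,
// not real HRegion code. Method names are made up for this sketch.
public class MvccRecoverySketch {
    // What region open does today: max MemstoreTS across all store files.
    static long maxStoreFileMvcc(List<Long> storeFileMaxTs) {
        return storeFileMaxTs.stream().mapToLong(Long::longValue).max().orElse(0L);
    }

    // The 'logical' fix: also advance by the largest mvcc seen while
    // replaying the split hlogs, since those edits are newer.
    static long initialRegionMvcc(List<Long> storeFileMaxTs, long maxReplayedMvcc) {
        return Math.max(maxStoreFileMvcc(storeFileMaxTs), maxReplayedMvcc);
    }
}
```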
[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869578#comment-13869578 ] Samir Ahmic commented on HBASE-7386: Update: with [HBASE-10310] fixed, master-to-backup-master failover time is 4s when the cluster is controlled with supervisor and 3s when it is controlled with the standard scripts. Investigate providing some supervisor support for znode deletion Key: HBASE-7386 URL: https://issues.apache.org/jira/browse/HBASE-7386 Project: HBase Issue Type: Task Components: master, regionserver, scripts Reporter: Gregory Chanan Assignee: stack Priority: Blocker Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, HBASE-7386-v0.patch, supervisordconfigs-v0.patch There are a couple of JIRAs for deleting the znode on a process failure: HBASE-5844 (RS) HBASE-5926 (Master) which are pretty neat; on process failure, they delete the znode of the underlying process so HBase can recover faster. These JIRAs were implemented via the startup scripts; i.e. the script hangs around and waits for the process to exit, then deletes the znode. 
There are a few problems associated with this approach, as listed in the below JIRAs: 1) Hides startup output in script https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401 2) two hbase processes listed per launched daemon https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409 3) Not run by a real supervisor https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409 4) Weird output after kill -9 actual process in standalone mode https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801 5) Can kill existing RS if called again https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401 6) Hides stdout/stderr[6] https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832 I suspect running it via something like supervisord can solve these issues if we provide the right support. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-6581) Build with hadoop.profile=3.0
[ https://issues.apache.org/jira/browse/HBASE-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869588#comment-13869588 ] Eric Charles commented on HBASE-6581: - The second issue mentioned above (npe) is fixed with HDFS-5760. Build with hadoop.profile=3.0 - Key: HBASE-6581 URL: https://issues.apache.org/jira/browse/HBASE-6581 Project: HBase Issue Type: Bug Reporter: Eric Charles Assignee: Eric Charles Fix For: 0.98.1 Attachments: HBASE-6581-1.patch, HBASE-6581-2.patch, HBASE-6581-20130821.patch, HBASE-6581-3.patch, HBASE-6581-4.patch, HBASE-6581-5.patch, HBASE-6581-6.patch, HBASE-6581-7.patch, HBASE-6581.diff, HBASE-6581.diff Building trunk with hadoop.profile=3.0 gives exceptions (see [1]) due to change in the hadoop maven modules naming (and also usage of 3.0-SNAPSHOT instead of 3.0.0-SNAPSHOT in hbase-common). I can provide a patch that would move most of hadoop dependencies in their respective profiles and will define the correct hadoop deps in the 3.0 profile. Please tell me if that's ok to go this way. Thx, Eric [1] $ mvn clean install -Dhadoop.profile=3.0 [INFO] Scanning for projects... [ERROR] The build could not read 3 projects - [Help 1] [ERROR] [ERROR] The project org.apache.hbase:hbase-server:0.95-SNAPSHOT (/d/hbase.svn/hbase-server/pom.xml) has 3 errors [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-common:jar is missing. @ line 655, column 21 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-annotations:jar is missing. @ line 659, column 21 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 663, column 21 [ERROR] [ERROR] The project org.apache.hbase:hbase-common:0.95-SNAPSHOT (/d/hbase.svn/hbase-common/pom.xml) has 3 errors [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-common:jar is missing. 
@ line 170, column 21 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-annotations:jar is missing. @ line 174, column 21 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 178, column 21 [ERROR] [ERROR] The project org.apache.hbase:hbase-it:0.95-SNAPSHOT (/d/hbase.svn/hbase-it/pom.xml) has 3 errors [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-common:jar is missing. @ line 220, column 18 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-annotations:jar is missing. @ line 224, column 21 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 228, column 21 [ERROR] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10123) Change default ports; move them out of linux ephemeral port range
[ https://issues.apache.org/jira/browse/HBASE-10123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-10123: --- Priority: Critical (was: Major) Affects Version/s: 0.96.1.1 Fix Version/s: 0.98.0 Bumping priority and adding fix version so we try to get this into 0.98. Change default ports; move them out of linux ephemeral port range - Key: HBASE-10123 URL: https://issues.apache.org/jira/browse/HBASE-10123 Project: HBase Issue Type: Bug Affects Versions: 0.96.1.1 Reporter: stack Priority: Critical Fix For: 0.98.0 Our defaults clash w/ the range linux assigns itself for creating come-and-go ephemeral ports; likely in our history we've clashed w/ a random, short-lived process. While easy to change the defaults, we should just ship w/ defaults that make sense. We could host ourselves up into the 7 or 8k range. See http://www.ncftp.com/ncftpd/doc/misc/ephemeral_ports.html -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869629#comment-13869629 ] Nicolas Liochon commented on HBASE-7386: Thanks a lot for the fix of HBASE-10310, Samir. I went through your patch. It's a difficult read when you don't know supervisor ;-). The definition of 'PROCESS_STATE_UNKNOWN' is a little scary (as we kill the region server when we reach this state). There are some typos ('Test is supevisored installed' instead of supevisord). I'm not sure about stuff like 'subprocess.call('/bin/mail -s HBASE_PROCESS_EVENT %s %s'%(email, tmp_file), shell=True)': it seems machine dependent; there is no /bin/mail on my ubuntu desktop. Do we have to use python? It would be good to have a review from someone who knows supervisor... As well, this should be documented in the hbase reference guide imho. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10330) TableInputFormat/TableRecordReaderImpl leaks HTable
G G created HBASE-10330: --- Summary: TableInputFormat/TableRecordReaderImpl leaks HTable Key: HBASE-10330 URL: https://issues.apache.org/jira/browse/HBASE-10330 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: G G Priority: Critical As far as I can tell, TableInputFormat creates an instance of HTable which is used by TableRecordReaderImpl. However TableRecordReaderImpl.close() only closes the scanner, not the table. In turn the HTable's HConnection's reference count is never decreased which leads to leaking HConnections. TableOutputFormat might have a similar bug. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
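The leak described above comes down to close() releasing only the scanner. A minimal sketch of the fix, using plain java.io.Closeable stand-ins rather than the real 0.96 TableRecordReaderImpl/HTable classes:

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch: close() should release the HTable too, so the underlying
// HConnection's reference count is decremented. The scanner/table fields are
// Closeable stand-ins, not the real ResultScanner/HTable types.
public class RecordReaderCloseSketch implements Closeable {
    private final Closeable scanner; // stands in for the ResultScanner
    private final Closeable table;   // stands in for the HTable

    public RecordReaderCloseSketch(Closeable scanner, Closeable table) {
        this.scanner = scanner;
        this.table = table;
    }

    @Override
    public void close() throws IOException {
        try {
            scanner.close(); // what the reported code already does
        } finally {
            table.close();   // the missing step: releases the HConnection ref
        }
    }
}
```

The try/finally ensures the table is released even if closing the scanner throws.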
[jira] [Commented] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer
[ https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869634#comment-13869634 ] Hadoop QA commented on HBASE-10329: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622617/HBASE-10329-trunk_v0.patch against trunk revision . ATTACHMENT ID: 12622617 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8404//console This message is automatically generated. 
Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer -- Key: HBASE-10329 URL: https://issues.apache.org/jira/browse/HBASE-10329 Project: HBase Issue Type: Bug Components: regionserver, wal Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10329-trunk_v0.patch
[jira] [Commented] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer
[ https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869643#comment-13869643 ] Himanshu Vashishtha commented on HBASE-10329: - The explanation makes total sense… +1. Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer -- Key: HBASE-10329 URL: https://issues.apache.org/jira/browse/HBASE-10329 Project: HBase Issue Type: Bug Components: regionserver, wal Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10329-trunk_v0.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10320) Avoid ArrayList.iterator() in tight loops
[ https://issues.apache.org/jira/browse/HBASE-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869683#comment-13869683 ] Lars Hofhansl commented on HBASE-10320: --- Thanks Stack. I wonder whether the same is true for ArrayLists. That brings me to another thought: the columns list is fixed size and never changed once created, so why have an ArrayList at all instead of an array? Then we can use columns.length in the loop and get this optimization. Will try when I get some time next. Avoid ArrayList.iterator() in tight loops - Key: HBASE-10320 URL: https://issues.apache.org/jira/browse/HBASE-10320 Project: HBase Issue Type: Bug Components: Performance Reporter: Lars Hofhansl Attachments: 10320-0.94-v2.txt, 10320-0.94.txt I noticed in a profiler (sampler) run that ScanQueryMatcher.setRow(...) showed up at all. It turns out that the expensive part is iterating over the columns in ExplicitColumnTracker.reset(). I did some microbenchmarks and found that {code} private ArrayList<X> l; ... for (int i = 0; i < l.size(); i++) { X x = l.get(i); ... } {code} is twice as fast as: {code} private ArrayList<X> l; ... for (X x : l) { ... } {code} The indexed version asymptotically approaches the iterator version, but even at 1m entries it is still faster. In my tight-loop scans this provides a 5% overall performance improvement when the ExplicitColumnTracker is used. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
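The two loop shapes being compared above, made compilable; the element type X is replaced with Integer here, and the 2x timing claim is the reporter's measurement, not reproduced by this sketch:

```java
import java.util.ArrayList;

// Runnable version of the two loop shapes from the report. Both compute the
// same sum; the report's point is that the indexed form avoids creating an
// Iterator object in tight loops.
public class LoopShapes {
    static long sumIndexed(ArrayList<Integer> l) {
        long s = 0;
        for (int i = 0; i < l.size(); i++) { // indexed: no Iterator allocated
            s += l.get(i);
        }
        return s;
    }

    static long sumForEach(ArrayList<Integer> l) {
        long s = 0;
        for (int x : l) { // enhanced-for: desugars to l.iterator()
            s += x;
        }
        return s;
    }
}
```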
[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869679#comment-13869679 ] Lars Hofhansl commented on HBASE-10274: --- [~stack], [~apurtell], I assume you want this (test stability fix) in 0.96/0.98. MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers --- Key: HBASE-10274 URL: https://issues.apache.org/jira/browse/HBASE-10274 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: chendihao Assignee: chendihao Priority: Minor Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch HBASE-6820 points out the problem but does not fix it completely. killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() shut down the ZooKeeperServer and need to close the ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
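The intended fix above can be sketched with fake stand-in classes (these are not the real ZooKeeperServer/ZKDatabase APIs): shutting down the server alone leaves its database open, so the kill path should close it explicitly:

```java
// Sketch of the fix the report asks for: when MiniZookeeperCluster kills a
// ZooKeeperServer, it should also close that server's ZKDatabase. The Fake*
// classes are stand-ins for illustration, not real ZooKeeper types.
public class ZkShutdownSketch {
    static class FakeZkDatabase {
        boolean closed;
        void close() { closed = true; } // releases log/snapshot resources
    }

    static class FakeZkServer {
        final FakeZkDatabase db = new FakeZkDatabase();
        boolean shutdownCalled;
        void shutdown() { shutdownCalled = true; } // does NOT close db
    }

    // What killCurrentActiveZooKeeperServer()/killOneBackupZooKeeperServer()
    // should do per the report: shut down, then close the database too.
    static void killServer(FakeZkServer server) {
        server.shutdown();
        server.db.close(); // the step HBASE-6820 left out
    }
}
```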
[jira] [Updated] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-10274: -- Fix Version/s: 0.94.17 0.99.0 0.96.2 0.98.0 MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers --- Key: HBASE-10274 URL: https://issues.apache.org/jira/browse/HBASE-10274 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: chendihao Assignee: chendihao Priority: Minor Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer
[ https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-10329: -- Priority: Critical (was: Major) Marking critical. Thanks for digging in Feng. Makes sense. Slight clarification: we only need to fail writes with syncedTillHere < txid <= txidToSync, right? Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer -- Key: HBASE-10329 URL: https://issues.apache.org/jira/browse/HBASE-10329 Project: HBase Issue Type: Bug Components: regionserver, wal Reporter: Feng Honghua Assignee: Feng Honghua Priority: Critical Attachments: HBASE-10329-trunk_v0.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10330) TableInputFormat/TableRecordReaderImpl leaks HTable
[ https://issues.apache.org/jira/browse/HBASE-10330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869694#comment-13869694 ] Lars Hofhansl commented on HBASE-10330: --- Is that the case in 0.94 as well? (I'll check) TableInputFormat/TableRecordReaderImpl leaks HTable --- Key: HBASE-10330 URL: https://issues.apache.org/jira/browse/HBASE-10330 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: G G Priority: Critical -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10123) Change default ports; move them out of linux ephemeral port range
[ https://issues.apache.org/jira/browse/HBASE-10123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869700#comment-13869700 ] Andrew Purtell commented on HBASE-10123: I need a patch or this goes to 0.98.1 Change default ports; move them out of linux ephemeral port range - Key: HBASE-10123 URL: https://issues.apache.org/jira/browse/HBASE-10123 Project: HBase Issue Type: Bug Affects Versions: 0.96.1.1 Reporter: stack Priority: Critical Fix For: 0.98.0 Our defaults clash w/ the range linux assigns itself for creating come-and-go ephemeral ports; likely in our history we've clashed w/ a random, short-lived process. While easy to change the defaults, we should just ship w/ defaults that make sense. We could host ourselves up into the 7 or 8k range. See http://www.ncftp.com/ncftpd/doc/misc/ephemeral_ports.html -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10030) [JDK8] Erasure of PoolMap#remove(K,V) conflicts with superclass method
[ https://issues.apache.org/jira/browse/HBASE-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869698#comment-13869698 ] Andrew Purtell commented on HBASE-10030: Possibly. Eclipse on JDK8 worked for me. I can check that again when I have a free moment. [JDK8] Erasure of PoolMap#remove(K,V) conflicts with superclass method -- Key: HBASE-10030 URL: https://issues.apache.org/jira/browse/HBASE-10030 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Trivial Fix For: 0.98.0 Attachments: 10030.patch On JDK 8, the erasure of PoolMap#remove(K,V) conflicts with superclass method remove(Object,Object). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat
[ https://issues.apache.org/jira/browse/HBASE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10323: --- Fix Version/s: 0.99.0 The 'mvn site' failure occurs in other QA runs as well. It was not caused by your patch. Auto detect data block encoding in HFileOutputFormat Key: HBASE-10323 URL: https://issues.apache.org/jira/browse/HBASE-10323 Project: HBase Issue Type: Improvement Reporter: Ishan Chhabra Assignee: Ishan Chhabra Fix For: 0.99.0 Attachments: HBASE_10323-0.94.15-v1.patch, HBASE_10323-0.94.15-v2.patch, HBASE_10323-0.94.15-v3.patch, HBASE_10323-trunk-v1.patch, HBASE_10323-trunk-v2.patch Currently, one has to specify the data block encoding of the table explicitly using the config parameter hbase.mapreduce.hfileoutputformat.datablock.encoding when doing a bulk load. This option is easily missed, not documented, and also works differently than compression, block size and bloom filter type, which are auto detected. The solution would be to add support to auto detect data block encoding similar to the other parameters. The current patch does the following: 1. Automatically detects data block encoding in HFileOutputFormat. 2. Keeps the legacy option of manually specifying the data block encoding around as a method to override auto detection. 3. Moves string conf parsing to the start of the program so that it fails fast during startup instead of failing during record writes. It also makes the internals of the program type safe. 4. Adds missing doc strings and unit tests for the code serializing and deserializing config parameters for bloom filter type, block size and data block encoding. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
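Point 3 above (serialize per-family settings into one conf value, then parse it eagerly so malformed input fails at startup rather than during record writes) can be sketched like this. The conf key name and the `family=value&...` format here are purely illustrative, not HBase's actual serialization format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the "serialize per-family attribute into one conf
// value, parse it eagerly at startup" pattern from point 3 above.
// The key and format are illustrative, not HBase's real wire format.
class EncodingConfSketch {
    static final String KEY = "example.datablock.encoding.per.family";

    static String serialize(Map<String, String> familyToEncoding) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : familyToEncoding.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Fails fast with IllegalArgumentException on a malformed value,
    // instead of failing later when records are written.
    static Map<String, String> deserialize(String value) {
        Map<String, String> m = new LinkedHashMap<>();
        if (value.isEmpty()) return m;
        for (String pair : value.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) throw new IllegalArgumentException(pair);
            m.put(kv[0], kv[1]);
        }
        return m;
    }
}
```

Parsing once at job setup, rather than per record, is also what makes the internals type safe: downstream code only ever sees the parsed map.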
[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility
[ https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869708#comment-13869708 ] Andrew Purtell commented on HBASE-10321: +1 on patch V2 CellCodec has broken the 96 client to 98 server compatibility - Key: HBASE-10321 URL: https://issues.apache.org/jira/browse/HBASE-10321 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch The write/read tags added in CellCodec has broken the 96 client to 98 server compatibility (and 98 client to 96 server) When 96 client CellCodec writes cell, it won't write tags part at all. But the server expects a tag part, at least a 0 tag length. This tag length read will make a read of some bytes from next cell! I suggest we can remove the tag part from CellCodec. This codec is not used by default and I don't think some one will change to CellCodec from the default KVCodec now. .. This makes tags not supported via CellCodec..Tag support can be added to CellCodec once we have Connection negotiation in place (?) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels
[ https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869711#comment-13869711 ] Andrew Purtell edited comment on HBASE-10326 at 1/13/14 5:21 PM: - bq. Instead can use AccessControlClient#grant ? This code is repeated in tests.. Or use the new grant/revoke methods in SecureTestUtils, which are designed for granting or revoking in tests. They do things only possible in miniclusters to insure the AC has propagated the grant to all caches first, to avoid flapping tests. Are the changes to TestVisibilityLabels needed? The test runs under the superuser implicitly right? There is no functional change though, would be fine to keep them. What do the new tests in TestVisibilityLabelsWithACL do? Comment, please. was (Author: apurtell): bq. Instead can use AccessControlClient#grant ? This code is repeated in tests.. Or use the new grant/revoke methods in SecureTestUtils methods for granting, which also insures the AC has propagated the grant to all caches first, to avoid racing tests. Are the changes to TestVisibilityLabels needed? The test runs under the superuser implicitly right? There is no functional change though, would be fine to keep them. What do the new tests in TestVisibilityLabelsWithACL do? Comment, please. Super user should be able scan all the cells irrespective of the visibility labels -- Key: HBASE-10326 URL: https://issues.apache.org/jira/browse/HBASE-10326 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Labels: security Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10326.patch, HBASE-10326_1.patch This issue is in lieu with HBASE-10322. In case of export tool, when the cells with visibility labels are exported using a super user we should be able to export the data. 
But with the current implementation, the super user would also be able to view cells that has visibility labels associated with the superuser. The idea of HBASE-10322 is to strip out tags based on user and if so this change is necessary for export tool to work with Visibility. ACL already has a concept of global admins. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels
[ https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869713#comment-13869713 ] Andrew Purtell commented on HBASE-10326: [~anoop.hbase], Ram mailed me that he is away this evening. I would be +1 for a commit of this patch without the test changes. What do you think? We can add the test changes later as an addendum or new JIRA. Super user should be able scan all the cells irrespective of the visibility labels -- Key: HBASE-10326 URL: https://issues.apache.org/jira/browse/HBASE-10326 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Labels: security Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10326.patch, HBASE-10326_1.patch This issue is in lieu with HBASE-10322. In case of export tool, when the cells with visibility labels are exported using a super user we should be able to export the data. But with the current implementation, the super user would also be able to view cells that has visibility labels associated with the superuser. The idea of HBASE-10322 is to strip out tags based on user and if so this change is necessary for export tool to work with Visibility. ACL already has a concept of global admins. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels
[ https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869711#comment-13869711 ] Andrew Purtell commented on HBASE-10326: bq. Instead can use AccessControlClient#grant ? This code is repeated in tests.. Or use the new grant/revoke methods in SecureTestUtils methods for granting, which also insures the AC has propagated the grant to all caches first, to avoid racing tests. Are the changes to TestVisibilityLabels needed? The test runs under the superuser implicitly right? There is no functional change though, would be fine to keep them. What do the new tests in TestVisibilityLabelsWithACL do? Comment, please. Super user should be able scan all the cells irrespective of the visibility labels -- Key: HBASE-10326 URL: https://issues.apache.org/jira/browse/HBASE-10326 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Labels: security Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10326.patch, HBASE-10326_1.patch This issue is in lieu with HBASE-10322. In case of export tool, when the cells with visibility labels are exported using a super user we should be able to export the data. But with the current implementation, the super user would also be able to view cells that has visibility labels associated with the superuser. The idea of HBASE-10322 is to strip out tags based on user and if so this change is necessary for export tool to work with Visibility. ACL already has a concept of global admins. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility
[ https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-10321: --- Resolution: Fixed Release Note: A new codec, CellCodecV2, is added which can do all the work of CellCodec plus writing/reading Tags. CellCodec will not be able to handle tags; when one wants to use CellCodec with tags, one needs to use CellCodecV2. Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to Trunk and 0.98. Thanks for the reviews. CellCodec has broken the 96 client to 98 server compatibility - Key: HBASE-10321 URL: https://issues.apache.org/jira/browse/HBASE-10321 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch The write/read tags added in CellCodec has broken the 96 client to 98 server compatibility (and 98 client to 96 server) When 96 client CellCodec writes cell, it won't write tags part at all. But the server expects a tag part, at least a 0 tag length. This tag length read will make a read of some bytes from next cell! I suggest we can remove the tag part from CellCodec. This codec is not used by default and I don't think some one will change to CellCodec from the default KVCodec now. .. This makes tags not supported via CellCodec..Tag support can be added to CellCodec once we have Connection negotiation in place (?) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels
[ https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869729#comment-13869729 ] Anoop Sam John commented on HBASE-10326: I will commit. Super user should be able scan all the cells irrespective of the visibility labels -- Key: HBASE-10326 URL: https://issues.apache.org/jira/browse/HBASE-10326 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Labels: security Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10326.patch, HBASE-10326_1.patch This issue is in lieu with HBASE-10322. In case of export tool, when the cells with visibility labels are exported using a super user we should be able to export the data. But with the current implementation, the super user would also be able to view cells that has visibility labels associated with the superuser. The idea of HBASE-10322 is to strip out tags based on user and if so this change is necessary for export tool to work with Visibility. ACL already has a concept of global admins. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels
[ https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869729#comment-13869729 ] Anoop Sam John edited comment on HBASE-10326 at 1/13/14 5:42 PM: - I will commit patch as it is now.. We can improve the tests later as you suggested. was (Author: anoop.hbase): I will commit. Super user should be able scan all the cells irrespective of the visibility labels -- Key: HBASE-10326 URL: https://issues.apache.org/jira/browse/HBASE-10326 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Labels: security Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10326.patch, HBASE-10326_1.patch This issue is in lieu with HBASE-10322. In case of export tool, when the cells with visibility labels are exported using a super user we should be able to export the data. But with the current implementation, the super user would also be able to view cells that has visibility labels associated with the superuser. The idea of HBASE-10322 is to strip out tags based on user and if so this change is necessary for export tool to work with Visibility. ACL already has a concept of global admins. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869745#comment-13869745 ] Andrew Purtell commented on HBASE-10274: One HadoopQA run was good, the other failed org.apache.hadoop.hbase.zookeeper.lock.TestZKInterProcessReadWriteLock. Has anyone tested if this change makes our ZK unit tests flaky? MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers --- Key: HBASE-10274 URL: https://issues.apache.org/jira/browse/HBASE-10274 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: chendihao Assignee: chendihao Priority: Minor Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch HBASE-6820 points out the problem but does not fix it completely. killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() will shutdown the ZooKeeperServer and need to close the ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
[ https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869752#comment-13869752 ] Andrew Purtell commented on HBASE-10274: Anyway, we can try it on 0.98. If tests do flake, I can revert and recommit to 0.98.1. MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers --- Key: HBASE-10274 URL: https://issues.apache.org/jira/browse/HBASE-10274 Project: HBase Issue Type: Bug Affects Versions: 0.94.3 Reporter: chendihao Assignee: chendihao Priority: Minor Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch HBASE-6820 points out the problem but not fix completely. killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() will shutdown the ZooKeeperServer and need to close ZKDatabase as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
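The shape of the HBASE-10274 fix can be sketched as follows. This is a hypothetical stand-in, not the real ZooKeeper or MiniZooKeeperCluster API: the `Runnable`s represent shutting down the `ZooKeeperServer` and closing its `ZKDatabase`, and the point is that both must happen even if the shutdown throws.

```java
// Hypothetical sketch: killing a ZooKeeperServer must also close its
// ZKDatabase, or the database's resources leak. Runnables stand in for
// the real server.shutdown() and zkDb.close() calls.
class MiniZkShutdownSketch {
    static void killServer(Runnable shutdownServer, Runnable closeZkDatabase) {
        try {
            shutdownServer.run();    // what kill*ZooKeeperServer() already did
        } finally {
            closeZkDatabase.run();   // the step HBASE-6820 left out
        }
    }
}
```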
[jira] [Updated] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels
[ https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-10326: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to Trunk and 0.98. Thanks for the patch Ram. Thanks for the review Andy. Super user should be able scan all the cells irrespective of the visibility labels -- Key: HBASE-10326 URL: https://issues.apache.org/jira/browse/HBASE-10326 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Labels: security Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10326.patch, HBASE-10326_1.patch This issue is in lieu with HBASE-10322. In case of export tool, when the cells with visibility labels are exported using a super user we should be able to export the data. But with the current implementation, the super user would also be able to view cells that has visibility labels associated with the superuser. The idea of HBASE-10322 is to strip out tags based on user and if so this change is necessary for export tool to work with Visibility. ACL already has a concept of global admins. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10304) Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString
[ https://issues.apache.org/jira/browse/HBASE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869753#comment-13869753 ] Nick Dimiduk commented on HBASE-10304: -- [~jxiang] the only thing you needed to tweak for the first two variations was explicit inclusion of the hbase-config in $HADOOP_CLASSPATH ? Where else would the hadoop invocation pick up hbase-site.xml? Adding hbase-config in this invocation method has always been required, right? What about launching the job using our bin/hbase script? Do you see the same IllegalAccessError when launching the fat jar that way? Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString Key: HBASE-10304 URL: https://issues.apache.org/jira/browse/HBASE-10304 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.98.0, 0.96.1.1 Reporter: stack Priority: Blocker Fix For: 0.98.0 Attachments: hbase-10304_not_tested.patch, jobjar.xml (Jimmy has been working on this one internally. I'm just the messenger raising this critical issue upstream). So, if you make job jar and bundle up hbase inside in it because you want to access hbase from your mapreduce task, the deploy of the job jar to the cluster fails with: {code} 14/01/05 08:59:19 INFO Configuration.deprecation: topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl 14/01/05 08:59:19 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. 
Instead, use dfs.bytes-per-checksum Exception in thread main java.lang.IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:792) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:818) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertScanToString(TableMapReduceUtil.java:433) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:186) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:147) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:270) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:100) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:124) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:64) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.main(HBaseMapReduceIndexerTool.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} So, ZCLBS is a hack. This class is in the hbase-protocol module. It is in the com.google.protobuf package. All is well and good usually. But when we make a job jar and bundle up hbase inside it, our 'trick' breaks. RunJar makes a new class loader to run the job jar. This URLClassLoader 'attaches' all the jars and classes that are in the jobjar so they can be found when it goes to do a lookup. But classloaders work by always delegating to their parent first (unless you are a WAR file in a container where delegation is 'off' for the most part), and in this case, the parent classloader will have access to a pb jar since pb is in the hadoop CLASSPATH. So,
[jira] [Updated] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels
[ https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-10326: --- Component/s: security Release Note: HBase super user can (any user who is having system visibility label) read back all the cells irrespective of visibility expression applied for cells. Super user should be able scan all the cells irrespective of the visibility labels -- Key: HBASE-10326 URL: https://issues.apache.org/jira/browse/HBASE-10326 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Labels: security Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10326.patch, HBASE-10326_1.patch This issue is in lieu with HBASE-10322. In case of export tool, when the cells with visibility labels are exported using a super user we should be able to export the data. But with the current implementation, the super user would also be able to view cells that has visibility labels associated with the superuser. The idea of HBASE-10322 is to strip out tags based on user and if so this change is necessary for export tool to work with Visibility. ACL already has a concept of global admins. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10330) TableInputFormat/TableRecordReaderImpl leaks HTable
[ https://issues.apache.org/jira/browse/HBASE-10330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10330: --- Fix Version/s: 0.99.0 0.96.2 0.98.0 Seems like it would have a straightforward fix. TableInputFormat/TableRecordReaderImpl leaks HTable --- Key: HBASE-10330 URL: https://issues.apache.org/jira/browse/HBASE-10330 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: G G Priority: Critical Fix For: 0.98.0, 0.96.2, 0.99.0 As far as I can tell, TableInputFormat creates an instance of HTable which is used by TableRecordReaderImpl. However TableRecordReaderImpl.close() only closes the scanner, not the table. In turn the HTable's HConnection's reference count is never decreased which leads to leaking HConnections. TableOutputFormat might have a similar bug. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10277) refactor AsyncProcess
[ https://issues.apache.org/jira/browse/HBASE-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869762#comment-13869762 ] Sergey Shelukhin commented on HBASE-10277: -- I started the mailing thread on this. refactor AsyncProcess - Key: HBASE-10277 URL: https://issues.apache.org/jira/browse/HBASE-10277 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10277.patch AsyncProcess currently has two patterns of usage, one from HTable flush w/o callback and with reuse, and one from HCM/HTable batch call, with callback and w/o reuse. In the former case (but not the latter), it also does some throttling of actions on initial submit call, limiting the number of outstanding actions per server. The latter case is relatively straightforward. The former appears to be error prone due to reuse - if, as javadoc claims should be safe, multiple submit calls are performed without waiting for the async part of the previous call to finish, fields like hasError become ambiguous and can be used for the wrong call; callback for success/failure is called based on original index of an action in submitted list, but with only one callback supplied to AP in ctor it's not clear to which submit call the index belongs, if several are outstanding. I was going to add support for HBASE-10070 to AP, and found that it might be difficult to do cleanly. It would be nice to normalize AP usage patterns; in particular, separate the global part (load tracking) from per-submit-call part. Per-submit part can more conveniently track stuff like initialActions, mapping of indexes and retry information, that is currently passed around the method calls. I am not sure yet, but maybe sending of the original index to server in ClientProtos.MultiAction can also be avoided. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10320) Avoid ArrayList.iterator() in tight loops
[ https://issues.apache.org/jira/browse/HBASE-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869768#comment-13869768 ] Andrew Purtell commented on HBASE-10320: Is this a 0.94 only change or applicable everywhere? (The latter, right?) Avoid ArrayList.iterator() in tight loops - Key: HBASE-10320 URL: https://issues.apache.org/jira/browse/HBASE-10320 Project: HBase Issue Type: Bug Components: Performance Reporter: Lars Hofhansl Attachments: 10320-0.94-v2.txt, 10320-0.94.txt I noticed that in a profiler (sampler) run ScanQueryMatcher.setRow(...) showed up at all. It turns out that the expensive part is iterating over the columns in ExplicitColumnTracker.reset(). I did some microbenchmarks and found that {code} private ArrayList<X> l; ... for (int i = 0; i < l.size(); i++) { X x = l.get(i); ... } {code} Is twice as fast as: {code} private ArrayList<X> l; ... for (X x : l) { ... } {code} The indexed version asymptotically approaches the iterator version, but even at 1m entries it is still faster. In my tight loop scans this provides for a 5% performance improvement overall when the ExplicitColumnTracker is used. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
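The two loop shapes compared above can be written out as a runnable illustration. Both produce identical results; they differ only in iteration mechanics, since the enhanced-for desugars to `ArrayList.iterator()` and allocates an `Iterator` per pass while the indexed form does not.

```java
import java.util.ArrayList;
import java.util.List;

// Runnable illustration of the two loop shapes compared above:
// indexed get(i) versus the enhanced-for (iterator-based) loop.
class LoopShapes {
    static long sumIndexed(ArrayList<Integer> l) {
        long s = 0;
        // indexed access: no Iterator allocation, one bounds-checked get(i)
        for (int i = 0; i < l.size(); i++) {
            s += l.get(i);
        }
        return s;
    }

    static long sumIterator(List<Integer> l) {
        long s = 0;
        // enhanced-for: desugars to l.iterator() / hasNext() / next()
        for (int x : l) {
            s += x;
        }
        return s;
    }
}
```

Whether the indexed form actually wins by the margin quoted above depends on JIT inlining and list size, which is why the comment measured it with microbenchmarks rather than assuming it.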
[jira] [Commented] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels
[ https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869766#comment-13869766 ] Andrew Purtell commented on HBASE-10326: Then I will fix the tests now. HBASE-10331 Super user should be able scan all the cells irrespective of the visibility labels -- Key: HBASE-10326 URL: https://issues.apache.org/jira/browse/HBASE-10326 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Labels: security Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10326.patch, HBASE-10326_1.patch This issue is in lieu with HBASE-10322. In case of export tool, when the cells with visibility labels are exported using a super user we should be able to export the data. But with the current implementation, the super user would also be able to view cells that has visibility labels associated with the superuser. The idea of HBASE-10322 is to strip out tags based on user and if so this change is necessary for export tool to work with Visibility. ACL already has a concept of global admins. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10331) Insure security tests use SecureTestUtil methods for grants
Andrew Purtell created HBASE-10331: -- Summary: Insure security tests use SecureTestUtil methods for grants Key: HBASE-10331 URL: https://issues.apache.org/jira/browse/HBASE-10331 Project: HBase Issue Type: Improvement Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
[ https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869784#comment-13869784 ] Sergey Shelukhin commented on HBASE-10227: -- RB would be nice. Storing mvcc in store file always is an interesting option. However, it becomes unnecessary for most KVs after some time under current HBase assumptions (that storefiles can be compared, all KVs in one SF are older than all KVs in the other per seqId/mvcc). The only uses for mvcc in KV at that point is exact same key in the file, and scanners, but the latter need disappears after some time, see some later comments in HBASE-10244. Some minor comments on the patch: bq. mvcc.reinitialize(maxMemstoreTS + 1); is now called twice in the same place. With removal of usage performCompaction no longer needs smallestReadPoint. Also parameter might not be necessary in createWriterInTmp Ok this is major comment. bq. if (versionOrLength == VERSION_3) { Is it possible to add MVCCs from corresponding KVs to protobuf part, rather than expand WALEdit format? I think the proper way is actually to make mvcc serialization a first class part of KV, there's JIRA for that; but that might be too much for this patch, as it would require new HFile version. For now we can at least avoid more hard-to-maintain-compat stuff down the line. Already, it appears that old reader will not read V_3 correctly. When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay Key: HBASE-10227 URL: https://issues.apache.org/jira/browse/HBASE-10227 Project: HBase Issue Type: Bug Components: regionserver Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10227-trunk_v0.patch When opening a region, all stores are examined to get the max MemstoreTS and it's used as the initial mvcc for the region, and then split hlogs are replayed. 
In fact the edits in split hlogs have kvs with greater mvcc than any MemstoreTS in any store file, but replaying them doesn't increment the mvcc accordingly at all. From an overall perspective this mvcc recovery is 'logically' incorrect/incomplete. The reason it doesn't currently cause a problem is that no active scanners exist and no new scanners can be created before the region opening completes, so the mvcc of all kvs in the hfiles resulting from hlog replay can be safely set to zero. They are just treated as kvs put 'earlier' than the ones in HFiles with mvcc greater than zero (say 'earlier' since they have mvcc less than the ones with non-zero mvcc, though in fact they were put 'later'), without any incorrect impact, again because during region opening no active scanners exist or can be created. This bug is just in a 'logic' sense for the time being, but if later on we need mvcc to survive the region's whole logical lifecycle (across regionservers) and never set it to zero, this bug needs to be fixed first. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
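The recovery gap described above can be sketched as a pure function: the region's initial read point should be the max of the MemstoreTS values found in store files and the seqIds of the edits replayed from split hlogs. This is an illustrative model only; the method and parameter names here are hypothetical, not actual HBase APIs.

```java
// Hypothetical sketch of the fix direction discussed in HBASE-10227: when a
// region opens, its recovered mvcc must account not only for the max
// MemstoreTS in store files but also for edits replayed from split hlogs.
final class MvccRecoverySketch {
    static long initialReadPoint(long maxMemstoreTs, long[] replayedEditSeqIds) {
        long max = maxMemstoreTs;
        for (long seqId : replayedEditSeqIds) {
            // Replayed edits carry seqIds/mvcc greater than anything in the
            // store files; the recovered read point must advance past them.
            if (seqId > max) {
                max = seqId;
            }
        }
        return max;
    }
}
```

With edits replayed at seqIds 150 and 200 against a store-file max of 100, the recovered read point would be 200, not 100 as the current logic yields.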
[jira] [Commented] (HBASE-10324) refactor deferred-log-flush/Durability related interface/code/naming to align with changed semantic of the new write thread model
[ https://issues.apache.org/jira/browse/HBASE-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869772#comment-13869772 ] Andrew Purtell commented on HBASE-10324: Let's commit this. It needs to go into the 0.98 branch also because we incorporated FHH's log changes there too. refactor deferred-log-flush/Durability related interface/code/naming to align with changed semantic of the new write thread model - Key: HBASE-10324 URL: https://issues.apache.org/jira/browse/HBASE-10324 Project: HBase Issue Type: Improvement Components: Client, regionserver Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10324-trunk_v0.patch, HBASE-10324-trunk_v1.patch, HBASE-10324-trunk_v2.patch With the new write thread model introduced by [HBASE-8755|https://issues.apache.org/jira/browse/HBASE-8755], some deferred-log-flush/Durability API/code/names should be changed accordingly: 1. there is no timer-triggered deferred-log-flush since flushing is always done by async threads, so the configuration 'hbase.regionserver.optionallogflushinterval' is no longer needed 2. the async writer-syncer-notifier threads will always be triggered implicitly; the semantic is that 'hbase.regionserver.optionallogflushinterval' > 0 always holds, so deferredLogSyncDisabled in HRegion.java, which affects durability behavior, should always be false 3. what HTableDescriptor.isDeferredLogFlush really means is that the write can return without waiting for the sync to be done, so the interface names should be changed to isAsyncLogFlush/setAsyncLogFlush to reflect their real meaning -- This message was sent by Atlassian JIRA (v6.1.5#6160)
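Item 3 above (the rename) might look roughly like the following sketch, assuming the old accessors are kept as thin delegates during the transition (whether to mark them @Deprecated was debated in later comments on this issue). This is illustrative only, not the actual HTableDescriptor code.

```java
// Illustrative-only model of renaming the table descriptor flag so its name
// matches what it actually controls: the write returning before the WAL sync
// completes. The class and field names are assumptions for this sketch.
class TableDescriptorSketch {
    private boolean asyncLogFlush = false;

    // New, accurately named accessors.
    boolean isAsyncLogFlush() { return asyncLogFlush; }
    void setAsyncLogFlush(boolean async) { this.asyncLogFlush = async; }

    // Old names kept as thin delegates for source compatibility.
    boolean isDeferredLogFlush() { return isAsyncLogFlush(); }
    void setDeferredLogFlush(boolean deferred) { setAsyncLogFlush(deferred); }
}
```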
[jira] [Commented] (HBASE-10295) Refactor the replication implementation to eliminate permanent zk node
[ https://issues.apache.org/jira/browse/HBASE-10295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869785#comment-13869785 ] Andrew Purtell commented on HBASE-10295: Nice idea, +1 Refactor the replication implementation to eliminate permanent zk node --- Key: HBASE-10295 URL: https://issues.apache.org/jira/browse/HBASE-10295 Project: HBase Issue Type: Bug Components: Replication Reporter: Feng Honghua Assignee: Feng Honghua Fix For: 0.99.0 Though this is a broader and bigger change, its original motivation derives from [HBASE-8751|https://issues.apache.org/jira/browse/HBASE-8751]: the newly introduced per-peer tableCFs attribute should be treated the same way as the peer-state, which is a permanent sub-node under the peer node; but using permanent zk nodes is deemed an incorrect practice. So let's refactor to eliminate the permanent zk node. HBASE-8751 can then align its newly introduced per-peer tableCFs attribute with this *correct* implementation theme. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer
[ https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10329: --- Affects Version/s: 0.98.0 Fix Version/s: 0.99.0 0.98.0 +1, please commit to trunk and 0.98 branch. Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer -- Key: HBASE-10329 URL: https://issues.apache.org/jira/browse/HBASE-10329 Project: HBase Issue Type: Bug Components: regionserver, wal Affects Versions: 0.98.0 Reporter: Feng Honghua Assignee: Feng Honghua Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10329-trunk_v0.patch Last month, after I introduced multiple AsyncSyncer threads to improve the throughput for lower numbers of client write threads, [~stack] encountered an NPE while testing, where a null writer occurs in AsyncSyncer when doing sync. Since we had run the test many times in a cluster to verify the throughput improvement and never encountered such an NPE, it really confused me. ([~stack] fixed this by adding 'if (writer != null)' to protect the sync operation.) These days from time to time I wondered why the writer can be null in AsyncSyncer and whether it's safe to fix it by just adding a null check before doing sync, as [~stack] did. After some digging, I found the case where AsyncSyncer can encounter a null writer; it is as below: 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with writtenTxid==100 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with writtenTxid==200 3. t3: rollWriter starts; it grabs the updateLock to prevent further client writes from entering pendingWrites, and then waits for all items (<= 200) in pendingWrites to append and finally sync to hdfs 4. t4: AsyncSyncer 2 finishes, now syncedTillHere==200 (it also helps sync <= 100 as a whole) 5. t5: rollWriter now can close the writer, set writer=null... 6. 
t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before rollWriter sets writer to the newly rolled writer. We can see: 1. the null writer is possible only when there are multiple AsyncSyncer threads; that's why we never encountered it before introducing multiple AsyncSyncer threads. 2. since rollWriter can set writer=null only after all items of pendingWrites sync to hdfs, and AsyncWriter is in the critical path of this task and there is only one single AsyncWriter thread, AsyncWriter can't encounter a null writer; that's why we never encounter a null writer in AsyncWriter though it also uses the writer. This is the same reason a null writer never occurs when there is a single AsyncSyncer thread. And we should treat the cases differently when writer == null in AsyncSyncer: 1. if txidToSync <= syncedTillHere, this means all writes this AsyncSyncer cares about have already been synced by another AsyncSyncer; we can safely skip the sync (as [~stack] does here); 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid <= txidToSync to avoid data loss: the user gets a successful write response but can't read the writes back afterward; from the user's perspective this is data loss (according to the above analysis, such a case should not occur, but we should still add such defensive treatment to prevent data loss if it really occurs, e.g. due to some bug introduced later). Also fix the bug where isSyncing needs to be reset to false when writer.sync encounters an IOException: AsyncSyncer swallows such an exception by failing all writes with txid <= txidToSync, and since this AsyncSyncer thread is then ready to do later syncs, its isSyncing needs to be reset to false in the IOException handling block, otherwise it can't be selected by AsyncWriter to do sync. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
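The two-case treatment proposed above for a null writer can be modeled as a small decision function. This is an illustrative sketch only, not the real FSHLog/AsyncSyncer code; the class and enum names are assumptions.

```java
// Minimal model of the defensive policy: when AsyncSyncer wakes up and finds
// writer == null, either everything it cares about was already synced by
// another syncer (skip), or unsynced writes must be failed rather than
// silently dropped, so clients never see success for data that never synced.
final class NullWriterPolicy {
    enum Action { SKIP_ALREADY_SYNCED, FAIL_PENDING_WRITES }

    static Action onNullWriter(long txidToSync, long syncedTillHere) {
        if (txidToSync <= syncedTillHere) {
            // Another AsyncSyncer already synced up to or past our txid.
            return Action.SKIP_ALREADY_SYNCED;
        }
        // Fail the writes with txid <= txidToSync to avoid silent data loss.
        return Action.FAIL_PENDING_WRITES;
    }
}
```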
[jira] [Commented] (HBASE-10324) refactor deferred-log-flush/Durability related interface/code/naming to align with changed semantic of the new write thread model
[ https://issues.apache.org/jira/browse/HBASE-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869773#comment-13869773 ] Andrew Purtell commented on HBASE-10324: Nit: Remove the '@Deprecated' tags from the renamed methods. The deprecated methods have been effectively removed and replaced with a new API. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HBASE-10324) refactor deferred-log-flush/Durability related interface/code/naming to align with changed semantic of the new write thread model
[ https://issues.apache.org/jira/browse/HBASE-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869773#comment-13869773 ] Andrew Purtell edited comment on HBASE-10324 at 1/13/14 6:10 PM: - Nit: Remove the '@Deprecated' tags from the renamed methods. The deprecated methods have been effectively removed and replaced with a new API. Edit: Committer, please also ensure the deprecated methods appear as such in the 0.96 branch. was (Author: apurtell): Nit: Remove the '@Deprecated' tags from the renamed methods. The deprecated methods have been effectively removed and replaced with a new API. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-6581) Build with hadoop.profile=3.0
[ https://issues.apache.org/jira/browse/HBASE-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869788#comment-13869788 ] Eric Charles commented on HBASE-6581: - And now I have looked at the first issue, related to TestReplicationHLogReaderManager. The test works but is just slow and is probably killed by the build system. Some methods take 4 times longer with the usage of the Method object. Strangely, TestHLog takes the same time - I will write a small blog post with more details. Bottom line: I would like to propose committing the changes related to the pom.xml (they were not easy to set up and I would prefer not losing that work: it basically introduces a hadoop3 module and nicely excludes the hadoop-core 1.1 and so on...). For the java class, we can think further about another solution. If someone gives the green light for the poms, I will upload a new patch. Build with hadoop.profile=3.0 - Key: HBASE-6581 URL: https://issues.apache.org/jira/browse/HBASE-6581 Project: HBase Issue Type: Bug Reporter: Eric Charles Assignee: Eric Charles Fix For: 0.98.1 Attachments: HBASE-6581-1.patch, HBASE-6581-2.patch, HBASE-6581-20130821.patch, HBASE-6581-3.patch, HBASE-6581-4.patch, HBASE-6581-5.patch, HBASE-6581-6.patch, HBASE-6581-7.patch, HBASE-6581.diff, HBASE-6581.diff Building trunk with hadoop.profile=3.0 gives exceptions (see [1]) due to a change in the hadoop maven module naming (and also the usage of 3.0-SNAPSHOT instead of 3.0.0-SNAPSHOT in hbase-common). I can provide a patch that would move most of the hadoop dependencies into their respective profiles and define the correct hadoop deps in the 3.0 profile. Please tell me if it's ok to go this way. Thx, Eric [1] $ mvn clean install -Dhadoop.profile=3.0 [INFO] Scanning for projects... 
[ERROR] The build could not read 3 projects - [Help 1] [ERROR] [ERROR] The project org.apache.hbase:hbase-server:0.95-SNAPSHOT (/d/hbase.svn/hbase-server/pom.xml) has 3 errors [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-common:jar is missing. @ line 655, column 21 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-annotations:jar is missing. @ line 659, column 21 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 663, column 21 [ERROR] [ERROR] The project org.apache.hbase:hbase-common:0.95-SNAPSHOT (/d/hbase.svn/hbase-common/pom.xml) has 3 errors [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-common:jar is missing. @ line 170, column 21 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-annotations:jar is missing. @ line 174, column 21 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 178, column 21 [ERROR] [ERROR] The project org.apache.hbase:hbase-it:0.95-SNAPSHOT (/d/hbase.svn/hbase-it/pom.xml) has 3 errors [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-common:jar is missing. @ line 220, column 18 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-annotations:jar is missing. @ line 224, column 21 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 228, column 21 [ERROR] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10332) Missing .regioninfo file during daughter open processing
Andrew Purtell created HBASE-10332: -- Summary: Missing .regioninfo file during daughter open processing Key: HBASE-10332 URL: https://issues.apache.org/jira/browse/HBASE-10332 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Under cluster stress testing, there are a fair amount of warnings like this: {noformat} 2014-01-12 04:52:29,183 WARN [test-1,8120,1389467616661-daughterOpener=490a58c14b14a59e8d303d310684f0b0] regionserver.HRegionFileSystem: .regioninfo file not found for region: 490a58c14b14a59e8d303d310684f0b0 {noformat} This is from HRegionFileSystem#checkRegionInfoOnFilesystem, which catches a FileNotFoundException in this case and calls writeRegionInfoOnFilesystem to fix up the issue. Is this a bug in splitting? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
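The fix-up pattern described above (catch the missing-file case, warn, and rewrite rather than fail the region open) can be modeled with a stdlib-only sketch. The in-memory "filesystem" and names below are illustrative assumptions, not the real HRegionFileSystem code.

```java
import java.util.HashSet;
import java.util.Set;

// Models checkRegionInfoOnFilesystem's recovery behavior: if .regioninfo is
// missing for a region (the WARN case above), recreate it instead of failing.
final class RegionInfoCheckSketch {
    private final Set<String> existingFiles = new HashSet<>();

    // Returns true when the file had to be recreated (the warning case).
    boolean checkRegionInfoOnFilesystem(String regionDir) {
        String regionInfo = regionDir + "/.regioninfo";
        if (existingFiles.contains(regionInfo)) {
            return false; // normal case: file already present
        }
        // Corresponds to writeRegionInfoOnFilesystem: fix up by rewriting.
        existingFiles.add(regionInfo);
        return true;
    }
}
```

Whether the daughter open should ever hit the recreate path at all is exactly the open question the issue raises.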
[jira] [Commented] (HBASE-10332) Missing .regioninfo file during daughter open processing
[ https://issues.apache.org/jira/browse/HBASE-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869811#comment-13869811 ] Andrew Purtell commented on HBASE-10332: Ping [~mbertozzi], author of the code in question. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8547) Fix java.lang.RuntimeException: Cached an already cached block
[ https://issues.apache.org/jira/browse/HBASE-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869817#comment-13869817 ] Andrew Purtell commented on HBASE-8547: --- {noformat} 2014-01-11 22:22:56,895 WARN [RpcServer.handler=1,port=8120] hfile.LruBlockCache: Cached an already cached block: a957fcb9a7474e9b9e4858e74d6a1eec.05bd6ea9e5d83483d35ee6658cc189a5_92811926 cb:a957fcb9a7474e9b9e4858e74d6a1eec.05bd6ea9e5d83483d35ee6658cc189a5_92811926. This is harmless and can happen in rare cases (see HBASE-8547) {noformat} 14 occurrences writing 1 billion keys with flushes every 30 seconds, indeed seems rare by observation. Just FYI. Fix java.lang.RuntimeException: Cached an already cached block -- Key: HBASE-8547 URL: https://issues.apache.org/jira/browse/HBASE-8547 Project: HBase Issue Type: Bug Components: io, regionserver Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: hbase-8547_v1-0.94.patch, hbase-8547_v1-0.94.patch, hbase-8547_v1.patch, hbase-8547_v2-0.94-reduced.patch, hbase-8547_v2-addendum2+3-0.94.patch, hbase-8547_v2-addendum2.patch, hbase-8547_v2-addendum2.patch, hbase-8547_v2-addendum3.patch, hbase-8547_v2-trunk.patch In one test, one of the region servers received the following on 0.94. Note HalfStoreFileReader in the stack trace. I think the root cause is that after the region is split, the mid point can be in the middle of a block (for store files that the mid point is not chosen from). Each half store file tries to load the half block and put it in the block cache. Since IdLock is instantiated per store file reader, the readers do not share the same IdLock instance and thus do not lock against each other effectively. 
{code}
2013-05-12 01:30:37,733 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
java.lang.RuntimeException: Cached an already cached block
    at org.apache.hadoop.hbase.io.hfile.LruBlockCache.cacheBlock(LruBlockCache.java:279)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:353)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:480)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:237)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:351)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:354)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:277)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:543)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:411)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:143)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3829)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3896)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3778)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3770)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2643)
    at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:308)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
{code} I can see two possible fixes: # Allow this kind of rare case in LruBlockCache by not throwing an exception. # Move the lock instances to an upper layer (possibly in CacheConfig), and let half hfile readers share the same IdLock implementation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
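Fix #2 above hinges on both half-file readers sharing one lock-striping instance, so that concurrent loads of the same block id serialize against each other. Below is a stdlib-only sketch of the idea; HBase's IdLock is modeled here with a map of per-id monitors, and the class and method names are illustrative assumptions.

```java
import java.util.concurrent.ConcurrentHashMap;

// One shared map of per-block-id monitors (e.g. held at the CacheConfig
// level) instead of one IdLock per store file reader. Two half-file readers
// asking for the lock of the same block id then get the same monitor, so
// their cache-load attempts serialize rather than racing.
final class SharedBlockLock {
    private final ConcurrentHashMap<Long, Object> locks = new ConcurrentHashMap<>();

    Object lockFor(long blockId) {
        // computeIfAbsent guarantees a single monitor object per id,
        // even under concurrent first access.
        return locks.computeIfAbsent(blockId, id -> new Object());
    }
}
```

The per-reader instantiation bug is precisely that each reader held its own such map, so `lockFor` on the same block id returned different monitors.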
[jira] [Comment Edited] (HBASE-8547) Fix java.lang.RuntimeException: Cached an already cached block
[ https://issues.apache.org/jira/browse/HBASE-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869817#comment-13869817 ] Andrew Purtell edited comment on HBASE-8547 at 1/13/14 6:44 PM: {noformat} 2014-01-11 22:22:56,895 WARN [RpcServer.handler=1,port=8120] hfile.LruBlockCache: Cached an already cached block: a957fcb9a7474e9b9e4858e74d6a1eec.05bd6ea9e5d83483d35ee6658cc189a5_92811926 cb:a957fcb9a7474e9b9e4858e74d6a1eec.05bd6ea9e5d83483d35ee6658cc189a5_92811926. This is harmless and can happen in rare cases (see HBASE-8547) {noformat} 14 occurrences on one RS writing 1 billion keys with flushes every 30 seconds, indeed seems rare by observation. Just FYI. was (Author: apurtell): {noformat} 2014-01-11 22:22:56,895 WARN [RpcServer.handler=1,port=8120] hfile.LruBlockCache: Cached an already cached block: a957fcb9a7474e9b9e4858e74d6a1eec.05bd6ea9e5d83483d35ee6658cc189a5_92811926 cb:a957fcb9a7474e9b9e4858e74d6a1eec.05bd6ea9e5d83483d35ee6658cc189a5_92811926. This is harmless and can happen in rare cases (see HBASE-8547) {noformat} 14 occurrences writing 1 billion keys with flushes every 30 seconds, indeed seems rare by observation. Just FYI.
[jira] [Updated] (HBASE-10324) refactor deferred-log-flush/Durability related interface/code/naming to align with changed semantic of the new write thread model
[ https://issues.apache.org/jira/browse/HBASE-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10324: --- Fix Version/s: 0.99.0 0.98.0 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer
[ https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869833#comment-13869833 ] Ted Yu commented on HBASE-10329: With this fix, what should be done in the following catch block (line 1250)? {code} } catch (Exception e) { LOG.error(UNEXPECTED, e); {code} I assume we won't hit the above anymore. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer
[ https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869838#comment-13869838 ] stack commented on HBASE-10329: --- Thanks [~fenghh]. Looks great. Thanks for persisting and fixing my hack. bq. and stack fixed this by adding 'if (writer != null)' to protect the sync operation The check for a null writer is actually an old 'problem' done in a few places around the code IIRC, so kudos digging in. Over the w/e I was working on my HBASE-10156. Long story short, I ran into the same issue. I need to hold the writer thread while the log is rolled out from under it, only I can't hold the writer thread at any arbitrary point; I have to hold the writer when it attains the highest outstanding sync point. Only then can I roll the log (patch coming soon). Having this issue made me wonder how the current implementation does this dance. This issue seems to indicate it didn't. Good on you. Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer -- Key: HBASE-10329 URL: https://issues.apache.org/jira/browse/HBASE-10329 Project: HBase Issue Type: Bug Components: regionserver, wal Affects Versions: 0.98.0 Reporter: Feng Honghua Assignee: Feng Honghua Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10329-trunk_v0.patch Last month, after I introduced multiple AsyncSyncer threads to improve the throughput for lower numbers of client write threads, [~stack] encountered an NPE while doing a test where a null writer occurs in AsyncSyncer when doing sync. Since we have run the test many times in a cluster to verify the throughput improvement and never encountered such an NPE, it really confused me.
(and [~stack] fixed this by adding 'if (writer != null)' to protect the sync operation) These days, from time to time, I wondered why the writer can be null in AsyncSyncer and whether it's safe to fix it by just adding a null check before doing sync, as [~stack] did. After some digging, I found the case where AsyncSyncer can encounter a null writer; it is as below:
1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with writtenTxid==100
2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with writtenTxid==200
3. t3: rollWriter starts, it grabs the updateLock to prevent further client writes from entering pendingWrites, and then waits for all items (<= 200) in pendingWrites to append and finally sync to hdfs
4. t4: AsyncSyncer 2 finishes, now syncedTillHere==200 (it also helps sync <= 100 as a whole)
5. t5: rollWriter now can close the writer, set writer=null...
6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before rollWriter sets writer to the newly rolled Writer
We can see:
1. the null writer is possible only after there are multiple AsyncSyncer threads; that's why we never encountered it before introducing multiple AsyncSyncer threads.
2. since rollWriter can set writer=null only after all items of pendingWrites sync to hdfs, and AsyncWriter is in the critical path of this task and there is only one single AsyncWriter thread, AsyncWriter can't encounter a null writer; that's why we never encounter a null writer in AsyncWriter though it also uses the writer. This is the same reason why a null writer never occurs when there is a single AsyncSyncer thread.
And we should treat writer == null in AsyncSyncer differently:
1. if txidToSync <= syncedTillHere, this means all writes this AsyncSyncer cares about have already been synced by other AsyncSyncer threads; we can safely ignore the sync (as [~stack] does here);
2. if txidToSync > syncedTillHere, we need to fail all the writes with txid <= txidToSync to avoid data loss: the user gets a successful write response but can't read out the writes afterwards, and from the user's perspective this is data loss (according to the above analysis, such a case should not occur, but we should still add such defensive treatment to prevent data loss if it really does occur, e.g. due to some bug introduced later).
Also fix the bug where isSyncing needs to be reset to false when writer.sync encounters an IOException: AsyncSyncer swallows such an exception by failing all writes with txid <= txidToSync, and this AsyncSyncer thread is then ready to do later syncs, so its isSyncing needs to be reset to false in the IOException handling block, otherwise it can't be selected by AsyncWriter to do sync -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10304) Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString
[ https://issues.apache.org/jira/browse/HBASE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869835#comment-13869835 ] Jimmy Xiang commented on HBASE-10304: - Makes sense. bin/hbase script doesn't accept command jar. It may need some tweak to work. Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString Key: HBASE-10304 URL: https://issues.apache.org/jira/browse/HBASE-10304 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.98.0, 0.96.1.1 Reporter: stack Priority: Blocker Fix For: 0.98.0 Attachments: hbase-10304_not_tested.patch, jobjar.xml (Jimmy has been working on this one internally. I'm just the messenger raising this critical issue upstream). So, if you make job jar and bundle up hbase inside in it because you want to access hbase from your mapreduce task, the deploy of the job jar to the cluster fails with: {code} 14/01/05 08:59:19 INFO Configuration.deprecation: topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl 14/01/05 08:59:19 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. 
Instead, use dfs.bytes-per-checksum Exception in thread main java.lang.IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:792) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:818) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertScanToString(TableMapReduceUtil.java:433) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:186) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:147) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:270) at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:100) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:124) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:64) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.main(HBaseMapReduceIndexerTool.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} So, ZCLBS is a hack. This class is in the hbase-protocol module. It is in the com.google.protobuf package. All is well and good usually. But when we make a job jar and bundle up hbase inside it, our 'trick' breaks. RunJar makes a new class loader to run the job jar. This URLClassLoader 'attaches' all the jars and classes that are in the jobjar so they can be found when it goes to do a lookup. Only classloaders work by always delegating to their parent first (unless you are a WAR file in a container where delegation is 'off' for the most part), and in this case the parent classloader will have access to a pb jar since pb is in the hadoop CLASSPATH. So, the parent loads the pb classes. We then load ZCLBS, only this is done in the classloader made by RunJar; ZCLBS has a different classloader from its superclass and we get the above IllegalAccessError. Now (Jimmy's work comes in here), this can't be fixed by reflection -- you can't setAccess on a 'Class' -- and though
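The parent-first delegation behavior described above can be seen with a tiny standalone demo. This is an illustration only (no HBase or protobuf involved): a child URLClassLoader with no URLs of its own never gets to define a class its parent can already load, which is exactly why ZCLBS and LiteralByteString end up with different defining loaders in the job-jar case:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class DelegationDemo {
    // Ask a child loader (with no URLs of its own) for a class: parent-first
    // delegation means the parent chain, not the child, defines the class.
    static boolean parentDefined(String name) throws Exception {
        try (URLClassLoader child = new URLClassLoader(new URL[0],
                DelegationDemo.class.getClassLoader())) {
            Class<?> c = child.loadClass(name);
            // The defining loader is never the child we asked.
            return c.getClassLoader() != child;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parentDefined("java.util.ArrayList")); // true
    }
}
```

In the RunJar scenario the same mechanism works against you: the parent (hadoop's CLASSPATH) defines LiteralByteString, while the job-jar loader defines ZeroCopyLiteralByteString, and package-private access across loaders fails with IllegalAccessError.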
[jira] [Updated] (HBASE-10315) Canary shouldn't exit with 3 if there is no master running.
[ https://issues.apache.org/jira/browse/HBASE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-10315: -- Attachment: HBASE-10315-1.patch Forgot to attach the newer patch. Canary shouldn't exit with 3 if there is no master running. --- Key: HBASE-10315 URL: https://issues.apache.org/jira/browse/HBASE-10315 Project: HBase Issue Type: Bug Components: util Affects Versions: 0.98.0, 0.96.1.1 Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-10315-0.patch, HBASE-10315-1.patch It's possible to timeout (when the timeout is below the number of retries to the master) before even initializing if there is no master up. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10315) Canary shouldn't exit with 3 if there is no master running.
[ https://issues.apache.org/jira/browse/HBASE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-10315: -- Attachment: HBASE-10315-0.patch Here's a patch that exits with the init error code if the canary doesn't initialize before the timeout. Canary shouldn't exit with 3 if there is no master running. --- Key: HBASE-10315 URL: https://issues.apache.org/jira/browse/HBASE-10315 Project: HBase Issue Type: Bug Components: util Affects Versions: 0.98.0, 0.96.1.1 Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-10315-0.patch It's possible to timeout (when the timeout is below the number of retries to the master) before even initializing if there is no master up. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility
[ https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869854#comment-13869854 ] Andrew Purtell commented on HBASE-10321: Thanks for the fix Anoop! CellCodec has broken the 96 client to 98 server compatibility - Key: HBASE-10321 URL: https://issues.apache.org/jira/browse/HBASE-10321 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch The write/read tags added in CellCodec has broken the 96 client to 98 server compatibility (and 98 client to 96 server) When 96 client CellCodec writes cell, it won't write tags part at all. But the server expects a tag part, at least a 0 tag length. This tag length read will make a read of some bytes from next cell! I suggest we can remove the tag part from CellCodec. This codec is not used by default and I don't think some one will change to CellCodec from the default KVCodec now. .. This makes tags not supported via CellCodec..Tag support can be added to CellCodec once we have Connection negotiation in place (?) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
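The framing mismatch described above can be illustrated with a simplified sketch. This is not the real CellCodec wire format (the lengths and layout here are invented for illustration); it only shows the mechanism: a writer that omits the tags section leaves a tag-aware reader consuming bytes that actually belong to the next cell.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FramingDemo {
    /**
     * A "96-style" writer emits [len][bytes] per cell with no tags section.
     * A tag-aware reader then reads a phantom tags length after the first cell,
     * stealing bytes from the next cell's length field.
     */
    static int phantomTagsLength() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (String v : new String[] {"aa", "bb"}) { // two cells, no tags written
            byte[] b = v.getBytes("UTF-8");
            out.writeInt(b.length);
            out.write(b);
        }
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        byte[] first = new byte[in.readInt()];
        in.readFully(first);            // first cell reads fine
        return in.readUnsignedShort();  // reader assumes a tags length follows;
                                        // these 2 bytes are really part of the
                                        // next cell's length field
    }

    public static void main(String[] args) throws IOException {
        System.out.println(phantomTagsLength()); // 0 -- looks harmless, but the stream is now misaligned
    }
}
```

The phantom read returns 0 here only by accident (the next cell's length happens to start with zero bytes); the stream is misaligned either way, which matches the "read of some bytes from next cell" in the issue description.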
[jira] [Updated] (HBASE-10331) Insure security tests use SecureTestUtil methods for grants
[ https://issues.apache.org/jira/browse/HBASE-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10331: --- Status: Patch Available (was: Open) Insure security tests use SecureTestUtil methods for grants --- Key: HBASE-10331 URL: https://issues.apache.org/jira/browse/HBASE-10331 Project: HBase Issue Type: Improvement Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10331.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10331) Insure security tests use SecureTestUtil methods for grants
[ https://issues.apache.org/jira/browse/HBASE-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10331: --- Description: SecureTestUtil methods for grants and revokes wait for consistent AccessController state before proceeding, eliminating a source of race conditions in security unit tests. Insure security tests use SecureTestUtil methods for grants --- Key: HBASE-10331 URL: https://issues.apache.org/jira/browse/HBASE-10331 Project: HBase Issue Type: Improvement Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10331.patch SecureTestUtil methods for grants and revokes wait for consistent AccessController state before proceeding, eliminating a source of race conditions in security unit tests. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10331) Insure security tests use SecureTestUtil methods for grants
[ https://issues.apache.org/jira/browse/HBASE-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10331: --- Attachment: 10331.patch Passes all o.a.h.h.security.*.* tests twice. Insure security tests use SecureTestUtil methods for grants --- Key: HBASE-10331 URL: https://issues.apache.org/jira/browse/HBASE-10331 Project: HBase Issue Type: Improvement Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10331.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-6581) Build with hadoop.profile=3.0
[ https://issues.apache.org/jira/browse/HBASE-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869875#comment-13869875 ] Andrew Purtell commented on HBASE-6581: --- Does this issue confuse issues with Hadoop 3.0 and JDK 8? bq. Some methods take 4 times more with the usage of the Method object. Strangely, TestHLog takes the same time - I will write a small blog post with more details. Consider elaborating a bit here. Build with hadoop.profile=3.0 - Key: HBASE-6581 URL: https://issues.apache.org/jira/browse/HBASE-6581 Project: HBase Issue Type: Bug Reporter: Eric Charles Assignee: Eric Charles Fix For: 0.98.1 Attachments: HBASE-6581-1.patch, HBASE-6581-2.patch, HBASE-6581-20130821.patch, HBASE-6581-3.patch, HBASE-6581-4.patch, HBASE-6581-5.patch, HBASE-6581-6.patch, HBASE-6581-7.patch, HBASE-6581.diff, HBASE-6581.diff Building trunk with hadoop.profile=3.0 gives exceptions (see [1]) due to change in the hadoop maven modules naming (and also usage of 3.0-SNAPSHOT instead of 3.0.0-SNAPSHOT in hbase-common). I can provide a patch that would move most of hadoop dependencies in their respective profiles and will define the correct hadoop deps in the 3.0 profile. Please tell me if that's ok to go this way. Thx, Eric [1] $ mvn clean install -Dhadoop.profile=3.0 [INFO] Scanning for projects... [ERROR] The build could not read 3 projects - [Help 1] [ERROR] [ERROR] The project org.apache.hbase:hbase-server:0.95-SNAPSHOT (/d/hbase.svn/hbase-server/pom.xml) has 3 errors [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-common:jar is missing. @ line 655, column 21 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-annotations:jar is missing. @ line 659, column 21 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-minicluster:jar is missing. 
@ line 663, column 21 [ERROR] [ERROR] The project org.apache.hbase:hbase-common:0.95-SNAPSHOT (/d/hbase.svn/hbase-common/pom.xml) has 3 errors [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-common:jar is missing. @ line 170, column 21 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-annotations:jar is missing. @ line 174, column 21 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 178, column 21 [ERROR] [ERROR] The project org.apache.hbase:hbase-it:0.95-SNAPSHOT (/d/hbase.svn/hbase-it/pom.xml) has 3 errors [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-common:jar is missing. @ line 220, column 18 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-annotations:jar is missing. @ line 224, column 21 [ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 228, column 21 [ERROR] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
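The reflection overhead quoted above ("Some methods take 4 times more with the usage of the Method object") comes from dispatching through java.lang.reflect.Method, which adds boxing and access checks to each call. A minimal illustration of the two call paths (not the TestHLog benchmark itself):

```java
import java.lang.reflect.Method;

public class ReflectCall {
    public static int twice(int x) { return 2 * x; }

    public static void main(String[] args) throws Exception {
        // Direct call: resolved statically, no boxing.
        int direct = twice(21);
        // Same call through a Method object: argument boxing, access checks,
        // and a boxed return value -- the source of the measured overhead.
        Method m = ReflectCall.class.getMethod("twice", int.class);
        int reflective = (Integer) m.invoke(null, 21);
        System.out.println(direct == reflective); // true -- same result, different cost
    }
}
```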
[jira] [Updated] (HBASE-10315) Canary shouldn't exit with 3 if there is no master running.
[ https://issues.apache.org/jira/browse/HBASE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-10315: -- Status: Patch Available (was: Open) Canary shouldn't exit with 3 if there is no master running. --- Key: HBASE-10315 URL: https://issues.apache.org/jira/browse/HBASE-10315 Project: HBase Issue Type: Bug Components: util Affects Versions: 0.96.1.1, 0.98.0 Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-10315-0.patch, HBASE-10315-1.patch It's possible to timeout (when the timeout is below the number of retries to the master) before even initializing if there is no master up. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10123) Change default ports; move them out of linux ephemeral port range
[ https://issues.apache.org/jira/browse/HBASE-10123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869893#comment-13869893 ] Jonathan Hsieh commented on HBASE-10123: tl;dr Based on stack's link I'm going to move 60xxx ports to 16xxx ports. Stack's link basically states the common ephemeral port ranges:
BSD - 1-1023 reserved. 1024-4999 ephemeral. Others feel 49152-65535 are ephemeral
AIX - 32768-65535 ephemeral.
HPUX - 49152-65535 ephemeral.
Linux 2.2 - 1024-4999 ephemeral.
Linux 2.4 - 32768-61000 ephemeral.
openBSD - 32786-49151 or 49152-65535 ephemeral
solaris - 32768-65535 ephemeral.
tru64 unix - 1024-4999 ephemeral.
windows 2k8 - 49152-65535
This basically means we are safe anywhere between 5000-32768. Looking at my /etc/services (ubuntu 10.04), big blocks that seem untouched include 12xxx, 14xxx, 16xxx, 18-19xxx, 21xxx, 23xxx, 26xxx, 28xxx-32768 Change default ports; move them out of linux ephemeral port range - Key: HBASE-10123 URL: https://issues.apache.org/jira/browse/HBASE-10123 Project: HBase Issue Type: Bug Affects Versions: 0.96.1.1 Reporter: stack Priority: Critical Fix For: 0.98.0 Our defaults clash w/ the range linux assigns itself for creating come-and-go ephemeral ports; likely in our history we've clashed w/ a random, short-lived process. While easy to change the defaults, we should just ship w/ defaults that make sense. We could host ourselves up into the 7 or 8k range. See http://www.ncftp.com/ncftpd/doc/misc/ephemeral_ports.html -- This message was sent by Atlassian JIRA (v6.1.5#6160)
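The "safe anywhere between 5000-32768" conclusion can be sketched as a check against the union of the quoted ephemeral ranges. This is a simplification for illustration (the helper name is hypothetical; real deployments should consult the live setting, e.g. /proc/sys/net/ipv4/ip_local_port_range on Linux):

```java
public class PortCheck {
    // Union of the ephemeral ranges quoted above: 1024-4999 (BSD, Linux 2.2,
    // tru64) and 32768-65535 (AIX, Linux 2.4, Solaris, HPUX/Windows 49152+).
    // Anything in [5000, 32768) avoids all of them.
    static boolean clashesWithEphemeral(int port) {
        return (port >= 1024 && port <= 4999) || (port >= 32768 && port <= 65535);
    }

    public static void main(String[] args) {
        System.out.println(clashesWithEphemeral(60000)); // old 60xxx default: true
        System.out.println(clashesWithEphemeral(16000)); // proposed 16xxx: false
    }
}
```

This is why the old 60xxx defaults could collide with a short-lived process on Linux 2.4+ while 16xxx ports cannot.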
[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869915#comment-13869915 ] Samir Ahmic commented on HBASE-7386: Thanks for the review [~nkeywal]. I agree about 'PROCESS_STATE_UNKNOWN'; I checked it in the supervisor source code and it looks like it is used for actions when supervisor is unable to determine the state of a process. I will remove it from the event listener since it can cause issues. I was planning to make mail notification optional, even to create a separate event listener that will handle email notifications. '/bin/mail' is the simplest solution, and following that example folks could develop their own solutions. How do you think this should be handled? bq. Do we have to use python? According to the documentation: Event listeners can be written in any language supported by the platform you're using to run supervisor. There is special library support for Python in the form of a supervisor.childutils module, which makes creating event listeners in Python slightly easier than in other languages. Any suggestions on what we should use instead of Python? Java? When we complete this work it should probably be documented under 15. Apache HBase Operational Management? Investigate providing some supervisor support for znode deletion Key: HBASE-7386 URL: https://issues.apache.org/jira/browse/HBASE-7386 Project: HBase Issue Type: Task Components: master, regionserver, scripts Reporter: Gregory Chanan Assignee: stack Priority: Blocker Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, HBASE-7386-v0.patch, supervisordconfigs-v0.patch There are a couple of JIRAs for deleting the znode on a process failure: HBASE-5844 (RS) HBASE-5926 (Master) which are pretty neat; on process failure, they delete the znode of the underlying process so HBase can recover faster. These JIRAs were implemented via the startup scripts; i.e. the script hangs around and waits for the process to exit, then deletes the znode. There are a few problems associated with this approach, as listed in the below JIRAs:
1) Hides startup output in script https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
2) two hbase processes listed per launched daemon https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
3) Not run by a real supervisor https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
4) Weird output after kill -9 actual process in standalone mode https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
5) Can kill existing RS if called again https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
6) Hides stdout/stderr https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
I suspect running it via something like supervisord can solve these issues if we provide the right support. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
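A supervisord setup of the kind discussed could look roughly like the fragment below. This is a sketch only: the program names, paths, and the cleanup listener script are hypothetical, not taken from the attached patches; only the event name PROCESS_STATE_EXITED is standard supervisord.

```ini
; Hypothetical supervisord fragment: paths and scripts are illustrative.
[program:regionserver]
command=/usr/lib/hbase/bin/hbase regionserver start
autorestart=true
stdout_logfile=/var/log/hbase/regionserver.out

[eventlistener:znode_cleanup]
; PROCESS_STATE_EXITED fires whenever a managed process exits;
; the listener can then delete the stale znode so HBase recovers faster.
command=/usr/lib/hbase/bin/znode-cleanup-listener
events=PROCESS_STATE_EXITED
```

Running the daemon under a real supervisor this way would address problems 2-4 above (one process per daemon, actual supervision, sane restart behavior), with the znode deletion moved out of the startup script and into an event listener.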
[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility
[ https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869917#comment-13869917 ] Hudson commented on HBASE-10321: SUCCESS: Integrated in HBase-0.98 #73 (See [https://builds.apache.org/job/HBase-0.98/73/]) HBASE-10321 CellCodec has broken the 96 client to 98 server compatibility (anoopsamjohn: rev 1557780) * /hbase/branches/0.98/hbase-common/src/main/java/org/apache/hadoop/hbase/codec/CellCodec.java * /hbase/branches/0.98/hbase-common/src/main/java/org/apache/hadoop/hbase/codec/CellCodecV2.java * /hbase/branches/0.98/hbase-common/src/test/java/org/apache/hadoop/hbase/codec/TestCellCodec.java * /hbase/branches/0.98/hbase-common/src/test/java/org/apache/hadoop/hbase/codec/TestCellCodecV2.java CellCodec has broken the 96 client to 98 server compatibility - Key: HBASE-10321 URL: https://issues.apache.org/jira/browse/HBASE-10321 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch The write/read tags added in CellCodec has broken the 96 client to 98 server compatibility (and 98 client to 96 server) When 96 client CellCodec writes cell, it won't write tags part at all. But the server expects a tag part, at least a 0 tag length. This tag length read will make a read of some bytes from next cell! I suggest we can remove the tag part from CellCodec. This codec is not used by default and I don't think some one will change to CellCodec from the default KVCodec now. .. This makes tags not supported via CellCodec..Tag support can be added to CellCodec once we have Connection negotiation in place (?) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility
[ https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869919#comment-13869919 ] Hudson commented on HBASE-10321: SUCCESS: Integrated in HBase-TRUNK #4810 (See [https://builds.apache.org/job/HBase-TRUNK/4810/]) HBASE-10321 CellCodec has broken the 96 client to 98 server compatibility (anoopsamjohn: rev 1557781) * /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/codec/CellCodec.java * /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/codec/CellCodecV2.java * /hbase/trunk/hbase-common/src/test/java/org/apache/hadoop/hbase/codec/TestCellCodec.java * /hbase/trunk/hbase-common/src/test/java/org/apache/hadoop/hbase/codec/TestCellCodecV2.java CellCodec has broken the 96 client to 98 server compatibility - Key: HBASE-10321 URL: https://issues.apache.org/jira/browse/HBASE-10321 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch The write/read tags added in CellCodec has broken the 96 client to 98 server compatibility (and 98 client to 96 server) When 96 client CellCodec writes cell, it won't write tags part at all. But the server expects a tag part, at least a 0 tag length. This tag length read will make a read of some bytes from next cell! I suggest we can remove the tag part from CellCodec. This codec is not used by default and I don't think some one will change to CellCodec from the default KVCodec now. .. This makes tags not supported via CellCodec..Tag support can be added to CellCodec once we have Connection negotiation in place (?) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels
[ https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869920#comment-13869920 ] Hudson commented on HBASE-10326: SUCCESS: Integrated in HBase-TRUNK #4810 (See [https://builds.apache.org/job/HBase-TRUNK/4810/]) HBASE-10326 Super user should be able scan all the cells irrespective of the visibility labels(Ram) (anoopsamjohn: rev 1557792) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityController.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabels.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithACL.java Super user should be able scan all the cells irrespective of the visibility labels -- Key: HBASE-10326 URL: https://issues.apache.org/jira/browse/HBASE-10326 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Labels: security Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10326.patch, HBASE-10326_1.patch This issue is in lieu with HBASE-10322. In case of export tool, when the cells with visibility labels are exported using a super user we should be able to export the data. But with the current implementation, the super user would also be able to view cells that has visibility labels associated with the superuser. The idea of HBASE-10322 is to strip out tags based on user and if so this change is necessary for export tool to work with Visibility. ACL already has a concept of global admins. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels
[ https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869918#comment-13869918 ] Hudson commented on HBASE-10326: SUCCESS: Integrated in HBase-0.98 #73 (See [https://builds.apache.org/job/HBase-0.98/73/]) HBASE-10326 Super user should be able scan all the cells irrespective of the visibility labels(Ram) (anoopsamjohn: rev 1557791) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityController.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabels.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithACL.java Super user should be able scan all the cells irrespective of the visibility labels -- Key: HBASE-10326 URL: https://issues.apache.org/jira/browse/HBASE-10326 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Labels: security Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10326.patch, HBASE-10326_1.patch This issue is in lieu with HBASE-10322. In case of export tool, when the cells with visibility labels are exported using a super user we should be able to export the data. But with the current implementation, the super user would also be able to view cells that has visibility labels associated with the superuser. The idea of HBASE-10322 is to strip out tags based on user and if so this change is necessary for export tool to work with Visibility. ACL already has a concept of global admins. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HBASE-10332) Missing .regioninfo file during daughter open processing
[ https://issues.apache.org/jira/browse/HBASE-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi reassigned HBASE-10332: --- Assignee: Matteo Bertozzi Missing .regioninfo file during daughter open processing Key: HBASE-10332 URL: https://issues.apache.org/jira/browse/HBASE-10332 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Matteo Bertozzi Under cluster stress testing, there are a fair amount of warnings like this: {noformat} 2014-01-12 04:52:29,183 WARN [test-1,8120,1389467616661-daughterOpener=490a58c14b14a59e8d303d310684f0b0] regionserver.HRegionFileSystem: .regioninfo file not found for region: 490a58c14b14a59e8d303d310684f0b0 {noformat} This is from HRegionFileSystem#checkRegionInfoOnFilesystem, which catches a FileNotFoundException in this case and calls writeRegionInfoOnFilesystem to fix up the issue. Is this a bug in splitting? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10332) Missing .regioninfo file during daughter open processing
[ https://issues.apache.org/jira/browse/HBASE-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869935#comment-13869935 ] Matteo Bertozzi commented on HBASE-10332: - HRegion.createDaughterRegionFromSplits() uses HRegion.newHRegion() instead of createHRegion(), so the .regioninfo file is not created on daughter creation but on daughter open, by checkRegionInfoOnFilesystem. This shouldn't be a problem, but let me see if I can change the code to use the create path, or at least write the .regioninfo on creation. Missing .regioninfo file during daughter open processing Key: HBASE-10332 URL: https://issues.apache.org/jira/browse/HBASE-10332 Project: HBase Issue Type: Bug Reporter: Andrew Purtell -- This message was sent by Atlassian JIRA (v6.1.5#6160)
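The repair path described here — catch the missing .regioninfo on region open and rewrite it rather than fail — can be sketched as follows. The class and method names are hypothetical stand-ins, not HBase's actual HRegionFileSystem code; only the check-and-recreate pattern is illustrated.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class RegionInfoCheckSketch {
    /**
     * Returns true if the .regioninfo descriptor was already present,
     * false if it was missing and had to be recreated (the case the
     * warning above logs).
     */
    static boolean checkOrRepair(Path regionDir) {
        Path info = regionDir.resolve(".regioninfo");
        if (Files.exists(info)) {
            return true;                 // descriptor present, nothing to do
        }
        try {
            // mirrors the spirit of writeRegionInfoOnFilesystem: recreate it
            Files.write(info, new byte[0]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return false;                    // report that a repair happened
    }

    /** Small demo: first call repairs, second call finds the file. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("regionSketch");
            boolean first = checkOrRepair(dir);
            boolean second = checkOrRepair(dir);
            return first + "," + second;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The open-time repair masks the real question in the issue: whether daughter creation should have written the file in the first place.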
[jira] [Commented] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer
[ https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869943#comment-13869943 ] Ted Yu commented on HBASE-10329: Integrated to trunk. Patch for 0.98 coming. Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer -- Key: HBASE-10329 URL: https://issues.apache.org/jira/browse/HBASE-10329 Project: HBase Issue Type: Bug Components: regionserver, wal Affects Versions: 0.98.0 Reporter: Feng Honghua Assignee: Feng Honghua Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10329-trunk_v0.patch Last month, after I introduced multiple AsyncSyncer threads to improve throughput for lower numbers of client write threads, [~stack] encountered an NPE while testing, where a null writer occurs in AsyncSyncer when doing sync. Since we had run the test many times in a cluster to verify the throughput improvement and never encountered such an NPE, it really confused me. ([~stack] fixed this by adding 'if (writer != null)' to protect the sync operation.) These days I wondered from time to time why the writer can be null in AsyncSyncer and whether it's safe to fix it by just adding a null check before doing sync, as [~stack] did. After some digging, I found the case where AsyncSyncer can encounter a null writer; it is as below: 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with writtenTxid==100 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with writtenTxid==200 3. t3: rollWriter starts; it grabs the updateLock to prevent further client writes from entering pendingWrites, and then waits for all items (<= 200) in pendingWrites to append and finally sync to hdfs 4. t4: AsyncSyncer 2 finishes, now syncedTillHere==200 (it also helps sync <=100 as a whole) 5. t5: rollWriter now can close the writer, set writer=null... 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before rollWriter sets writer to the newly rolled Writer. We can see: 1. the null writer is possible only when there are multiple AsyncSyncer threads; that's why we never encountered it before introducing multiple AsyncSyncer threads. 2. since rollWriter can set writer=null only after all items of pendingWrites sync to hdfs, AsyncWriter is in the critical path of this task, and there is only a single AsyncWriter thread, AsyncWriter can't encounter a null writer; that's why we never encounter a null writer in AsyncWriter though it also uses the writer. This is the same reason why a null writer never occurs when there is a single AsyncSyncer thread. And we should treat the writer == null case in AsyncSyncer differently: 1. if txidToSync <= syncedTillHere, all writes this AsyncSyncer cares about have already been synced by another AsyncSyncer, and we can safely skip the sync (as [~stack] does here); 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid <= txidToSync to avoid data loss: the user gets a successful write response but can't read those writes back afterwards; from the user's perspective this is data loss (according to the above analysis such a case should not occur, but we should still add this defensive treatment to prevent data loss if it really occurs, e.g. via some bug introduced later). Also fix the bug where isSyncing needs to be reset to false when writer.sync encounters an IOException: AsyncSyncer swallows such an exception by failing all writes with txid <= txidToSync, and this AsyncSyncer thread is then ready to do later syncs, so its isSyncing needs to be reset to false in the IOException handling block; otherwise it can't be selected by AsyncWriter to do sync. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
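The writer == null decision described above — skip when another AsyncSyncer already covered our txids, fail the writes otherwise — can be modeled in a minimal sketch. The names txidToSync and syncedTillHere mirror the issue text, but the class is a simplified stand-in, not the real FSHLog internals.

```java
import java.util.concurrent.atomic.AtomicLong;

class AsyncSyncerSketch {
    volatile Object writer;                        // stand-in for the WAL writer
    final AtomicLong syncedTillHere = new AtomicLong();

    /**
     * Decide what to do when asked to sync up to txidToSync.
     * Returns "synced" (normal path), "skipped" (another AsyncSyncer
     * already covered us), or "failed" (the writes must be failed,
     * or they would be silently lost).
     */
    String handleSync(long txidToSync) {
        if (writer == null) {
            if (txidToSync <= syncedTillHere.get()) {
                return "skipped";  // case 1: already synced by another AsyncSyncer
            }
            return "failed";       // case 2: fail writes with txid <= txidToSync
        }
        // normal path: writer.sync() would run here, then advance the watermark
        syncedTillHere.accumulateAndGet(txidToSync, Math::max);
        return "synced";
    }
}
```

Replaying the t1..t6 timeline: AsyncSyncer 2 syncs up to 200, rollWriter nulls the writer, and AsyncSyncer 1's later wake-up at txid 100 is safely skipped, while a hypothetical txid 300 would have to be failed.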
[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility
[ https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869944#comment-13869944 ] Hudson commented on HBASE-10321: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #68 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/68/]) HBASE-10321 CellCodec has broken the 96 client to 98 server compatibility (anoopsamjohn: rev 1557780) * /hbase/branches/0.98/hbase-common/src/main/java/org/apache/hadoop/hbase/codec/CellCodec.java * /hbase/branches/0.98/hbase-common/src/main/java/org/apache/hadoop/hbase/codec/CellCodecV2.java * /hbase/branches/0.98/hbase-common/src/test/java/org/apache/hadoop/hbase/codec/TestCellCodec.java * /hbase/branches/0.98/hbase-common/src/test/java/org/apache/hadoop/hbase/codec/TestCellCodecV2.java CellCodec has broken the 96 client to 98 server compatibility - Key: HBASE-10321 URL: https://issues.apache.org/jira/browse/HBASE-10321 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Blocker Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch The write/read of tags added in CellCodec has broken the 96 client to 98 server compatibility (and 98 client to 96 server). When the 96 client's CellCodec writes a cell, it won't write the tags part at all. But the server expects a tags part, at least a 0 tag length. Reading this tag length will consume some bytes from the next cell! I suggest we remove the tags part from CellCodec. This codec is not used by default, and I don't think someone will change to CellCodec from the default KVCodec now. This makes tags unsupported via CellCodec. Tag support can be added to CellCodec once we have connection negotiation in place (?) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
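Why the extra tag length breaks wire compatibility can be shown with an illustrative framing sketch. This is not HBase's actual CellCodec format (real cells also carry row, family, qualifier, and timestamp); it only demonstrates that a 0.98-style reader consuming 0.96-style output misreads the next cell's length prefix as a tag length, exactly the "read of some bytes from next cell" described above.

```java
import java.nio.ByteBuffer;

class CodecCompatSketch {
    // 0.96-style framing (simplified): [value length][value]
    static byte[] encode96(byte[] value) {
        return ByteBuffer.allocate(4 + value.length)
                .putInt(value.length).put(value).array();
    }

    // 0.98-style reader (simplified): expects [value length][value][tag length]
    // per cell, so on 0.96 data the "tag length" it reads actually belongs to
    // the next cell on the wire.
    static int decode98(ByteBuffer in) {
        int valueLen = in.getInt();
        in.position(in.position() + valueLen);  // skip the value bytes
        return in.getInt();                     // supposed tag length
    }
}
```

With two 0.96-encoded cells on the wire, the 0.98-style decode of the first cell consumes the second cell's 4-byte length prefix as its "tag length", leaving the stream misaligned for every cell that follows.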
[jira] [Commented] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels
[ https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869945#comment-13869945 ] Hudson commented on HBASE-10326: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #68 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/68/]) HBASE-10326 Super user should be able scan all the cells irrespective of the visibility labels(Ram) (anoopsamjohn: rev 1557791) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityController.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabels.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithACL.java Super user should be able scan all the cells irrespective of the visibility labels -- Key: HBASE-10326 URL: https://issues.apache.org/jira/browse/HBASE-10326 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Labels: security Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10326.patch, HBASE-10326_1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer
[ https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10329: --- Status: Open (was: Patch Available) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer -- Key: HBASE-10329 URL: https://issues.apache.org/jira/browse/HBASE-10329 Project: HBase Issue Type: Bug Components: regionserver, wal Affects Versions: 0.98.0 Reporter: Feng Honghua Assignee: Feng Honghua Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10329-trunk_v0.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10331) Insure security tests use SecureTestUtil methods for grants
[ https://issues.apache.org/jira/browse/HBASE-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869975#comment-13869975 ] Hadoop QA commented on HBASE-10331: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622681/10331.patch against trunk revision . ATTACHMENT ID: 12622681 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.util.TestHBaseFsck Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8405//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8405//console This message is automatically generated. 
Insure security tests use SecureTestUtil methods for grants --- Key: HBASE-10331 URL: https://issues.apache.org/jira/browse/HBASE-10331 Project: HBase Issue Type: Improvement Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10331.patch SecureTestUtil methods for grants and revokes wait for consistent AccessController state before proceeding, eliminating a source of race conditions in security unit tests. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
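The race-elimination idea behind the SecureTestUtil helpers described above — wait for AccessController state to become consistent before proceeding — amounts to a poll-until-condition helper. The sketch below is a generic version of that pattern under that assumption; it is not SecureTestUtil's actual API.

```java
import java.util.function.BooleanSupplier;

class WaitForStateSketch {
    /**
     * Poll the condition until it holds or the timeout expires.
     * Returns true once the state is consistent, false on timeout
     * (or interruption), so the caller can fail fast instead of
     * racing ahead on stale state.
     */
    static boolean waitFor(BooleanSupplier done, long timeoutMs, long intervalMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!done.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;                        // never became consistent
            }
            try {
                Thread.sleep(intervalMs);            // back off between polls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve interrupt status
                return false;
            }
        }
        return true;
    }
}
```

A test would issue the grant, then call waitFor with a supplier that checks whether every region observes the new permission, and only then exercise the permission-dependent code path.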
[jira] [Commented] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer
[ https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869971#comment-13869971 ] Ted Yu commented on HBASE-10329: Integrated to 0.98 as well. Thanks for the patch, Honghua. Will resolve this after seeing green builds. Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer -- Key: HBASE-10329 URL: https://issues.apache.org/jira/browse/HBASE-10329 Project: HBase Issue Type: Bug Components: regionserver, wal Affects Versions: 0.98.0 Reporter: Feng Honghua Assignee: Feng Honghua Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: 10329-0.98.txt, HBASE-10329-trunk_v0.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer
[ https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10329: --- Attachment: 10329-0.98.txt Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer -- Key: HBASE-10329 URL: https://issues.apache.org/jira/browse/HBASE-10329 Project: HBase Issue Type: Bug Components: regionserver, wal Affects Versions: 0.98.0 Reporter: Feng Honghua Assignee: Feng Honghua Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: 10329-0.98.txt, HBASE-10329-trunk_v0.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)