[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869350#comment-13869350
 ] 

Andrew Purtell commented on HBASE-10321:


Another alternative is to make CellCodec incapable of handling tags but 
backwards compatible, and add *another* codec which can handle tags. Call it 
CellCodecV2 or whatever. 

 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch


 The tag write/read added in CellCodec has broken 96 client to 98 server 
 compatibility (and 98 client to 96 server).
 When a 96 client's CellCodec writes a cell, it does not write a tags part at 
 all, but the 98 server expects one, at least a 0 tag length. Reading this 
 missing tag length consumes bytes belonging to the next cell!
 I suggest we remove the tags part from CellCodec. This codec is not used by 
 default and I don't think anyone will switch to CellCodec from the default 
 KVCodec now.
 This makes tags unsupported via CellCodec. Tag support can be added to 
 CellCodec once we have Connection negotiation in place (?)
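
A minimal, self-contained illustration of the mismatch (a stand-in wire 
format, not the actual CellCodec layout):
{code}
// The "98-style" reader expects a tags-length field the "96-style" writer
// never wrote, so it consumes bytes belonging to the next cell.
import java.io.*;

public class TagLengthMismatchDemo {
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    out.writeInt(3); out.write(new byte[] {1, 2, 3}); // cell 1: value only
    out.writeInt(3); out.write(new byte[] {4, 5, 6}); // cell 2: value only

    DataInputStream in = new DataInputStream(
        new ByteArrayInputStream(buf.toByteArray()));
    in.skipBytes(in.readInt());  // consume cell 1's value
    int tagsLen = in.readInt();  // "tags length" is really cell 2's value
    System.out.println(tagsLen); // length: prints 3, and every field read
  }                              // after this point is misaligned
}
{code}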



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869349#comment-13869349
 ] 

Andrew Purtell commented on HBASE-10321:


If KVCodec is the default and does not have a backwards compatibility problem, 
then doesn't that solve the issue?

 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch


 The tag write/read added in CellCodec has broken 96 client to 98 server 
 compatibility (and 98 client to 96 server).
 When a 96 client's CellCodec writes a cell, it does not write a tags part at 
 all, but the 98 server expects one, at least a 0 tag length. Reading this 
 missing tag length consumes bytes belonging to the next cell!
 I suggest we remove the tags part from CellCodec. This codec is not used by 
 default and I don't think anyone will switch to CellCodec from the default 
 KVCodec now.
 This makes tags unsupported via CellCodec. Tag support can be added to 
 CellCodec once we have Connection negotiation in place (?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HBASE-10327) remove(K, V) of type PoolMap<K,V> has the same erasure

2014-01-13 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-10327.


Resolution: Duplicate

 remove(K, V) of type PoolMap<K,V> has the same erasure
 --

 Key: HBASE-10327
 URL: https://issues.apache.org/jira/browse/HBASE-10327
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: Eric Charles
 Attachments: HBASE-10327.patch


 I keep getting a red cross in my Eclipse, whatever the JDK (jdk6, jdk7, jdk8):
 Name clash: The method remove(K, V) of type PoolMap<K,V> has the same erasure 
 as remove(Object, Object) of type Map<K,V> but does not override it
 Maybe related to HBASE-10030.
 The solution I have is simply removing the deprecated method, and everything 
 is fine. I am not sure about backwards compatibility here.
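
For context, a minimal reproduction of the reported clash (a hypothetical 
class, not the actual PoolMap source); this intentionally fails to compile on 
JDK 8:
{code}
import java.util.HashMap;

// remove(K, V) erases to remove(Object, Object), which clashes with
// java.util.Map's remove(Object, Object) (a default method since JDK 8)
// without overriding it -- the exact error Eclipse reports.
class PoolMapSketch<K, V> extends HashMap<K, V> {
  public boolean remove(K key, V value) {
    return false; // body irrelevant; the declaration itself is the clash
  }
}
{code}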



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10322) Strip tags from KV while sending back to client on reads

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869356#comment-13869356
 ] 

Andrew Purtell commented on HBASE-10322:


+1

 Strip tags from KV while sending back to client on reads
 

 Key: HBASE-10322
 URL: https://issues.apache.org/jira/browse/HBASE-10322
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10322.patch


 Right now we have some inconsistency wrt sending back tags on read. We do 
 this in Scan when using the Java client (codec-based cell block encoding), 
 but during a Get operation, or when a pure PB-based Scan comes in, we are not 
 sending back the tags. So we have to do one of the fixes below:
 1. Send back tags in the missing cases as well. But sending back the 
 visibility expression / cell ACL is not correct.
 2. Don't send back tags in any case. This will be a problem when a tool like 
 ExportTool uses a scan to export the table data: we will miss exporting the 
 cell visibility/ACL.
 3. Send back tags based on some condition, which has to be on a per-scan 
 basis. The simplest way is to pass some kind of attribute in the Scan which 
 says whether to send back tags or not, but trusting whatever the scan 
 specifies might not be correct IMO. The alternative is to check the user who 
 is doing the scan: send back tags only when an HBase super user is scanning. 
 A case like the Export Tool's should then be executed as a super user (see 
 the sketch after this description).
 So IMO we should go with #3.
 Patch coming soon.
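
A hedged sketch of alternative #3 using stand-in types (CellStub is 
illustrative; the real change belongs in the server's cell block encoding / 
RPC response path):
{code}
import java.util.ArrayList;
import java.util.List;

class CellStub {
  final byte[] keyValueBytes; // row/cf/qualifier/ts/value portion
  final byte[] tagBytes;      // visibility / ACL tags, possibly empty

  CellStub(byte[] kv, byte[] tags) { keyValueBytes = kv; tagBytes = tags; }

  CellStub withoutTags() { return new CellStub(keyValueBytes, new byte[0]); }
}

class TagStripSketch {
  // Send tags back only for a super user, per alternative #3 above;
  // an Export Tool run would therefore execute as a super user.
  static List<CellStub> prepareResponse(List<CellStub> cells,
      boolean callerIsSuperUser) {
    if (callerIsSuperUser) return cells;
    List<CellStub> out = new ArrayList<CellStub>(cells.size());
    for (CellStub c : cells) out.add(c.withoutTags());
    return out;
  }
}
{code}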



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutting down ZooKeeperServers

2014-01-13 Thread chendihao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869360#comment-13869360
 ] 

chendihao commented on HBASE-10274:
---

Thanks [~lhofhansl] for resolving HBASE-10306, and please commit this by the way. 

 MiniZookeeperCluster should close ZKDatabase when shutting down ZooKeeperServers
 ---

 Key: HBASE-10274
 URL: https://issues.apache.org/jira/browse/HBASE-10274
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chendihao
Assignee: chendihao
Priority: Minor
 Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, 
 HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch


 HBASE-6820 points out the problem but does not fix it completely.
 killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() shut 
 down the ZooKeeperServer and need to close its ZKDatabase as well.
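
A hedged sketch of the intended fix (ZooKeeperServer#shutdown, 
ZooKeeperServer#getZKDatabase and ZKDatabase#close are ZooKeeper server APIs; 
how MiniZooKeeperCluster wires them together is assumed):
{code}
import java.io.IOException;
import org.apache.zookeeper.server.ZooKeeperServer;

class ShutdownSketch {
  static void shutdownServer(ZooKeeperServer zkServer) throws IOException {
    zkServer.shutdown();
    // HBASE-6820 added the shutdown() call; the ZKDatabase (txn log plus
    // snapshots) must be closed too, or file handles leak in the mini
    // cluster between kill/restart cycles.
    if (zkServer.getZKDatabase() != null) {
      zkServer.getZKDatabase().close();
    }
  }
}
{code}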



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10326) Super user should be able to scan all the cells irrespective of the visibility labels

2014-01-13 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-10326:
---

Status: Open  (was: Patch Available)

 Super user should be able to scan all the cells irrespective of the 
 visibility labels
 --

 Key: HBASE-10326
 URL: https://issues.apache.org/jira/browse/HBASE-10326
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
  Labels: security
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10326.patch, HBASE-10326_1.patch


 This issue goes along with HBASE-10322.  In the case of the export tool, when 
 cells with visibility labels are exported using a super user, we should be 
 able to export the data.  But with the current implementation, the super user 
 would only be able to view cells whose visibility labels are associated with 
 the superuser.  The idea of HBASE-10322 is to strip out tags based on the 
 user, and if so this change is necessary for the export tool to work with 
 Visibility.  ACL already has a concept of global admins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10326) Super user should be able to scan all the cells irrespective of the visibility labels

2014-01-13 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-10326:
---

Status: Patch Available  (was: Open)

 Super user should be able to scan all the cells irrespective of the 
 visibility labels
 --

 Key: HBASE-10326
 URL: https://issues.apache.org/jira/browse/HBASE-10326
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
  Labels: security
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10326.patch, HBASE-10326_1.patch


 This issue goes along with HBASE-10322.  In the case of the export tool, when 
 cells with visibility labels are exported using a super user, we should be 
 able to export the data.  But with the current implementation, the super user 
 would only be able to view cells whose visibility labels are associated with 
 the superuser.  The idea of HBASE-10322 is to strip out tags based on the 
 user, and if so this change is necessary for the export tool to work with 
 Visibility.  ACL already has a concept of global admins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10326) Super user should be able to scan all the cells irrespective of the visibility labels

2014-01-13 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-10326:
---

Attachment: HBASE-10326_1.patch

 Super user should be able to scan all the cells irrespective of the 
 visibility labels
 --

 Key: HBASE-10326
 URL: https://issues.apache.org/jira/browse/HBASE-10326
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
  Labels: security
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10326.patch, HBASE-10326_1.patch


 This issue goes along with HBASE-10322.  In the case of the export tool, when 
 cells with visibility labels are exported using a super user, we should be 
 able to export the data.  But with the current implementation, the super user 
 would only be able to view cells whose visibility labels are associated with 
 the superuser.  The idea of HBASE-10322 is to strip out tags based on the 
 user, and if so this change is necessary for the export tool to work with 
 Visibility.  ACL already has a concept of global admins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-13 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869368#comment-13869368
 ] 

Anoop Sam John commented on HBASE-10321:


bq. and add another codec which can handle tags
Looks good to me. Will make a patch which includes CellCodecV2.
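
A hedged sketch of what a tags-aware V2 wire format could add (stand-in 
stream handling, not the actual patch):
{code}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class CellCodecV2Sketch {
  // V1 writes row, family, qualifier, timestamp, type and value per cell.
  // A V2 appends one more length-prefixed component so tags survive the
  // trip, while V1 stays byte-compatible with 96 peers.
  static void writeTags(DataOutputStream out, byte[] tags) throws IOException {
    out.writeInt(tags.length); // 0 when the cell carries no tags
    out.write(tags);
  }

  static byte[] readTags(DataInputStream in) throws IOException {
    int len = in.readInt();      // a V1 peer never writes this field,
    byte[] tags = new byte[len]; // which is exactly the 96/98 mismatch
    in.readFully(tags);          // this issue describes
    return tags;
  }
}
{code}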

 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch


 The tag write/read added in CellCodec has broken 96 client to 98 server 
 compatibility (and 98 client to 96 server).
 When a 96 client's CellCodec writes a cell, it does not write a tags part at 
 all, but the 98 server expects one, at least a 0 tag length. Reading this 
 missing tag length consumes bytes belonging to the next cell!
 I suggest we remove the tags part from CellCodec. This codec is not used by 
 default and I don't think anyone will switch to CellCodec from the default 
 KVCodec now.
 This makes tags unsupported via CellCodec. Tag support can be added to 
 CellCodec once we have Connection negotiation in place (?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10282) We can't assure that the first ZK server is the active server in MiniZooKeeperCluster

2014-01-13 Thread chendihao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869370#comment-13869370
 ] 

chendihao commented on HBASE-10282:
---

The functions {{killCurrentActiveZooKeeperServer()}} and 
{{killOneBackupZooKeeperServer()}} are misleading because of this. I think 
[~liyin] treated the first zk server as the leader, but we can't be sure of 
that. So should we rename {{activeZKServerIndex}} to {{firstZKServerIndex}} 
and combine these two functions into {{killFirstZooKeeperServer()}} (it's 
hard to know which one is the actual leader)?

Need more people to discuss it. [~enis] [~stack]

 We can't assure that the first ZK server is the active server in 
 MiniZooKeeperCluster
 -

 Key: HBASE-10282
 URL: https://issues.apache.org/jira/browse/HBASE-10282
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chendihao
Priority: Minor

 Thanks to HBASE-3052, we're able to run multiple zk servers in the 
 minicluster. However, it's confusing to keep the variable activeZKServerIndex 
 at zero and assume the first zk server is always the active one. I think 
 returning the first server's client port is for testing, and it seems we can 
 directly return the first item of the list. In any case, the concept of 
 "active" here is not the same as ZooKeeper's.
 It was confusing when I read the code, so I think we should fix it.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-13 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869378#comment-13869378
 ] 

ramkrishna.s.vasudevan commented on HBASE-10321:


+1 on CellCodecV2.

 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch


 The tag write/read added in CellCodec has broken 96 client to 98 server 
 compatibility (and 98 client to 96 server).
 When a 96 client's CellCodec writes a cell, it does not write a tags part at 
 all, but the 98 server expects one, at least a 0 tag length. Reading this 
 missing tag length consumes bytes belonging to the next cell!
 I suggest we remove the tags part from CellCodec. This codec is not used by 
 default and I don't think anyone will switch to CellCodec from the default 
 KVCodec now.
 This makes tags unsupported via CellCodec. Tag support can be added to 
 CellCodec once we have Connection negotiation in place (?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-13 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-10321:
---

Attachment: HBASE-10321_V2.patch

 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch


 The tag write/read added in CellCodec has broken 96 client to 98 server 
 compatibility (and 98 client to 96 server).
 When a 96 client's CellCodec writes a cell, it does not write a tags part at 
 all, but the 98 server expects one, at least a 0 tag length. Reading this 
 missing tag length consumes bytes belonging to the next cell!
 I suggest we remove the tags part from CellCodec. This codec is not used by 
 default and I don't think anyone will switch to CellCodec from the default 
 KVCodec now.
 This makes tags unsupported via CellCodec. Tag support can be added to 
 CellCodec once we have Connection negotiation in place (?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-13 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869386#comment-13869386
 ] 

ramkrishna.s.vasudevan commented on HBASE-10321:


+1 on patch. LGTM.

 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch


 The tag write/read added in CellCodec has broken 96 client to 98 server 
 compatibility (and 98 client to 96 server).
 When a 96 client's CellCodec writes a cell, it does not write a tags part at 
 all, but the 98 server expects one, at least a 0 tag length. Reading this 
 missing tag length consumes bytes belonging to the next cell!
 I suggest we remove the tags part from CellCodec. This codec is not used by 
 default and I don't think anyone will switch to CellCodec from the default 
 KVCodec now.
 This makes tags unsupported via CellCodec. Tag support can be added to 
 CellCodec once we have Connection negotiation in place (?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10283) Client can't connect with all the running zk servers in MiniZooKeeperCluster

2014-01-13 Thread chendihao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869396#comment-13869396
 ] 

chendihao commented on HBASE-10283:
---

There are two solutions for this. The first is to allow setting different zk 
ports for HBase (generic, but contrary to the original design). The second is 
to add extra code in ZKConfig to support multiple ports for 
MiniZooKeeperCluster. I prefer the latter, to keep the code change small. 

MiniZooKeeperCluster can't be used for zk failover tests until this is fixed. 
Can [~enis] help to review this?

 Client can't connect with all the running zk servers in MiniZooKeeperCluster
 

 Key: HBASE-10283
 URL: https://issues.apache.org/jira/browse/HBASE-10283
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chendihao

 Per HBASE-3052, multiple zk servers can run together in the minicluster. The 
 problem is that the client can only connect to the first zk server, and if 
 you kill the first one, it fails to access the cluster even though the other 
 zk servers are serving.
 It's easy to repro.  First, `TEST_UTIL.startMiniZKCluster(3)`. Second, call 
 `killCurrentActiveZooKeeperServer` in MiniZooKeeperCluster. Then, when you 
 construct the zk client, it can't connect to the zk cluster in any way. 
 Here is a simple log you can refer to.
 {noformat}
 2014-01-03 12:06:58,625 INFO  [main] zookeeper.MiniZooKeeperCluster(194): 
 Started MiniZK Cluster and connect 1 ZK server on client port: 55227
 ..
 2014-01-03 12:06:59,134 INFO  [main] zookeeper.MiniZooKeeperCluster(264): 
 Kill the current active ZK servers in the cluster on client port: 55227
 2014-01-03 12:06:59,134 INFO  [main] zookeeper.MiniZooKeeperCluster(272): 
 Activate a backup zk server in the cluster on client port: 55228
 2014-01-03 12:06:59,366 INFO  [main-EventThread] zookeeper.ZooKeeper(434): 
 Initiating client connection, connectString=localhost:55227 
 sessionTimeout=3000 
 watcher=com.xiaomi.infra.timestamp.TimestampWatcher@a383118
 (then it throws exceptions..)
 {noformat}
 The log is kind of misleading because it always shows "Started MiniZK Cluster 
 and connect 1 ZK server" even though there are actually three zk servers.
 Looking deeper, we find that the client is still trying to connect to the 
 dead zk server's port. When I print out the zkQuorum it used, only the first 
 zk server's host:port is there, and it does not change whether you kill the 
 server or not. The reason is in ZKConfig, which converts HBase settings into 
 ZooKeeper's. MiniZooKeeperCluster creates three servers with the same host 
 name, localhost, and different ports. But HBase itself forces the same 
 client port for every zk server, so ZKConfig ignores the other two servers 
 which have the same host name.
 MiniZooKeeperCluster works improperly until we fix this. The bug was not 
 found because we never test whether HBase still works if we kill the active 
 or backup zk servers in the unit tests. 
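
A hedged sketch of keeping a distinct port per mini-cluster server in the 
quorum string (illustrative names; the real change would live in ZKConfig):
{code}
import java.util.List;

class ZkQuorumSketch {
  // e.g. hosts = [localhost, localhost, localhost],
  //      ports = [55227, 55228, 55229]
  static String connectString(List<String> hosts, List<Integer> ports) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < hosts.size(); i++) {
      if (sb.length() > 0) sb.append(',');
      // Keeping the port with each server means killing localhost:55227
      // still leaves localhost:55228 and localhost:55229 reachable.
      sb.append(hosts.get(i)).append(':').append(ports.get(i));
    }
    return sb.toString();
  }
}
{code}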



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10030) [JDK8] Erasure of PoolMap#remove(K,V) conflicts with superclass method

2014-01-13 Thread Eric Charles (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869394#comment-13869394
 ] 

Eric Charles commented on HBASE-10030:
--

It works fine with jdk8 via the mvn CLI, but gives a compilation issue in 
Eclipse (not the first time Eclipse disagrees with the mvn CLI).
Only me?

 [JDK8] Erasure of PoolMap#remove(K,V) conflicts with superclass method
 --

 Key: HBASE-10030
 URL: https://issues.apache.org/jira/browse/HBASE-10030
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Trivial
 Fix For: 0.98.0

 Attachments: 10030.patch


 On JDK 8, the erasure of PoolMap#remove(K,V) conflicts with superclass method 
 remove(Object,Object).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-13 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869398#comment-13869398
 ] 

Anoop Sam John commented on HBASE-10321:


Thanks all for reviews. Will commit tonight IST unless objections.

 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch


 The tag write/read added in CellCodec has broken 96 client to 98 server 
 compatibility (and 98 client to 96 server).
 When a 96 client's CellCodec writes a cell, it does not write a tags part at 
 all, but the 98 server expects one, at least a 0 tag length. Reading this 
 missing tag length consumes bytes belonging to the next cell!
 I suggest we remove the tags part from CellCodec. This codec is not used by 
 default and I don't think anyone will switch to CellCodec from the default 
 KVCodec now.
 This makes tags unsupported via CellCodec. Tag support can be added to 
 CellCodec once we have Connection negotiation in place (?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10326) Super user should be able to scan all the cells irrespective of the visibility labels

2014-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869404#comment-13869404
 ] 

Hadoop QA commented on HBASE-10326:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12622599/HBASE-10326_1.patch
  against trunk revision .
  ATTACHMENT ID: 12622599

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8401//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8401//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8401//console

This message is automatically generated.

 Super user should be able to scan all the cells irrespective of the 
 visibility labels
 --

 Key: HBASE-10326
 URL: https://issues.apache.org/jira/browse/HBASE-10326
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
  Labels: security
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10326.patch, HBASE-10326_1.patch


 This issue goes along with HBASE-10322.  In the case of the export tool, when 
 cells with visibility labels are exported using a super user, we should be 
 able to export the data.  But with the current implementation, the super user 
 would only be able to view cells whose visibility labels are associated with 
 the superuser.  The idea of HBASE-10322 is to strip out tags based on the 
 user, and if so this change is necessary for the export tool to work with 
 Visibility.  ACL already has a concept of global admins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay

2014-01-13 Thread Feng Honghua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Honghua updated HBASE-10227:
-

Attachment: HBASE-10227-trunk_v0.patch

The fix is as below:
1. persist mvcc in the HLog (in WALEdit)
2. never set a KeyValue's mvcc to 0
3. always (not conditionally) include mvcc in the HFile
4. reinitialize the region's mvcc after replaying split HLog files, to 
include the greater values in the new stores resulting from 
replaying/flushing the split HLog files -- this correctly recovers the 
region's mvcc (see the sketch after this note)

Note on step 4: since replaying split HLog files needs to access mvcc, we 
can't initialize mvcc only after the replay; reinitializing it to the final 
correct value once the replay is done is fine. An alternative fix is to add 
and use a new internalFlushcache method for replaying split HLog files which 
doesn't access mvcc (this is safe because while split HLog files are being 
replayed, there can be no in-progress transaction/write not yet committed to 
the HLog -- nothing writes to the HLog during the replay)
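
A hedged sketch of step 4 (field names are illustrative assumptions, not the 
patch's identifiers):
{code}
class MvccRecoverySketch {
  long maxMemstoreTsInStores; // max MemstoreTS across existing store files
  long maxSeqIdFromReplay;    // max mvcc seen while replaying split HLogs

  /** Recompute the region's read point after the split-HLog replay. */
  long recoveredMvcc() {
    // Before the fix only the store files were consulted, so edits
    // recovered from split HLogs could carry mvcc values above the
    // region's initialized read point.
    return Math.max(maxMemstoreTsInStores, maxSeqIdFromReplay);
  }
}
{code}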

 When a region is opened, its mvcc isn't correctly recovered when there are 
 split hlogs to replay
 

 Key: HBASE-10227
 URL: https://issues.apache.org/jira/browse/HBASE-10227
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10227-trunk_v0.patch


 When opening a region, all stores are examined to get the max MemstoreTS, 
 which is used as the region's initial mvcc, and then the split hlogs are 
 replayed. In fact the edits in the split hlogs contain kvs with greater mvcc 
 than any MemstoreTS in the store files, but replaying them doesn't advance 
 the mvcc accordingly at all. From an overall perspective this mvcc recovery 
 is 'logically' incorrect/incomplete.
 The reason this currently causes no problem is that no active scanners exist 
 and no new scanners can be created before the region opening completes, so 
 the mvcc of all kvs in the hfiles resulting from hlog replay can safely be 
 set to zero. They are simply treated as kvs put 'earlier' than the ones in 
 HFiles with mvcc greater than zero ('earlier' in that they have smaller mvcc 
 than the ones with non-zero mvcc, even though they were in fact put 'later'), 
 without any incorrect impact, precisely because during region opening there 
 are no active scanners existing or being created.
 For now this bug exists only in a 'logical' sense, but if later on we need 
 mvcc to survive the region's whole logical lifecycle (across regionservers) 
 and never be set to zero, this bug needs to be fixed first.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10326) Super user should be able to scan all the cells irrespective of the visibility labels

2014-01-13 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869408#comment-13869408
 ] 

Anoop Sam John commented on HBASE-10326:


Patch looks good, Ram.
Pls correct the whitespace introduced after the checkIfScanOrGetFromSuperUser 
private method.
{code}
+HTable acl = new HTable(conf, AccessControlLists.ACL_TABLE_NAME);
+try {
+  BlockingRpcChannel service = acl.coprocessorService(tableName.getName());
+  AccessControlService.BlockingInterface protocol = AccessControlService
+  .newBlockingStub(service);
+  ProtobufUtil.grant(protocol, NORMAL_USER2.getShortName(), tableName, 
null, null,
+  Permission.Action.READ);
+} finally {
+  acl.close();
+}
{code}
Instead, can we use AccessControlClient#grant?   This code is repeated in the tests..

Thanks for the patch.
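
A hedged sketch of the suggested simplification (the exact 
AccessControlClient#grant signature used here is an assumption, not checked 
against the 0.98 sources):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.security.access.AccessControlClient;
import org.apache.hadoop.hbase.security.access.Permission;

class GrantHelperSketch {
  static void grantRead(Configuration conf, TableName tableName, String user)
      throws Throwable {
    // Replaces the repeated HTable + coprocessorService + ProtobufUtil.grant
    // boilerplate from the test with a single client call.
    AccessControlClient.grant(conf, tableName, user, null, null,
        Permission.Action.READ);
  }
}
{code}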


 Super user should be able to scan all the cells irrespective of the 
 visibility labels
 --

 Key: HBASE-10326
 URL: https://issues.apache.org/jira/browse/HBASE-10326
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
  Labels: security
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10326.patch, HBASE-10326_1.patch


 This issue goes along with HBASE-10322.  In the case of the export tool, when 
 cells with visibility labels are exported using a super user, we should be 
 able to export the data.  But with the current implementation, the super user 
 would only be able to view cells whose visibility labels are associated with 
 the superuser.  The idea of HBASE-10322 is to strip out tags based on the 
 user, and if so this change is necessary for the export tool to work with 
 Visibility.  ACL already has a concept of global admins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869426#comment-13869426
 ] 

Hadoop QA commented on HBASE-10321:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12622602/HBASE-10321_V2.patch
  against trunk revision .
  ATTACHMENT ID: 12622602

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8402//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8402//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8402//console

This message is automatically generated.

 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch


 The tag write/read added in CellCodec has broken 96 client to 98 server 
 compatibility (and 98 client to 96 server).
 When a 96 client's CellCodec writes a cell, it does not write a tags part at 
 all, but the 98 server expects one, at least a 0 tag length. Reading this 
 missing tag length consumes bytes belonging to the next cell!
 I suggest we remove the tags part from CellCodec. This codec is not used by 
 default and I don't think anyone will switch to CellCodec from the default 
 KVCodec now.
 This makes tags unsupported via CellCodec. Tag support can be added to 
 CellCodec once we have Connection negotiation in place (?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10325) Unknown option or illegal argument:-XX:OnOutOfMemoryError=kill -9 %p

2014-01-13 Thread chillon_m (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869445#comment-13869445
 ] 

chillon_m commented on HBASE-10325:
---

JRockit

 Unknown option or illegal argument:-XX:OnOutOfMemoryError=kill -9 %p
 

 Key: HBASE-10325
 URL: https://issues.apache.org/jira/browse/HBASE-10325
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.1.1
Reporter: chillon_m

 Unknown option or illegal argument: -XX:OnOutOfMemoryError=kill -9 %p. 
 Please check for incorrect spelling or review documentation of startup 
 options.
 Could not create the Java virtual machine.
 starting master, logging to 
 /home/hadoop/hbase-0.96.1.1-hadoop2/logs/hbase-hadoop-master-namenode0.hadoop.out
 Unknown option or illegal argument: -XX:OnOutOfMemoryError=kill -9 %p. 
 Please check for incorrect spelling or review documentation of startup options



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay

2014-01-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10227:
---

Status: Patch Available  (was: Open)

 When a region is opened, its mvcc isn't correctly recovered when there are 
 split hlogs to replay
 

 Key: HBASE-10227
 URL: https://issues.apache.org/jira/browse/HBASE-10227
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10227-trunk_v0.patch


 When opening a region, all stores are examined to get the max MemstoreTS, 
 which is used as the region's initial mvcc, and then the split hlogs are 
 replayed. In fact the edits in the split hlogs contain kvs with greater mvcc 
 than any MemstoreTS in the store files, but replaying them doesn't advance 
 the mvcc accordingly at all. From an overall perspective this mvcc recovery 
 is 'logically' incorrect/incomplete.
 The reason this currently causes no problem is that no active scanners exist 
 and no new scanners can be created before the region opening completes, so 
 the mvcc of all kvs in the hfiles resulting from hlog replay can safely be 
 set to zero. They are simply treated as kvs put 'earlier' than the ones in 
 HFiles with mvcc greater than zero ('earlier' in that they have smaller mvcc 
 than the ones with non-zero mvcc, even though they were in fact put 'later'), 
 without any incorrect impact, precisely because during region opening there 
 are no active scanners existing or being created.
 For now this bug exists only in a 'logical' sense, but if later on we need 
 mvcc to survive the region's whole logical lifecycle (across regionservers) 
 and never be set to zero, this bug needs to be fixed first.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer

2014-01-13 Thread Feng Honghua (JIRA)
Feng Honghua created HBASE-10329:


 Summary: Fail the writes rather than proceeding silently to 
prevent data loss when AsyncSyncer encounters null writer and its writes aren't 
synced by other Asyncer
 Key: HBASE-10329
 URL: https://issues.apache.org/jira/browse/HBASE-10329
 Project: HBase
  Issue Type: Bug
  Components: regionserver, wal
Reporter: Feng Honghua
Assignee: Feng Honghua


Last month, after I introduced multiple AsyncSyncer threads to improve the 
throughput for lower numbers of clients, [~stack] encountered an NPE while 
doing the test, where a null writer occurs in AsyncSyncer when doing sync. 
Since we have run the test many times in a cluster to verify the throughput 
improvement and never encountered such an NPE, it really confused me. (and 
[~stack] fixed this by adding an 'if (writer != null)' check to protect the 
sync operation)

I kept wondering why the writer can be null in AsyncSyncer and whether it's 
safe to fix by just adding a null check before doing sync, as [~stack] did. 
After some digging and analysis, I found the case where AsyncSyncer can 
encounter a null writer. It is as below:
1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with 
writtenTxid==100
2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with 
writtenTxid==200
3. t3: rollWriter starts; it grabs the updateLock, which prevents further 
writes from entering pendingWrites, and then waits for all items (up to 200) 
in pendingWrites to be appended and finally synced to hdfs
4. t4: AsyncSyncer 2 finishes, now syncedTillHere==200; it also covers the 
sync of txids <= 100 as a whole
5. t5: rollWriter closes the writer, sets writer=null...
6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before 
rollWriter sets writer to the newly created Writer

We can see:
1. the null writer is possible only once there are multiple AsyncSyncer 
threads; that's why we never encountered it before introducing multiple 
AsyncSyncer threads.
2. since rollWriter can set writer=null only after all items of pendingWrites 
have synced to hdfs, and AsyncWriter is in the critical path of this task and 
there is only one AsyncWriter thread, AsyncWriter can't encounter a null 
writer. That's why we never see a null writer in AsyncWriter even though it 
uses the writer as well, and it is the same reason a null writer never occurs 
when there is a single AsyncSyncer thread.

And we should treat the writer == null cases in AsyncSyncer differently (a 
sketch follows this description):
1. if txidToSync <= syncedTillHere, all the writes this AsyncSyncer cares 
about have already been synced by another AsyncSyncer; we can safely skip the 
sync (as [~stack] does here);
2. if txidToSync > syncedTillHere, we need to fail all the writes with txid 
<= txidToSync to avoid data loss: the user got a successful write response 
but can't read the writes back, which from the user's perspective is data 
loss (according to the above analysis such a case should not occur, but we 
should still add this defensive treatment to prevent data loss if it ever 
does occur, e.g. via some bug introduced later)

Also fix the bug where isSyncing needs to be reset to false when writer.sync 
encounters an IOException: AsyncSyncer swallows such an exception by failing 
all writes with txid <= txidToSync, and since this AsyncSyncer thread is then 
ready to do later syncs, its isSyncing needs to be reset to false in the 
IOException handling block
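
A hedged sketch of the proposed null-writer handling in AsyncSyncer (field 
names follow the description; the surrounding FSHLog details are elided):
{code}
class AsyncSyncerSketch {
  volatile Object writer;       // stand-in for the WAL writer
  volatile long syncedTillHere; // highest txid known to be synced
  long txidToSync;              // txid this syncer was asked to sync

  void syncOnce() throws java.io.IOException {
    if (writer == null) {
      if (txidToSync <= syncedTillHere) {
        // Case 1: another AsyncSyncer already covered our txids while
        // rollWriter held the writer; safe to skip (the existing null
        // check behaves like this).
        return;
      }
      // Case 2: should be impossible per the analysis above, but failing
      // the writes loudly beats acking data that was never synced.
      throw new java.io.IOException(
          "writer is null but txid " + txidToSync + " is not synced yet");
    }
    // ... writer.sync(); on IOException, fail all writes with
    // txid <= txidToSync and reset isSyncing to false so AsyncWriter
    // can select this thread for later syncs.
  }
}
{code}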



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HBASE-10283) Client can't connect with all the running zk servers in MiniZooKeeperCluster

2014-01-13 Thread chendihao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chendihao reassigned HBASE-10283:
-

Assignee: chendihao

 Client can't connect with all the running zk servers in MiniZooKeeperCluster
 

 Key: HBASE-10283
 URL: https://issues.apache.org/jira/browse/HBASE-10283
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chendihao
Assignee: chendihao

 Per HBASE-3052, multiple zk servers can run together in the minicluster. The 
 problem is that the client can only connect to the first zk server, and if 
 you kill the first one, it fails to access the cluster even though the other 
 zk servers are serving.
 It's easy to repro.  First, `TEST_UTIL.startMiniZKCluster(3)`. Second, call 
 `killCurrentActiveZooKeeperServer` in MiniZooKeeperCluster. Then, when you 
 construct the zk client, it can't connect to the zk cluster in any way. 
 Here is a simple log you can refer to.
 {noformat}
 2014-01-03 12:06:58,625 INFO  [main] zookeeper.MiniZooKeeperCluster(194): 
 Started MiniZK Cluster and connect 1 ZK server on client port: 55227
 ..
 2014-01-03 12:06:59,134 INFO  [main] zookeeper.MiniZooKeeperCluster(264): 
 Kill the current active ZK servers in the cluster on client port: 55227
 2014-01-03 12:06:59,134 INFO  [main] zookeeper.MiniZooKeeperCluster(272): 
 Activate a backup zk server in the cluster on client port: 55228
 2014-01-03 12:06:59,366 INFO  [main-EventThread] zookeeper.ZooKeeper(434): 
 Initiating client connection, connectString=localhost:55227 
 sessionTimeout=3000 
 watcher=com.xiaomi.infra.timestamp.TimestampWatcher@a383118
 (then it throws exceptions..)
 {noformat}
 The log is kind of misleading because it always shows "Started MiniZK Cluster 
 and connect 1 ZK server" even though there are actually three zk servers.
 Looking deeper, we find that the client is still trying to connect to the 
 dead zk server's port. When I print out the zkQuorum it used, only the first 
 zk server's host:port is there, and it does not change whether you kill the 
 server or not. The reason is in ZKConfig, which converts HBase settings into 
 ZooKeeper's. MiniZooKeeperCluster creates three servers with the same host 
 name, localhost, and different ports. But HBase itself forces the same 
 client port for every zk server, so ZKConfig ignores the other two servers 
 which have the same host name.
 MiniZooKeeperCluster works improperly until we fix this. The bug was not 
 found because we never test whether HBase still works if we kill the active 
 or backup zk servers in the unit tests. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer

2014-01-13 Thread Feng Honghua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Honghua updated HBASE-10329:
-

Description: 
Last month, after I introduced multiple AsyncSyncer threads to improve the 
throughput for lower numbers of client write threads, [~stack] encountered an 
NPE while doing the test, where a null writer occurs in AsyncSyncer when 
doing sync. Since we have run the test many times in a cluster to verify the 
throughput improvement and never encountered such an NPE, it really confused 
me. (and [~stack] fixed this by adding an 'if (writer != null)' check to 
protect the sync operation)

I kept wondering why the writer can be null in AsyncSyncer and whether it's 
safe to fix by just adding a null check before doing sync, as [~stack] did. 
After some digging and analysis, I found the case where AsyncSyncer can 
encounter a null writer. It is as below:
1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with 
writtenTxid==100
2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with 
writtenTxid==200
3. t3: rollWriter starts; it grabs the updateLock, which prevents further 
writes from entering pendingWrites, and then waits for all items (up to 200) 
in pendingWrites to be appended and finally synced to hdfs
4. t4: AsyncSyncer 2 finishes, now syncedTillHere==200; it also covers the 
sync of txids <= 100 as a whole
5. t5: rollWriter closes the writer, sets writer=null...
6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before 
rollWriter sets writer to the newly created Writer

We can see:
1. the null writer is possible only once there are multiple AsyncSyncer 
threads; that's why we never encountered it before introducing multiple 
AsyncSyncer threads.
2. since rollWriter can set writer=null only after all items of pendingWrites 
have synced to hdfs, and AsyncWriter is in the critical path of this task and 
there is only one AsyncWriter thread, AsyncWriter can't encounter a null 
writer. That's why we never see a null writer in AsyncWriter even though it 
uses the writer as well, and it is the same reason a null writer never occurs 
when there is a single AsyncSyncer thread.

And we should treat the writer == null cases in AsyncSyncer differently:
1. if txidToSync <= syncedTillHere, all the writes this AsyncSyncer cares 
about have already been synced by another AsyncSyncer; we can safely skip the 
sync (as [~stack] does here);
2. if txidToSync > syncedTillHere, we need to fail all the writes with txid 
<= txidToSync to avoid data loss: the user got a successful write response 
but can't read the writes back, which from the user's perspective is data 
loss (according to the above analysis such a case should not occur, but we 
should still add this defensive treatment to prevent data loss if it ever 
does occur, e.g. via some bug introduced later)

Also fix the bug where isSyncing needs to be reset to false when writer.sync 
encounters an IOException: AsyncSyncer swallows such an exception by failing 
all writes with txid <= txidToSync, and since this AsyncSyncer thread is then 
ready to do later syncs, its isSyncing needs to be reset to false in the 
IOException handling block

  was:
Last month, after I introduced multiple AsyncSyncer threads to improve the 
throughput for lower numbers of clients, [~stack] encountered an NPE while 
doing the test, where a null writer occurs in AsyncSyncer when doing sync. 
Since we have run the test many times in a cluster to verify the throughput 
improvement and never encountered such an NPE, it really confused me. (and 
[~stack] fixed this by adding an 'if (writer != null)' check to protect the 
sync operation)

I kept wondering why the writer can be null in AsyncSyncer and whether it's 
safe to fix by just adding a null check before doing sync, as [~stack] did. 
After some digging and analysis, I found the case where AsyncSyncer can 
encounter a null writer. It is as below:
1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with 
writtenTxid==100
2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with 
writtenTxid==200
3. t3: rollWriter starts; it grabs the updateLock, which prevents further 
writes from entering pendingWrites, and then waits for all items (up to 200) 
in pendingWrites to be appended and finally synced to hdfs
4. t4: AsyncSyncer 2 finishes, now syncedTillHere==200; it also covers the 
sync of txids <= 100 as a whole
5. t5: rollWriter closes the writer, sets writer=null...
6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before 
rollWriter sets writer to the newly created Writer

We can see:
1. the null writer is possible only once there are multiple AsyncSyncer 
threads; that's why we never encountered it before introducing multiple 
AsyncSyncer threads.
2. since rollWriter can set writer=null only after all items of pendingWrites 
have synced to hdfs, and AsyncWriter is in the critical path of this task and 
there is only one AsyncWriter thread, AsyncWriter can't encounter a null 
writer, that's why we never see a null writer in AsyncWriter even though it 

[jira] [Updated] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer

2014-01-13 Thread Feng Honghua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Honghua updated HBASE-10329:
-

Attachment: HBASE-10329-trunk_v0.patch

Patch is attached; ping [~stack] :-)

 Fail the writes rather than proceeding silently to prevent data loss when 
 AsyncSyncer encounters null writer and its writes aren't synced by other 
 Asyncer
 --

 Key: HBASE-10329
 URL: https://issues.apache.org/jira/browse/HBASE-10329
 Project: HBase
  Issue Type: Bug
  Components: regionserver, wal
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10329-trunk_v0.patch


 Last month, after I introduced multiple AsyncSyncer threads to improve the 
 throughput for lower numbers of clients, [~stack] encountered an NPE while 
 doing the test, where a null writer occurs in AsyncSyncer when doing sync. 
 Since we have run the test many times in a cluster to verify the throughput 
 improvement and never encountered such an NPE, it really confused me. (and 
 [~stack] fixed this by adding an 'if (writer != null)' check to protect the 
 sync operation)
 I kept wondering why the writer can be null in AsyncSyncer and whether it's 
 safe to fix by just adding a null check before doing sync, as [~stack] did. 
 After some digging and analysis, I found the case where AsyncSyncer can 
 encounter a null writer. It is as below:
 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with 
 writtenTxid==100
 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with 
 writtenTxid==200
 3. t3: rollWriter starts; it grabs the updateLock, which prevents further 
 writes from entering pendingWrites, and then waits for all items (up to 200) 
 in pendingWrites to be appended and finally synced to hdfs
 4. t4: AsyncSyncer 2 finishes, now syncedTillHere==200; it also covers the 
 sync of txids <= 100 as a whole
 5. t5: rollWriter closes the writer, sets writer=null...
 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before 
 rollWriter sets writer to the newly created Writer
 We can see:
 1. the null writer is possible only once there are multiple AsyncSyncer 
 threads; that's why we never encountered it before introducing multiple 
 AsyncSyncer threads.
 2. since rollWriter can set writer=null only after all items of pendingWrites 
 have synced to hdfs, and AsyncWriter is in the critical path of this task and 
 there is only one AsyncWriter thread, AsyncWriter can't encounter a null 
 writer. That's why we never see a null writer in AsyncWriter even though it 
 uses the writer as well, and it is the same reason a null writer never occurs 
 when there is a single AsyncSyncer thread.
 And we should treat the writer == null cases in AsyncSyncer differently:
 1. if txidToSync <= syncedTillHere, all the writes this AsyncSyncer cares 
 about have already been synced by another AsyncSyncer; we can safely skip 
 the sync (as [~stack] does here);
 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid 
 <= txidToSync to avoid data loss: the user got a successful write response 
 but can't read the writes back, which from the user's perspective is data 
 loss (according to the above analysis such a case should not occur, but we 
 should still add this defensive treatment to prevent data loss if it ever 
 does occur, e.g. via some bug introduced later)
 Also fix the bug where isSyncing needs to be reset to false when writer.sync 
 encounters an IOException: AsyncSyncer swallows such an exception by failing 
 all writes with txid <= txidToSync, and since this AsyncSyncer thread is 
 then ready to do later syncs, its isSyncing needs to be reset to false in 
 the IOException handling block



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer

2014-01-13 Thread Feng Honghua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Honghua updated HBASE-10329:
-

Description: 
Last month, after I introduced multiple AsyncSyncer threads to improve the 
throughput for lower numbers of client write threads, [~stack] encountered an 
NPE while doing the test, where a null writer occurs in AsyncSyncer when 
doing sync. Since we have run the test many times in a cluster to verify the 
throughput improvement and never encountered such an NPE, it really confused 
me. (and [~stack] fixed this by adding 'if (writer != null)' to protect the 
sync operation)

These days, from time to time, I wondered why the writer can be null in 
AsyncSyncer and whether it's safe to fix it by just adding a null check 
before doing sync, as [~stack] did. After some digging, I found the case 
where AsyncSyncer can encounter a null writer. It is as below:
1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with 
writtenTxid==100
2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with 
writtenTxid==200
3. t3: rollWriter starts; it grabs the updateLock to prevent further client 
writes from entering pendingWrites, and then waits for all items (<= 200) in 
pendingWrites to be appended and finally synced to hdfs
4. t4: AsyncSyncer 2 finishes, now syncedTillHere==200 (it also covers the 
sync of txids <= 100 as a whole)
5. t5: rollWriter can now close the writer, set writer=null...
6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before 
rollWriter sets writer to the newly rolled Writer

We can see:
1. the null writer is possible only once there are multiple AsyncSyncer 
threads; that's why we never encountered it before introducing multiple 
AsyncSyncer threads.
2. since rollWriter can set writer=null only after all items of pendingWrites 
have synced to hdfs, and AsyncWriter is in the critical path of this task and 
there is only one single AsyncWriter thread, AsyncWriter can't encounter a 
null writer. That's why we never see a null writer in AsyncWriter even though 
it also uses the writer, and it is the same reason a null writer never occurs 
when there is a single AsyncSyncer thread.

And we should treat the writer == null cases in AsyncSyncer differently:
1. if txidToSync <= syncedTillHere, this means all the writes this 
AsyncSyncer cares about have already been synced by another AsyncSyncer; we 
can safely skip the sync (as [~stack] does here);
2. if txidToSync > syncedTillHere, we need to fail all the writes with txid 
<= txidToSync to avoid data loss: the user gets a successful write response 
but can't read the writes back afterwards, which from the user's perspective 
is data loss (according to the above analysis such a case should not occur, 
but we should still add this defensive treatment to prevent data loss if it 
ever does occur, e.g. via some bug introduced later)

Also fix the bug where isSyncing needs to be reset to false when writer.sync 
encounters an IOException: AsyncSyncer swallows such an exception by failing 
all writes with txid <= txidToSync, and since this AsyncSyncer thread is then 
ready to do later syncs, its isSyncing needs to be reset to false in the 
IOException handling block; otherwise it can't be selected by AsyncWriter to 
do sync

  was:
Last month, after I introduced multiple AsyncSyncer threads to improve the 
throughput for lower numbers of client write threads, [~stack] encountered an 
NPE while doing a test where a null writer occurs in AsyncSyncer when doing 
sync. Since we had run the test many times in a cluster to verify the 
throughput improvement and never encountered such an NPE, it really confused 
me. ([~stack] fixed this by adding an 'if (writer != null)' to protect the 
sync operation.)

I always wondered why the writer can be null in AsyncSyncer and whether it's 
safe to fix by just adding a null check before doing sync, as [~stack] did. 
After some digging and analysis, I found out the case where AsyncSyncer can 
encounter a null writer; it is as below:
1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with 
writtenTxid==100
2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with 
writtenTxid==200
3. t3: rollWriter starts; it grabs the updateLock, which prevents further 
writes from entering pendingWrites, and then waits for all items (up to 200) 
in pendingWrites to append and finally sync to hdfs
4. t4: AsyncSyncer 2 finishes; now syncedTillHere==200, and it also syncs 
<=100 as a whole
5. t5: rollWriter closes the writer and sets writer=null...
6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before 
rollWriter sets writer to the newly created Writer

We can see:
1. the null writer is possible only when there are multiple AsyncSyncer 
threads; that's why we never encountered it before introducing multiple 
AsyncSyncer threads.
2. since rollWriter can set writer=null only after all items of pendingWrites 
sync to hdfs, and AsyncWriter is in the critical

[jira] [Updated] (HBASE-10283) Client can't connect with all the running zk servers in MiniZooKeeperCluster

2014-01-13 Thread chendihao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chendihao updated HBASE-10283:
--

Attachment: HBASE-10283-0.94-v1.patch

patch for 0.94

 Client can't connect with all the running zk servers in MiniZooKeeperCluster
 

 Key: HBASE-10283
 URL: https://issues.apache.org/jira/browse/HBASE-10283
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chendihao
Assignee: chendihao
 Attachments: HBASE-10283-0.94-v1.patch


 Refer to HBASE-3052: multiple zk servers can run together in a minicluster. 
 The problem is that the client can only connect with the first zk server, 
 and if you kill the first one, it fails to access the cluster even though 
 the other zk servers are serving.
 It's easy to repro. Firstly `TEST_UTIL.startMiniZKCluster(3)`. Secondly call 
 `killCurrentActiveZooKeeperServer` in MiniZooKeeperCluster. Then when you 
 construct the zk client, it can't connect with the zk cluster in any way. 
 Here is a simple log you can refer to.
 {noformat}
 2014-01-03 12:06:58,625 INFO  [main] zookeeper.MiniZooKeeperCluster(194): 
 Started MiniZK Cluster and connect 1 ZK server on client port: 55227
 ..
 2014-01-03 12:06:59,134 INFO  [main] zookeeper.MiniZooKeeperCluster(264): 
 Kill the current active ZK servers in the cluster on client port: 55227
 2014-01-03 12:06:59,134 INFO  [main] zookeeper.MiniZooKeeperCluster(272): 
 Activate a backup zk server in the cluster on client port: 55228
 2014-01-03 12:06:59,366 INFO  [main-EventThread] zookeeper.ZooKeeper(434): 
 Initiating client connection, connectString=localhost:55227 
 sessionTimeout=3000 
 watcher=com.xiaomi.infra.timestamp.TimestampWatcher@a383118
 (then it throws exceptions..)
 {noformat}
 The log is kind of problematic because it always shows Started MiniZK 
 Cluster and connect 1 ZK server, but actually there are three zk servers.
 Looking deeply, we find that the client is still trying to connect with the 
 dead zk server's port. When I print out the zkQuorum it used, only the first 
 zk server's hostport is there, and it will not change whether you kill the 
 server or not. The reason for this is in ZKConfig, which converts HBase 
 settings into zk's. MiniZooKeeperCluster creates three servers with the same 
 host name, localhost, and different ports. But HBase itself forces the same 
 client port for every zk server, so ZKConfig ignores the other two servers, 
 which have the same host name.
 MiniZooKeeperCluster works improperly until we fix this. The bug was not 
 found earlier because we never test whether HBase still works when we kill 
 the active or backup zk servers in unit tests.
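
 To make the quorum collapsing concrete, here is a small self-contained 
 sketch (hypothetical demo code, not ZKConfig itself) of why keying servers 
 by host name under a single forced client port loses two of the three 
 servers:
 {code}
 import java.util.LinkedHashSet;
 import java.util.Set;

 public class QuorumStringDemo {
   public static void main(String[] args) {
     String[] hosts = {"localhost", "localhost", "localhost"};
     int[] ports = {55227, 55228, 55229};

     // ZKConfig-like behavior per the description: one forced client port,
     // so the three servers collapse into a single quorum entry.
     Set<String> collapsed = new LinkedHashSet<>();
     for (String host : hosts) {
       collapsed.add(host + ":" + ports[0]);
     }
     System.out.println(collapsed); // [localhost:55227] -- kill it and the
                                    // client has nowhere left to go

     // What the client actually needs: every server with its own port.
     StringBuilder connectString = new StringBuilder();
     for (int i = 0; i < hosts.length; i++) {
       if (i > 0) connectString.append(',');
       connectString.append(hosts[i]).append(':').append(ports[i]);
     }
     System.out.println(connectString);
     // localhost:55227,localhost:55228,localhost:55229
   }
 }
 {code}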



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer

2014-01-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10329:
---

Status: Patch Available  (was: Open)

 Fail the writes rather than proceeding silently to prevent data loss when 
 AsyncSyncer encounters null writer and its writes aren't synced by other 
 Asyncer
 --

 Key: HBASE-10329
 URL: https://issues.apache.org/jira/browse/HBASE-10329
 Project: HBase
  Issue Type: Bug
  Components: regionserver, wal
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10329-trunk_v0.patch


 Last month, after I introduced multiple AsyncSyncer threads to improve the 
 throughput for lower numbers of client write threads, [~stack] encountered 
 an NPE while doing a test where a null writer occurs in AsyncSyncer when 
 doing sync. Since we had run the test many times in a cluster to verify the 
 throughput improvement and never encountered such an NPE, it really 
 confused me. ([~stack] fixed this by adding 'if (writer != null)' to 
 protect the sync operation.)
 These days I wondered from time to time why the writer can be null in 
 AsyncSyncer and whether it's safe to fix it by just adding a null check 
 before doing sync, as [~stack] did. After some digging, I found out the 
 case where AsyncSyncer can encounter a null writer; it is as below:
 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with 
 writtenTxid==100
 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with 
 writtenTxid==200
 3. t3: rollWriter starts; it grabs the updateLock to prevent further client 
 writes from entering pendingWrites, and then waits for all items (<= 200) 
 in pendingWrites to append and finally sync to hdfs
 4. t4: AsyncSyncer 2 finishes; now syncedTillHere==200 (it also syncs 
 <=100 as a whole)
 5. t5: rollWriter now can close the writer and set writer=null...
 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... 
 before rollWriter sets writer to the newly rolled Writer
 We can see:
 1. the null writer is possible only when there are multiple AsyncSyncer 
 threads; that's why we never encountered it before introducing multiple 
 AsyncSyncer threads.
 2. since rollWriter can set writer=null only after all items of 
 pendingWrites sync to hdfs, and AsyncWriter is in the critical path of this 
 task and there is only one single AsyncWriter thread, AsyncWriter can't 
 encounter a null writer; that's why we never encounter a null writer in 
 AsyncWriter though it also uses the writer. This is the same reason why a 
 null writer never occurs when there is a single AsyncSyncer thread.
 And we should treat the cases differently when writer == null in 
 AsyncSyncer:
 1. if txidToSync <= syncedTillHere, this means all writes this AsyncSyncer 
 cares about have already been synced by another AsyncSyncer, and we can 
 safely skip the sync (as [~stack] does here);
 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid 
 <= txidToSync to avoid data loss: the user gets a successful write response 
 but can't read the writes back afterwards; from the user's perspective this 
 is data loss (according to the above analysis such a case should not occur, 
 but we should still add this defensive treatment to prevent data loss if it 
 ever occurs, e.g. via some bug introduced later)
 also fix the bug where isSyncing needs to be reset to false when 
 writer.sync encounters an IOException: AsyncSyncer swallows such an 
 exception by failing all writes with txid <= txidToSync, and this 
 AsyncSyncer thread is then ready to do later syncs, so its isSyncing needs 
 to be reset to false in the IOException handling block, otherwise it can't 
 be selected by AsyncWriter to do sync



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay

2014-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869573#comment-13869573
 ] 

Hadoop QA commented on HBASE-10227:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12622605/HBASE-10227-trunk_v0.patch
  against trunk revision .
  ATTACHMENT ID: 12622605

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8403//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8403//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8403//console

This message is automatically generated.

 When a region is opened, its mvcc isn't correctly recovered when there are 
 split hlogs to replay
 

 Key: HBASE-10227
 URL: https://issues.apache.org/jira/browse/HBASE-10227
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10227-trunk_v0.patch


 When opening a region, all stores are examined to get the max MemstoreTS, 
 which is used as the initial mvcc for the region, and then split hlogs are 
 replayed. In fact the edits in split hlogs contain kvs with greater mvcc 
 than any MemstoreTS in any store file, but replaying them doesn't increment 
 the mvcc accordingly at all. From an overall perspective this mvcc recovery 
 is 'logically' incorrect/incomplete.
 The reason this currently doesn't cause a problem is that no active 
 scanners exist and no new scanners can be created before the region opening 
 completes, so the mvcc of all kvs in the hfiles resulting from hlog replay 
 can be safely set to zero. They are just treated as kvs put 'earlier' than 
 the ones in HFiles with mvcc greater than zero ('earlier' in the sense that 
 they have smaller mvcc than the ones with non-zero mvcc, though in fact 
 they were put 'later'), with no incorrect impact only because during region 
 opening no active scanners exist or can be created.
 This bug is only a 'logical' one for the time being, but if later on we 
 need mvcc to survive across the region's whole lifecycle (across 
 regionservers) and never set it to zero, this bug needs to be fixed first.
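
 As a rough illustration of the gap (a toy model with hypothetical names, 
 not HRegion code): the read point is recovered from store files only, while 
 the replayed edits, whose mvcc values are strictly larger, never advance it.
 {code}
 public class MvccRecoveryToy {
   long readPoint; // the region's multi-version read point

   void openRegion(long[] storeFileMaxMemstoreTS, long[] replayedEditMvcc,
       boolean applyFix) {
     // current behavior: initial mvcc = max MemstoreTS across store files
     for (long ts : storeFileMaxMemstoreTS) {
       readPoint = Math.max(readPoint, ts);
     }
     // split-hlog edits carry larger mvcc values, but today the replay
     // path does not advance the read point at all
     if (applyFix) {
       for (long mvcc : replayedEditMvcc) {
         readPoint = Math.max(readPoint, mvcc);
       }
     }
   }

   public static void main(String[] args) {
     MvccRecoveryToy region = new MvccRecoveryToy();
     region.openRegion(new long[] {10, 12}, new long[] {15, 20}, false);
     System.out.println(region.readPoint); // 12: the replayed 15..20 is lost
   }
 }
 {code}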



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion

2014-01-13 Thread Samir Ahmic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869578#comment-13869578
 ] 

Samir Ahmic commented on HBASE-7386:


Update: with [HBASE-10310] fixed, master -> backup master failover time is 4s 
when the cluster is controlled with supervisor and 3s when it is controlled 
with the standard scripts.

 Investigate providing some supervisor support for znode deletion
 

 Key: HBASE-7386
 URL: https://issues.apache.org/jira/browse/HBASE-7386
 Project: HBase
  Issue Type: Task
  Components: master, regionserver, scripts
Reporter: Gregory Chanan
Assignee: stack
Priority: Blocker
 Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, 
 HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, 
 HBASE-7386-v0.patch, supervisordconfigs-v0.patch


 There are a couple of JIRAs for deleting the znode on a process failure:
 HBASE-5844 (RS)
 HBASE-5926 (Master)
 which are pretty neat; on process failure, they delete the znode of the 
 underlying process so HBase can recover faster.
 These JIRAs were implemented via the startup scripts; i.e. the script hangs 
 around and waits for the process to exit, then deletes the znode.
 There are a few problems associated with this approach, as listed in the 
 below JIRAs:
 1) Hides startup output in script
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 2) two hbase processes listed per launched daemon
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 3) Not run by a real supervisor
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 4) Weird output after kill -9 actual process in standalone mode
 https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
 5) Can kill existing RS if called again
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 6) Hides stdout/stderr[6]
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
 I suspect running it via something like supervisor.d can solve these issues 
 if we provide the right support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-6581) Build with hadoop.profile=3.0

2014-01-13 Thread Eric Charles (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869588#comment-13869588
 ] 

Eric Charles commented on HBASE-6581:
-

The second issue mentioned above (npe) is fixed with HDFS-5760.

 Build with hadoop.profile=3.0
 -

 Key: HBASE-6581
 URL: https://issues.apache.org/jira/browse/HBASE-6581
 Project: HBase
  Issue Type: Bug
Reporter: Eric Charles
Assignee: Eric Charles
 Fix For: 0.98.1

 Attachments: HBASE-6581-1.patch, HBASE-6581-2.patch, 
 HBASE-6581-20130821.patch, HBASE-6581-3.patch, HBASE-6581-4.patch, 
 HBASE-6581-5.patch, HBASE-6581-6.patch, HBASE-6581-7.patch, HBASE-6581.diff, 
 HBASE-6581.diff


 Building trunk with hadoop.profile=3.0 gives exceptions (see [1]) due to 
 changes in the hadoop maven module naming (and also the usage of 
 3.0-SNAPSHOT instead of 3.0.0-SNAPSHOT in hbase-common).
 I can provide a patch that would move most of the hadoop dependencies into 
 their respective profiles and define the correct hadoop deps in the 3.0 
 profile.
 Please tell me if it's ok to go this way.
 Thx, Eric
 [1]
 $ mvn clean install -Dhadoop.profile=3.0
 [INFO] Scanning for projects...
 [ERROR] The build could not read 3 projects - [Help 1]
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-server:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-server/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 655, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 659, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 663, column 21
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-common:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-common/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 170, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 174, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 178, column 21
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-it:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-it/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 220, column 18
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 224, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 228, column 21
 [ERROR] 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10123) Change default ports; move them out of linux ephemeral port range

2014-01-13 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-10123:
---

 Priority: Critical  (was: Major)
Affects Version/s: 0.96.1.1
Fix Version/s: 0.98.0

Bumping priority and adding fix version so we try to get this into 0.98.


 Change default ports; move them out of linux ephemeral port range
 -

 Key: HBASE-10123
 URL: https://issues.apache.org/jira/browse/HBASE-10123
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.1.1
Reporter: stack
Priority: Critical
 Fix For: 0.98.0


 Our defaults clash w/ the range linux assigns itself for creating come-and-go 
 ephemeral ports; likely in our history we've clashed w/ a random, short-lived 
 process.  While easy to change the defaults, we should just ship w/ defaults 
 that make sense.  We could host ourselves up into the 7 or 8k range.
 See http://www.ncftp.com/ncftpd/doc/misc/ephemeral_ports.html



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion

2014-01-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869629#comment-13869629
 ] 

Nicolas Liochon commented on HBASE-7386:


Thanks a lot for the fix of HBASE-10310, Samir. I went through your patch. 
It's a difficult read when you don't know supervisor ;-). The definition of 
'PROCESS_STATE_UNKNOWN' is a little scary (as we kill the region server when 
we reach this state).

There are some typos ('Test is supevisored installed' instead of supevisord).

I'm not sure about stuff like 'subprocess.call('/bin/mail -s 
HBASE_PROCESS_EVENT %s < %s' % (email, tmp_file), shell=True)': it seems 
machine dependent; there is no /bin/mail on my ubuntu desktop.

Do we have to use python?

It would be good to have a review from someone who knows supervisor... As well, 
this should be documented in the hbase reference guide imho.

 Investigate providing some supervisor support for znode deletion
 

 Key: HBASE-7386
 URL: https://issues.apache.org/jira/browse/HBASE-7386
 Project: HBase
  Issue Type: Task
  Components: master, regionserver, scripts
Reporter: Gregory Chanan
Assignee: stack
Priority: Blocker
 Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, 
 HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, 
 HBASE-7386-v0.patch, supervisordconfigs-v0.patch


 There are a couple of JIRAs for deleting the znode on a process failure:
 HBASE-5844 (RS)
 HBASE-5926 (Master)
 which are pretty neat; on process failure, they delete the znode of the 
 underlying process so HBase can recover faster.
 These JIRAs were implemented via the startup scripts; i.e. the script hangs 
 around and waits for the process to exit, then deletes the znode.
 There are a few problems associated with this approach, as listed in the 
 below JIRAs:
 1) Hides startup output in script
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 2) two hbase processes listed per launched daemon
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 3) Not run by a real supervisor
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 4) Weird output after kill -9 actual process in standalone mode
 https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
 5) Can kill existing RS if called again
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 6) Hides stdout/stderr[6]
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
 I suspect running it via something like supervisor.d can solve these issues 
 if we provide the right support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10330) TableInputFormat/TableRecordReaderImpl leaks HTable

2014-01-13 Thread G G (JIRA)
G G created HBASE-10330:
---

 Summary: TableInputFormat/TableRecordReaderImpl leaks HTable
 Key: HBASE-10330
 URL: https://issues.apache.org/jira/browse/HBASE-10330
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: G G
Priority: Critical


As far as I can tell, TableInputFormat creates an instance of HTable which is 
used by TableRecordReaderImpl. However, TableRecordReaderImpl.close() only 
closes the scanner, not the table. In turn, the HTable's HConnection's 
reference count is never decreased, which leads to leaking HConnections.

TableOutputFormat might have a similar bug.
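
A minimal sketch of the fix this implies (toy types standing in for 
TableRecordReaderImpl, ResultScanner and HTable; not a proposed patch):
{code}
import java.io.Closeable;
import java.io.IOException;

class LeakFreeRecordReader implements Closeable {
  private final Closeable scanner; // stands in for the ResultScanner
  private final Closeable table;   // stands in for the HTable

  LeakFreeRecordReader(Closeable scanner, Closeable table) {
    this.scanner = scanner;
    this.table = table;
  }

  @Override
  public void close() throws IOException {
    try {
      if (scanner != null) scanner.close(); // what close() already does
    } finally {
      if (table != null) table.close(); // the missing step: releases the
                                        // HConnection reference
    }
  }
}
{code}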



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer

2014-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869634#comment-13869634
 ] 

Hadoop QA commented on HBASE-10329:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12622617/HBASE-10329-trunk_v0.patch
  against trunk revision .
  ATTACHMENT ID: 12622617

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8404//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8404//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8404//console

This message is automatically generated.

 Fail the writes rather than proceeding silently to prevent data loss when 
 AsyncSyncer encounters null writer and its writes aren't synced by other 
 Asyncer
 --

 Key: HBASE-10329
 URL: https://issues.apache.org/jira/browse/HBASE-10329
 Project: HBase
  Issue Type: Bug
  Components: regionserver, wal
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10329-trunk_v0.patch


 Last month, after I introduced multiple AsyncSyncer threads to improve the 
 throughput for lower numbers of client write threads, [~stack] encountered 
 an NPE while doing a test where a null writer occurs in AsyncSyncer when 
 doing sync. Since we had run the test many times in a cluster to verify the 
 throughput improvement and never encountered such an NPE, it really 
 confused me. ([~stack] fixed this by adding 'if (writer != null)' to 
 protect the sync operation.)
 These days I wondered from time to time why the writer can be null in 
 AsyncSyncer and whether it's safe to fix it by just adding a null check 
 before doing sync, as [~stack] did. After some digging, I found out the 
 case where AsyncSyncer can encounter a null writer; it is as below:
 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with 
 writtenTxid==100
 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with 
 writtenTxid==200
 3. t3: rollWriter starts; it grabs the updateLock to prevent further client 
 writes from entering 

[jira] [Commented] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer

2014-01-13 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869643#comment-13869643
 ] 

Himanshu Vashishtha commented on HBASE-10329:
-

The explanation makes total sense… +1.

 Fail the writes rather than proceeding silently to prevent data loss when 
 AsyncSyncer encounters null writer and its writes aren't synced by other 
 Asyncer
 --

 Key: HBASE-10329
 URL: https://issues.apache.org/jira/browse/HBASE-10329
 Project: HBase
  Issue Type: Bug
  Components: regionserver, wal
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10329-trunk_v0.patch


 Last month, after I introduced multiple AsyncSyncer threads to improve the 
 throughput for lower numbers of client write threads, [~stack] encountered 
 an NPE while doing a test where a null writer occurs in AsyncSyncer when 
 doing sync. Since we had run the test many times in a cluster to verify the 
 throughput improvement and never encountered such an NPE, it really 
 confused me. ([~stack] fixed this by adding 'if (writer != null)' to 
 protect the sync operation.)
 These days I wondered from time to time why the writer can be null in 
 AsyncSyncer and whether it's safe to fix it by just adding a null check 
 before doing sync, as [~stack] did. After some digging, I found out the 
 case where AsyncSyncer can encounter a null writer; it is as below:
 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with 
 writtenTxid==100
 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with 
 writtenTxid==200
 3. t3: rollWriter starts; it grabs the updateLock to prevent further client 
 writes from entering pendingWrites, and then waits for all items (<= 200) 
 in pendingWrites to append and finally sync to hdfs
 4. t4: AsyncSyncer 2 finishes; now syncedTillHere==200 (it also syncs 
 <=100 as a whole)
 5. t5: rollWriter now can close the writer and set writer=null...
 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... 
 before rollWriter sets writer to the newly rolled Writer
 We can see:
 1. the null writer is possible only when there are multiple AsyncSyncer 
 threads; that's why we never encountered it before introducing multiple 
 AsyncSyncer threads.
 2. since rollWriter can set writer=null only after all items of 
 pendingWrites sync to hdfs, and AsyncWriter is in the critical path of this 
 task and there is only one single AsyncWriter thread, AsyncWriter can't 
 encounter a null writer; that's why we never encounter a null writer in 
 AsyncWriter though it also uses the writer. This is the same reason why a 
 null writer never occurs when there is a single AsyncSyncer thread.
 And we should treat the cases differently when writer == null in 
 AsyncSyncer:
 1. if txidToSync <= syncedTillHere, this means all writes this AsyncSyncer 
 cares about have already been synced by another AsyncSyncer, and we can 
 safely skip the sync (as [~stack] does here);
 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid 
 <= txidToSync to avoid data loss: the user gets a successful write response 
 but can't read the writes back afterwards; from the user's perspective this 
 is data loss (according to the above analysis such a case should not occur, 
 but we should still add this defensive treatment to prevent data loss if it 
 ever occurs, e.g. via some bug introduced later)
 also fix the bug where isSyncing needs to be reset to false when 
 writer.sync encounters an IOException: AsyncSyncer swallows such an 
 exception by failing all writes with txid <= txidToSync, and this 
 AsyncSyncer thread is then ready to do later syncs, so its isSyncing needs 
 to be reset to false in the IOException handling block, otherwise it can't 
 be selected by AsyncWriter to do sync



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10320) Avoid ArrayList.iterator() in tight loops

2014-01-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869683#comment-13869683
 ] 

Lars Hofhansl commented on HBASE-10320:
---

Thanks Stack. I wonder whether the same is true for ArrayLists.
That brings me to another thought: the columns list is fixed size and never 
changed once created, so why have an ArrayList at all instead of an array? 
Then we can use columns.length in the loop and get this optimization, as in 
the sketch below. Will try when I get some time next.
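
A sketch of what that could look like (hypothetical shape, not a patch): with 
a plain array the loop bound is a field read, and there is no iterator or 
size() call at all.
{code}
// columns is fixed at construction time and never resized, so a plain
// array works; ColumnCount stands in for the element type the tracker uses.
private final ColumnCount[] columns;

void reset() {
  for (int i = 0; i < columns.length; i++) {
    ColumnCount column = columns[i];
    // ... per-column reset work ...
  }
}
{code}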


 Avoid ArrayList.iterator() in tight loops
 -

 Key: HBASE-10320
 URL: https://issues.apache.org/jira/browse/HBASE-10320
 Project: HBase
  Issue Type: Bug
  Components: Performance
Reporter: Lars Hofhansl
 Attachments: 10320-0.94-v2.txt, 10320-0.94.txt


 I noticed that in a profiler (sampler) run ScanQueryMatcher.setRow(...) 
 showed up at all.
 It turns out that the expensive part is iterating over the columns in 
 ExplicitColumnTracker.reset(). I did some microbenchmarks and found that
 {code}
 private ArrayList<X> l;
 ...
 for (int i = 0; i < l.size(); i++) {
    X x = l.get(i);
    ...
 }
 {code}
 is twice as fast as:
 {code}
 private ArrayList<X> l;
 ...
 for (X x : l) {
    ...
 }
 {code}
 The indexed version asymptotically approaches the iterator version, but even 
 at 1m entries it is still faster.
 In my tight-loop scans this provides a 5% overall performance improvement 
 when the ExplicitColumnTracker is used.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers

2014-01-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869679#comment-13869679
 ] 

Lars Hofhansl commented on HBASE-10274:
---

[~stack], [~apurtell], I assume you want this (test stability fix) in 0.96/0.98.

 MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
 ---

 Key: HBASE-10274
 URL: https://issues.apache.org/jira/browse/HBASE-10274
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chendihao
Assignee: chendihao
Priority: Minor
 Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17

 Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, 
 HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch


 HBASE-6820 pointed out the problem but did not fix it completely.
 killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() will 
 shut down the ZooKeeperServer and need to close the ZKDatabase as well.
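
 For illustration, a hedged sketch of the shutdown path with the missing 
 step added (the helper and its placement are assumptions, not the attached 
 patch):
 {code}
 import java.io.IOException;
 import org.apache.zookeeper.server.ZKDatabase;
 import org.apache.zookeeper.server.ZooKeeperServer;

 final class ZkShutdownHelper {
   static void shutdownAndClose(ZooKeeperServer zks) throws IOException {
     zks.shutdown();
     ZKDatabase db = zks.getZKDatabase();
     if (db != null) {
       db.close(); // releases transaction log and snapshot handles
     }
   }
 }
 {code}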



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers

2014-01-13 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-10274:
--

Fix Version/s: 0.94.17
   0.99.0
   0.96.2
   0.98.0

 MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
 ---

 Key: HBASE-10274
 URL: https://issues.apache.org/jira/browse/HBASE-10274
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chendihao
Assignee: chendihao
Priority: Minor
 Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17

 Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, 
 HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch


 HBASE-6820 pointed out the problem but did not fix it completely.
 killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() will 
 shut down the ZooKeeperServer and need to close the ZKDatabase as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer

2014-01-13 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-10329:
--

Priority: Critical  (was: Major)

Marking critical.
Thanks for digging in Feng. Makes sense.

Slight clarification: we only need to fail writes with syncedTillHere < txid 
<= txidToSync, right?


 Fail the writes rather than proceeding silently to prevent data loss when 
 AsyncSyncer encounters null writer and its writes aren't synced by other 
 Asyncer
 --

 Key: HBASE-10329
 URL: https://issues.apache.org/jira/browse/HBASE-10329
 Project: HBase
  Issue Type: Bug
  Components: regionserver, wal
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
 Attachments: HBASE-10329-trunk_v0.patch


 Last month, after I introduced multiple AsyncSyncer threads to improve the 
 throughput for lower numbers of client write threads, [~stack] encountered 
 an NPE while doing a test where a null writer occurs in AsyncSyncer when 
 doing sync. Since we had run the test many times in a cluster to verify the 
 throughput improvement and never encountered such an NPE, it really 
 confused me. ([~stack] fixed this by adding 'if (writer != null)' to 
 protect the sync operation.)
 These days I wondered from time to time why the writer can be null in 
 AsyncSyncer and whether it's safe to fix it by just adding a null check 
 before doing sync, as [~stack] did. After some digging, I found out the 
 case where AsyncSyncer can encounter a null writer; it is as below:
 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with 
 writtenTxid==100
 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with 
 writtenTxid==200
 3. t3: rollWriter starts; it grabs the updateLock to prevent further client 
 writes from entering pendingWrites, and then waits for all items (<= 200) 
 in pendingWrites to append and finally sync to hdfs
 4. t4: AsyncSyncer 2 finishes; now syncedTillHere==200 (it also syncs 
 <=100 as a whole)
 5. t5: rollWriter now can close the writer and set writer=null...
 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... 
 before rollWriter sets writer to the newly rolled Writer
 We can see:
 1. the null writer is possible only when there are multiple AsyncSyncer 
 threads; that's why we never encountered it before introducing multiple 
 AsyncSyncer threads.
 2. since rollWriter can set writer=null only after all items of 
 pendingWrites sync to hdfs, and AsyncWriter is in the critical path of this 
 task and there is only one single AsyncWriter thread, AsyncWriter can't 
 encounter a null writer; that's why we never encounter a null writer in 
 AsyncWriter though it also uses the writer. This is the same reason why a 
 null writer never occurs when there is a single AsyncSyncer thread.
 And we should treat the cases differently when writer == null in 
 AsyncSyncer:
 1. if txidToSync <= syncedTillHere, this means all writes this AsyncSyncer 
 cares about have already been synced by another AsyncSyncer, and we can 
 safely skip the sync (as [~stack] does here);
 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid 
 <= txidToSync to avoid data loss: the user gets a successful write response 
 but can't read the writes back afterwards; from the user's perspective this 
 is data loss (according to the above analysis such a case should not occur, 
 but we should still add this defensive treatment to prevent data loss if it 
 ever occurs, e.g. via some bug introduced later)
 also fix the bug where isSyncing needs to be reset to false when 
 writer.sync encounters an IOException: AsyncSyncer swallows such an 
 exception by failing all writes with txid <= txidToSync, and this 
 AsyncSyncer thread is then ready to do later syncs, so its isSyncing needs 
 to be reset to false in the IOException handling block, otherwise it can't 
 be selected by AsyncWriter to do sync



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10330) TableInputFormat/TableRecordReaderImpl leaks HTable

2014-01-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869694#comment-13869694
 ] 

Lars Hofhansl commented on HBASE-10330:
---

Is that the case in 0.94 as well? (I'll check)

 TableInputFormat/TableRecordReaderImpl leaks HTable
 ---

 Key: HBASE-10330
 URL: https://issues.apache.org/jira/browse/HBASE-10330
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: G G
Priority: Critical

 As far as I can tell, TableInputFormat creates an instance of HTable which 
 is used by TableRecordReaderImpl. However, TableRecordReaderImpl.close() 
 only closes the scanner, not the table. In turn, the HTable's HConnection's 
 reference count is never decreased, which leads to leaking HConnections.
 TableOutputFormat might have a similar bug.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10123) Change default ports; move them out of linux ephemeral port range

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869700#comment-13869700
 ] 

Andrew Purtell commented on HBASE-10123:


I need a patch or this goes to 0.98.1

 Change default ports; move them out of linux ephemeral port range
 -

 Key: HBASE-10123
 URL: https://issues.apache.org/jira/browse/HBASE-10123
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.1.1
Reporter: stack
Priority: Critical
 Fix For: 0.98.0


 Our defaults clash w/ the range linux assigns itself for creating come-and-go 
 ephemeral ports; likely in our history we've clashed w/ a random, short-lived 
 process.  While easy to change the defaults, we should just ship w/ defaults 
 that make sense.  We could host ourselves up into the 7 or 8k range.
 See http://www.ncftp.com/ncftpd/doc/misc/ephemeral_ports.html



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10030) [JDK8] Erasure of PoolMap#remove(K,V) conflicts with superclass method

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869698#comment-13869698
 ] 

Andrew Purtell commented on HBASE-10030:


Possibly. Eclipse on JDK8 worked for me. I can check that again when I have a 
free moment. 

 [JDK8] Erasure of PoolMap#remove(K,V) conflicts with superclass method
 --

 Key: HBASE-10030
 URL: https://issues.apache.org/jira/browse/HBASE-10030
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Trivial
 Fix For: 0.98.0

 Attachments: 10030.patch


 On JDK 8, the erasure of PoolMap#remove(K,V) conflicts with superclass method 
 remove(Object,Object).
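
 For anyone hitting this, a tiny reproduction of the clash (a toy interface, 
 not PoolMap itself). On JDK 8, java.util.Map gained a default boolean 
 remove(Object, Object), so the declaration below fails to compile with a 
 name-clash error:
 {code}
 import java.util.Map;

 interface ToyPoolMap<K, V> extends Map<K, V> {
   // JDK 8 error: remove(K, V) and Map.remove(Object, Object) have the
   // same erasure, yet neither overrides the other
   boolean remove(K key, V value);
 }
 {code}
 Renaming the method or switching it to the Object-based signature would 
 avoid the clash; which route the patch took is not shown here.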



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat

2014-01-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10323:
---

Fix Version/s: 0.99.0

The 'mvn site' failure occurs in other QA runs as well.
It was not caused by your patch.

 Auto detect data block encoding in HFileOutputFormat
 

 Key: HBASE-10323
 URL: https://issues.apache.org/jira/browse/HBASE-10323
 Project: HBase
  Issue Type: Improvement
Reporter: Ishan Chhabra
Assignee: Ishan Chhabra
 Fix For: 0.99.0

 Attachments: HBASE_10323-0.94.15-v1.patch, 
 HBASE_10323-0.94.15-v2.patch, HBASE_10323-0.94.15-v3.patch, 
 HBASE_10323-trunk-v1.patch, HBASE_10323-trunk-v2.patch


 Currently, one has to specify the data block encoding of the table 
 explicitly using the config parameter 
 hbase.mapreduce.hfileoutputformat.datablock.encoding when doing a bulk 
 load. This option is easily missed, not documented, and also works 
 differently from compression, block size and bloom filter type, which are 
 auto detected. The solution would be to add support for auto detecting the 
 data block encoding, similar to the other parameters.
 The current patch does the following:
 1. Automatically detects data block encoding in HFileOutputFormat.
 2. Keeps the legacy option of manually specifying the data block encoding
 around as a way to override auto detection.
 3. Moves string conf parsing to the start of the program so that it fails
 fast during startup instead of failing during record writes. It also
 makes the internals of the program type safe.
 4. Adds missing doc strings and unit tests for the code serializing and
 deserializing config parameters for bloom filter type, block size and
 data block encoding.
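
 A hedged sketch of the detection order described above; the method and 
 lookup flow are illustrative, not the patch itself, and only the config key 
 is taken from the description:
 {code}
 DataBlockEncoding resolveEncoding(Configuration conf,
     HColumnDescriptor family) {
   String override =
       conf.get("hbase.mapreduce.hfileoutputformat.datablock.encoding");
   if (override != null) {
     // legacy behavior kept as an explicit override (point 2 above)
     return DataBlockEncoding.valueOf(override);
   }
   // auto detection from the table's own schema (point 1 above)
   return family.getDataBlockEncoding();
 }
 {code}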



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869708#comment-13869708
 ] 

Andrew Purtell commented on HBASE-10321:


+1 on patch V2

 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch


 The write/read of tags added in CellCodec has broken the 96 client to 98 
 server compatibility (and 98 client to 96 server).
 When a 96 client's CellCodec writes a cell, it won't write the tags part at 
 all. But the server expects a tags part, at least a 0 tag length. Reading 
 this tag length will consume some bytes from the next cell!
 I suggest we can remove the tags part from CellCodec. This codec is not 
 used by default, and I don't think someone will change to CellCodec from 
 the default KVCodec now.
 This makes tags not supported via CellCodec. Tag support can be added to 
 CellCodec once we have connection negotiation in place (?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (HBASE-10326) Super user should be able to scan all the cells irrespective of the visibility labels

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869711#comment-13869711
 ] 

Andrew Purtell edited comment on HBASE-10326 at 1/13/14 5:21 PM:
-

bq. Instead can use AccessControlClient#grant ? This code is repeated in tests..

Or use the new grant/revoke methods in SecureTestUtils, which are designed for 
granting or revoking in tests. They do things only possible in miniclusters to 
ensure the AC has propagated the grant to all caches first, to avoid flapping 
tests.

Are the changes to TestVisibilityLabels needed? The test runs under the 
superuser implicitly right? There is no functional change though, would be fine 
to keep them.

What do the new tests in TestVisibilityLabelsWithACL do? Comment, please.


was (Author: apurtell):
bq. Instead can use AccessControlClient#grant ? This code is repeated in tests..

Or use the new grant/revoke methods in SecureTestUtils for granting, which 
also ensure the AC has propagated the grant to all caches first, to avoid 
racing tests.

Are the changes to TestVisibilityLabels needed? The test runs under the 
superuser implicitly right? There is no functional change though, would be fine 
to keep them.

What do the new tests in TestVisibilityLabelsWithACL do? Comment, please.

 Super user should be able to scan all the cells irrespective of the 
 visibility labels
 --

 Key: HBASE-10326
 URL: https://issues.apache.org/jira/browse/HBASE-10326
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
  Labels: security
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10326.patch, HBASE-10326_1.patch


 This issue is in line with HBASE-10322.  In the case of the export tool, 
 when cells with visibility labels are exported using a super user, we 
 should be able to export the data.  But with the current implementation, 
 the super user would only be able to view cells that have visibility labels 
 associated with the superuser.  The idea of HBASE-10322 is to strip out 
 tags based on the user, and if so this change is necessary for the export 
 tool to work with Visibility.  ACL already has a concept of global admins.  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10326) Super user should be able to scan all the cells irrespective of the visibility labels

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869713#comment-13869713
 ] 

Andrew Purtell commented on HBASE-10326:


[~anoop.hbase], Ram mailed me that he is away this evening. I would be +1 for a 
commit of this patch without the test changes. What do you think? We can add 
the test changes later as an addendum or new JIRA.

 Super user should be able to scan all the cells irrespective of the 
 visibility labels
 --

 Key: HBASE-10326
 URL: https://issues.apache.org/jira/browse/HBASE-10326
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
  Labels: security
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10326.patch, HBASE-10326_1.patch


 This issue is in line with HBASE-10322.  In the case of the export tool, 
 when cells with visibility labels are exported using a super user, we 
 should be able to export the data.  But with the current implementation, 
 the super user would only be able to view cells that have visibility labels 
 associated with the superuser.  The idea of HBASE-10322 is to strip out 
 tags based on the user, and if so this change is necessary for the export 
 tool to work with Visibility.  ACL already has a concept of global admins.  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10326) Super user should be able to scan all the cells irrespective of the visibility labels

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869711#comment-13869711
 ] 

Andrew Purtell commented on HBASE-10326:


bq. Instead can use AccessControlClient#grant ? This code is repeated in tests..

Or use the new grant/revoke methods in SecureTestUtils for granting, which 
also ensure the AC has propagated the grant to all caches first, to avoid 
racing tests.

Are the changes to TestVisibilityLabels needed? The test runs under the 
superuser implicitly right? There is no functional change though, would be fine 
to keep them.

What do the new tests in TestVisibilityLabelsWithACL do? Comment, please.

 Super user should be able to scan all the cells irrespective of the 
 visibility labels
 --

 Key: HBASE-10326
 URL: https://issues.apache.org/jira/browse/HBASE-10326
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
  Labels: security
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10326.patch, HBASE-10326_1.patch


 This issue is in line with HBASE-10322.  In the case of the export tool, 
 when cells with visibility labels are exported using a super user, we 
 should be able to export the data.  But with the current implementation, 
 the super user would only be able to view cells that have visibility labels 
 associated with the superuser.  The idea of HBASE-10322 is to strip out 
 tags based on the user, and if so this change is necessary for the export 
 tool to work with Visibility.  ACL already has a concept of global admins.  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-13 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-10321:
---

  Resolution: Fixed
Release Note: A new codec, CellCodecV2, is added which can do all the work 
of CellCodec plus writing/reading tags. CellCodec will not be able to handle 
tags; when one wants CellCodec behavior together with tags, use CellCodecV2.
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to Trunk and 0.98.   Thanks for the reviews.
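
For readers following along, a hedged sketch of the resulting wire formats. 
The helper names (writeByteArray, rowOf, tagsOf, ...) are illustrative; only 
the V1/V2 split itself comes from the release note.
{code}
// CellCodec: 0.96-compatible layout, no tags section at all
void writeCellV1(DataOutputStream out, Cell cell) throws IOException {
  writeByteArray(out, rowOf(cell));
  writeByteArray(out, familyOf(cell));
  writeByteArray(out, qualifierOf(cell));
  // ... timestamp, type byte, value -- and nothing after the value
}

// CellCodecV2: the same layout plus a trailing tags section, so tags are
// exchanged only between peers that both speak V2
void writeCellV2(DataOutputStream out, Cell cell) throws IOException {
  writeCellV1(out, cell);
  writeByteArray(out, tagsOf(cell)); // tags length + tag bytes (may be empty)
}
{code}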

 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch


 The write/read tags added in CellCodec has broken the 96 client to 98 server 
 compatibility (and 98 client to 96 server)
 When 96 client CellCodec writes cell, it won't write tags part at all. But 
 the server expects a tag part, at least a 0 tag length. This tag length read 
 will make a read of some bytes from next cell!
 I suggest we can remove the tag part from CellCodec. This codec is not used 
 by default and I don't think some one will change to CellCodec from the 
 default KVCodec now. ..
 This makes tags not supported via CellCodec..Tag support can be added to 
 CellCodec once we have Connection negotiation in place (?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels

2014-01-13 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869729#comment-13869729
 ] 

Anoop Sam John commented on HBASE-10326:


I will commit.

 Super user should be able scan all the cells irrespective of the visibility 
 labels
 --

 Key: HBASE-10326
 URL: https://issues.apache.org/jira/browse/HBASE-10326
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
  Labels: security
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10326.patch, HBASE-10326_1.patch


 This issue is related to HBASE-10322.  In the case of the export tool, when 
 cells with visibility labels are exported using a super user, we should be 
 able to export the data.  But with the current implementation, the super user 
 would only be able to view cells whose visibility labels are associated with 
 the superuser.  The idea of HBASE-10322 is to strip out tags based on the 
 user, and if so this change is necessary for the export tool to work with 
 Visibility.  ACL already has a concept of global admins.  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels

2014-01-13 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869729#comment-13869729
 ] 

Anoop Sam John edited comment on HBASE-10326 at 1/13/14 5:42 PM:
-

I will commit the patch as it is now.  We can improve the tests later as you 
suggested.


was (Author: anoop.hbase):
I will commit.

 Super user should be able scan all the cells irrespective of the visibility 
 labels
 --

 Key: HBASE-10326
 URL: https://issues.apache.org/jira/browse/HBASE-10326
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
  Labels: security
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10326.patch, HBASE-10326_1.patch


 This issue is related to HBASE-10322.  In the case of the export tool, when 
 cells with visibility labels are exported using a super user, we should be 
 able to export the data.  But with the current implementation, the super user 
 would only be able to view cells whose visibility labels are associated with 
 the superuser.  The idea of HBASE-10322 is to strip out tags based on the 
 user, and if so this change is necessary for the export tool to work with 
 Visibility.  ACL already has a concept of global admins.  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869745#comment-13869745
 ] 

Andrew Purtell commented on HBASE-10274:


One HadoopQA run was good, the other failed 
org.apache.hadoop.hbase.zookeeper.lock.TestZKInterProcessReadWriteLock. Has 
anyone tested whether this change makes our ZK unit tests flaky?

 MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
 ---

 Key: HBASE-10274
 URL: https://issues.apache.org/jira/browse/HBASE-10274
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chendihao
Assignee: chendihao
Priority: Minor
 Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17

 Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, 
 HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch


 HBASE-6820 points out the problem but does not fix it completely.
 killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() will 
 shut down the ZooKeeperServer and need to close the ZKDatabase as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10274) MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869752#comment-13869752
 ] 

Andrew Purtell commented on HBASE-10274:


Anyway, we can try it on 0.98. If tests do flake, I can revert and recommit to 
0.98.1.

 MiniZookeeperCluster should close ZKDatabase when shutdown ZooKeeperServers
 ---

 Key: HBASE-10274
 URL: https://issues.apache.org/jira/browse/HBASE-10274
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chendihao
Assignee: chendihao
Priority: Minor
 Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17

 Attachments: HBASE-10274-0.94-v1.patch, HBASE-10274-0.94-v2.patch, 
 HBASE-10274-truck-v1.patch, HBASE-10274-truck-v2.patch


 HBASE-6820 points out the problem but does not fix it completely.
 killCurrentActiveZooKeeperServer() and killOneBackupZooKeeperServer() will 
 shut down the ZooKeeperServer and need to close the ZKDatabase as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels

2014-01-13 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-10326:
---

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to Trunk and 0.98.  Thanks for the patch Ram. Thanks for the review 
Andy.

 Super user should be able scan all the cells irrespective of the visibility 
 labels
 --

 Key: HBASE-10326
 URL: https://issues.apache.org/jira/browse/HBASE-10326
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
  Labels: security
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10326.patch, HBASE-10326_1.patch


 This issue is related to HBASE-10322.  In the case of the export tool, when 
 cells with visibility labels are exported using a super user, we should be 
 able to export the data.  But with the current implementation, the super user 
 would only be able to view cells whose visibility labels are associated with 
 the superuser.  The idea of HBASE-10322 is to strip out tags based on the 
 user, and if so this change is necessary for the export tool to work with 
 Visibility.  ACL already has a concept of global admins.  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10304) Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString

2014-01-13 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869753#comment-13869753
 ] 

Nick Dimiduk commented on HBASE-10304:
--

[~jxiang] the only thing you needed to tweak for the first two variations was 
the explicit inclusion of the hbase config in $HADOOP_CLASSPATH? Where else 
would the hadoop invocation pick up hbase-site.xml? Adding the hbase config in 
this invocation method has always been required, right?

What about launching the job using our bin/hbase script? Do you see the same 
IllegalAccessError when launching the fat jar that way?

 Running an hbase job jar: IllegalAccessError: class 
 com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass 
 com.google.protobuf.LiteralByteString
 

 Key: HBASE-10304
 URL: https://issues.apache.org/jira/browse/HBASE-10304
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.98.0, 0.96.1.1
Reporter: stack
Priority: Blocker
 Fix For: 0.98.0

 Attachments: hbase-10304_not_tested.patch, jobjar.xml


 (Jimmy has been working on this one internally.  I'm just the messenger 
 raising this critical issue upstream).
 So, if you make job jar and bundle up hbase inside in it because you want to 
 access hbase from your mapreduce task, the deploy of the job jar to the 
 cluster fails with:
 {code}
 14/01/05 08:59:19 INFO Configuration.deprecation: 
 topology.node.switch.mapping.impl is deprecated. Instead, use 
 net.topology.node.switch.mapping.impl
 14/01/05 08:59:19 INFO Configuration.deprecation: io.bytes.per.checksum is 
 deprecated. Instead, use dfs.bytes-per-checksum
 Exception in thread main java.lang.IllegalAccessError: class 
 com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass 
 com.google.protobuf.LiteralByteString
   at java.lang.ClassLoader.defineClass1(Native Method)
   at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
   at 
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
   at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
   at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:818)
   at 
 org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertScanToString(TableMapReduceUtil.java:433)
   at 
 org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:186)
   at 
 org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:147)
   at 
 org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:270)
   at 
 org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:100)
   at 
 com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:124)
   at 
 com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:64)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at 
 com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.main(HBaseMapReduceIndexerTool.java:51)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {code}
 So, ZCLBS is a hack.  This class is in the hbase-protocol module.  It is in 
 the com.google.protobuf package.  All is well and good usually.
 But when we make a job jar and bundle up hbase inside it, our 'trick' breaks. 
 RunJar makes a new class loader to run the job jar.  This URLClassLoader 
 'attaches' all the jars and classes that are in the job jar so they can be 
 found when it goes to do a lookup.  Only, classloaders work by always 
 delegating to their parent first (unless you are a WAR file in a container 
 where delegation is 'off' for the most part) and in this case, the parent 
 classloader will have access to a pb jar since pb is in the hadoop CLASSPATH. 
  So, 

[jira] [Updated] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels

2014-01-13 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-10326:
---

 Component/s: security
Release Note: An HBase super user (any user having the system visibility 
label) can read back all the cells irrespective of the visibility expression 
applied to the cells.

 Super user should be able scan all the cells irrespective of the visibility 
 labels
 --

 Key: HBASE-10326
 URL: https://issues.apache.org/jira/browse/HBASE-10326
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
  Labels: security
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10326.patch, HBASE-10326_1.patch


 This issue is related to HBASE-10322.  In the case of the export tool, when 
 cells with visibility labels are exported using a super user, we should be 
 able to export the data.  But with the current implementation, the super user 
 would only be able to view cells whose visibility labels are associated with 
 the superuser.  The idea of HBASE-10322 is to strip out tags based on the 
 user, and if so this change is necessary for the export tool to work with 
 Visibility.  ACL already has a concept of global admins.  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10330) TableInputFormat/TableRecordReaderImpl leaks HTable

2014-01-13 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10330:
---

Fix Version/s: 0.99.0
   0.96.2
   0.98.0

Seems like it would have a straightforward fix. 

 TableInputFormat/TableRecordReaderImpl leaks HTable
 ---

 Key: HBASE-10330
 URL: https://issues.apache.org/jira/browse/HBASE-10330
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: G G
Priority: Critical
 Fix For: 0.98.0, 0.96.2, 0.99.0


 As far as I can tell, TableInputFormat creates an instance of HTable which is 
 used by TableRecordReaderImpl. However, TableRecordReaderImpl.close() only 
 closes the scanner, not the table. In turn, the HTable's HConnection's 
 reference count is never decreased, which leads to leaking HConnections.
 TableOutputFormat might have a similar bug.
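 A minimal sketch of the likely fix (assuming TableRecordReaderImpl keeps 
 references to both the scanner and the HTable; names paraphrased, no patch 
 is attached yet):
 {code}
 // Sketch: close the table as well as the scanner so the underlying
 // HConnection's reference count is decremented.
 public void close() {
   if (this.scanner != null) {
     this.scanner.close();   // existing behavior
   }
   try {
     if (this.htable != null) {
       this.htable.close();  // proposed addition: releases the HConnection
     }
   } catch (IOException ioe) {
     LOG.warn("Error closing HTable", ioe);
   }
 }
 {code}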



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10277) refactor AsyncProcess

2014-01-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869762#comment-13869762
 ] 

Sergey Shelukhin commented on HBASE-10277:
--

I started the mailing thread on this. 

 refactor AsyncProcess
 -

 Key: HBASE-10277
 URL: https://issues.apache.org/jira/browse/HBASE-10277
 Project: HBase
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-10277.patch


 AsyncProcess currently has two patterns of usage, one from HTable flush w/o 
 callback and with reuse, and one from HCM/HTable batch call, with callback 
 and w/o reuse. In the former case (but not the latter), it also does some 
 throttling of actions on initial submit call, limiting the number of 
 outstanding actions per server.
 The latter case is relatively straightforward. The former appears to be error 
 prone due to reuse - if, as javadoc claims should be safe, multiple submit 
 calls are performed without waiting for the async part of the previous call 
 to finish, fields like hasError become ambiguous and can be used for the 
 wrong call; callback for success/failure is called based on original index 
 of an action in submitted list, but with only one callback supplied to AP in 
 ctor it's not clear to which submit call the index belongs, if several are 
 outstanding.
 I was going to add support for HBASE-10070 to AP, and found that it might be 
 difficult to do cleanly.
 It would be nice to normalize AP usage patterns; in particular, separate the 
 global part (load tracking) from per-submit-call part.
 Per-submit part can more conveniently track stuff like initialActions, 
 mapping of indexes and retry information, that is currently passed around the 
 method calls.
 I am not sure yet, but maybe sending of the original index to server in 
 ClientProtos.MultiAction can also be avoided.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10320) Avoid ArrayList.iterator() in tight loops

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869768#comment-13869768
 ] 

Andrew Purtell commented on HBASE-10320:


Is this a 0.94 only change or applicable everywhere? (The latter, right?)

 Avoid ArrayList.iterator() in tight loops
 -

 Key: HBASE-10320
 URL: https://issues.apache.org/jira/browse/HBASE-10320
 Project: HBase
  Issue Type: Bug
  Components: Performance
Reporter: Lars Hofhansl
 Attachments: 10320-0.94-v2.txt, 10320-0.94.txt


 I noticed that in a profiler (sampler) run ScanQueryMatcher.setRow(...) 
 showed up at all.
 It turns out that the expensive part is iterating over the columns in 
 ExplicitColumnTracker.reset(). I did some microbenchmarks and found that
 {code}
 private ArrayList<X> l;
 ...
 for (int i = 0; i < l.size(); i++) {
   X x = l.get(i);
   ...
 }
 {code}
 is twice as fast as:
 {code}
 private ArrayList<X> l;
 ...
 for (X x : l) {
   ...
 }
 {code}
 The indexed version asymptotically approaches the iterator version, but even 
 at 1m entries it is still faster.
 In my tight loop scans this provides for a 5% performance improvement overall 
 when the ExplicitColumnTracker is used.
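 The effect is easy to reproduce with a standalone loop; a crude sketch (not 
 the benchmark actually used above; a harness like JMH would give more 
 reliable numbers):
 {code}
 import java.util.ArrayList;
 import java.util.List;

 // Crude microbenchmark sketch: indexed iteration vs. the enhanced for
 // loop (the latter allocates an Iterator per pass).
 public class IterationBench {
   public static void main(String[] args) {
     List<Integer> l = new ArrayList<Integer>();
     for (int i = 0; i < 100000; i++) l.add(i);

     long sum = 0;
     long t0 = System.nanoTime();
     for (int round = 0; round < 1000; round++) {
       for (int i = 0; i < l.size(); i++) sum += l.get(i);  // indexed
     }
     long indexedNs = System.nanoTime() - t0;

     t0 = System.nanoTime();
     for (int round = 0; round < 1000; round++) {
       for (int x : l) sum += x;                            // iterator
     }
     long iteratorNs = System.nanoTime() - t0;

     System.out.println("indexed=" + indexedNs + "ns iterator=" + iteratorNs
         + "ns (sum=" + sum + ")");
   }
 }
 {code}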



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869766#comment-13869766
 ] 

Andrew Purtell commented on HBASE-10326:


Then I will fix the tests now. HBASE-10331

 Super user should be able scan all the cells irrespective of the visibility 
 labels
 --

 Key: HBASE-10326
 URL: https://issues.apache.org/jira/browse/HBASE-10326
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
  Labels: security
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10326.patch, HBASE-10326_1.patch


 This issue is related to HBASE-10322.  In the case of the export tool, when 
 cells with visibility labels are exported using a super user, we should be 
 able to export the data.  But with the current implementation, the super user 
 would only be able to view cells whose visibility labels are associated with 
 the superuser.  The idea of HBASE-10322 is to strip out tags based on the 
 user, and if so this change is necessary for the export tool to work with 
 Visibility.  ACL already has a concept of global admins.  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10331) Insure security tests use SecureTestUtil methods for grants

2014-01-13 Thread Andrew Purtell (JIRA)
Andrew Purtell created HBASE-10331:
--

 Summary: Insure security tests use SecureTestUtil methods for 
grants
 Key: HBASE-10331
 URL: https://issues.apache.org/jira/browse/HBASE-10331
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay

2014-01-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869784#comment-13869784
 ] 

Sergey Shelukhin commented on HBASE-10227:
--


RB would be nice.
Storing mvcc in the store file always is an interesting option.
However, it becomes unnecessary for most KVs after some time under current 
HBase assumptions (that storefiles can be compared: all KVs in one SF are older 
than all KVs in the other per seqId/mvcc).
The only uses for mvcc in a KV at that point are exact-same-key resolution 
within the file, and scanners; but the latter need disappears after some time, 
see some later comments in HBASE-10244.

Some minor comments on the patch:
bq.  mvcc.reinitialize(maxMemstoreTS + 1); 
is now called twice in the same place.

With the removal of that usage, performCompaction no longer needs 
smallestReadPoint. The parameter might also be unnecessary in createWriterInTmp.

Ok, this is the major comment.
bq. if (versionOrLength == VERSION_3) {
Is it possible to add the MVCCs from the corresponding KVs to the protobuf 
part, rather than expand the WALEdit format?
I think the proper way is actually to make mvcc serialization a first-class 
part of KV; there's a JIRA for that, but that might be too much for this patch, 
as it would require a new HFile version.
For now we can at least avoid more hard-to-maintain-compat stuff down the line.
Already, it appears that an old reader will not read V_3 correctly.



 When a region is opened, its mvcc isn't correctly recovered when there are 
 split hlogs to replay
 

 Key: HBASE-10227
 URL: https://issues.apache.org/jira/browse/HBASE-10227
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10227-trunk_v0.patch


 When opening a region, all stores are examined to get the max MemstoreTS and 
 it's used as the initial mvcc for the region, and then split hlogs are 
 replayed. In fact the edits in split hlogs have kvs with greater mvcc than 
 all MemstoreTS in all store files, but replaying them doesn't increment the 
 mvcc accordingly at all. From an overall perspective this mvcc recovery is 
 'logically' incorrect/incomplete.
 The reason this currently doesn't cause a problem is that no active scanners 
 exist and no new scanners can be created before the region opening completes, 
 so the mvcc of all kvs in the hfiles resulting from hlog replay can be safely 
 set to zero. They are just treated as kvs put 'earlier' than the ones in 
 HFiles with mvcc greater than zero (say 'earlier' since they have mvcc less 
 than the ones with non-zero mvcc, but in fact they were put 'later'), and 
 without any incorrect impact just because during region opening there are no 
 active scanners existing / created.
 This bug is just in the 'logic' sense for the time being, but if later on we 
 need mvcc to survive the region's whole logical lifecycle (across 
 regionservers) and never be set to zero, this bug needs to be fixed first.
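 An illustrative sketch of the gap (method names paraphrased, not the actual 
 HRegion code):
 {code}
 // Today: the initial read point comes only from the store files.
 long maxMemstoreTS = 0;
 for (Store store : stores) {
   maxMemstoreTS = Math.max(maxMemstoreTS, store.getMaxMemstoreTS());
 }
 mvcc.initialize(maxMemstoreTS);

 // Replayed edits from split hlogs originally carried larger mvcc values,
 // but replay does not advance the mvcc, so recovery is logically incomplete.
 long maxReplayedSeqId = replayRecoveredEditsIfAny();

 // 'Logically' complete recovery would also advance the read point:
 mvcc.advanceTo(Math.max(maxMemstoreTS, maxReplayedSeqId));
 {code}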



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10324) refactor deferred-log-flush/Durability related interface/code/naming to align with changed semantic of the new write thread model

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869772#comment-13869772
 ] 

Andrew Purtell commented on HBASE-10324:


Let's commit this. It needs to go into 0.98 branch also because we incorporated 
FHH's log changes there too.

 refactor deferred-log-flush/Durability related interface/code/naming to align 
 with changed semantic of the new write thread model
 -

 Key: HBASE-10324
 URL: https://issues.apache.org/jira/browse/HBASE-10324
 Project: HBase
  Issue Type: Improvement
  Components: Client, regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10324-trunk_v0.patch, HBASE-10324-trunk_v1.patch, 
 HBASE-10324-trunk_v2.patch


 With the new write thread model introduced by 
 [HBASE-8755|https://issues.apache.org/jira/browse/HBASE-8755], some 
 deferred-log-flush/Durability API/code/names should be changed accordingly:
 1. there is no timer-triggered deferred-log-flush since flushing is always 
 done by async threads, so the configuration 
 'hbase.regionserver.optionallogflushinterval' is no longer needed
 2. the async writer-syncer-notifier threads will always be triggered 
 implicitly; the semantic is that it always holds that 
 'hbase.regionserver.optionallogflushinterval' > 0, so deferredLogSyncDisabled 
 in HRegion.java, which affects durability behavior, should always be false
 3. what HTableDescriptor.isDeferredLogFlush really means is that the write can 
 return without waiting for the sync to be done, so the interface name should 
 be changed to isAsyncLogFlush/setAsyncLogFlush to reflect its real meaning 
 (see the usage sketch below)
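 For illustration, client usage under the proposed rename would look like 
 this (sketch only; the setter names are the ones proposed in point 3):
 {code}
 HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("t1"));
 htd.setAsyncLogFlush(true);               // was: setDeferredLogFlush(true)
 boolean asyncWal = htd.isAsyncLogFlush(); // was: isDeferredLogFlush()
 {code}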



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10295) Refactor the replication implementation to eliminate permanent zk node

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869785#comment-13869785
 ] 

Andrew Purtell commented on HBASE-10295:


Nice idea, +1

 Refactor the replication  implementation to eliminate permanent zk node
 ---

 Key: HBASE-10295
 URL: https://issues.apache.org/jira/browse/HBASE-10295
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Feng Honghua
Assignee: Feng Honghua
 Fix For: 0.99.0


 Though this is a broader and bigger change, its original motivation derives 
 from [HBASE-8751|https://issues.apache.org/jira/browse/HBASE-8751]: the newly 
 introduced per-peer tableCFs attribute should be treated the same way as the 
 peer-state, which is a permanent sub-node under the peer node; but using 
 permanent zk nodes is deemed an incorrect practice. So let's refactor to 
 eliminate the permanent zk node. HBASE-8751 can then align its newly 
 introduced per-peer tableCFs attribute with this *correct* implementation 
 theme.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer

2014-01-13 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10329:
---

Affects Version/s: 0.98.0
Fix Version/s: 0.99.0
   0.98.0

+1, please commit to trunk and 0.98 branch.

 Fail the writes rather than proceeding silently to prevent data loss when 
 AsyncSyncer encounters null writer and its writes aren't synced by other 
 Asyncer
 --

 Key: HBASE-10329
 URL: https://issues.apache.org/jira/browse/HBASE-10329
 Project: HBase
  Issue Type: Bug
  Components: regionserver, wal
Affects Versions: 0.98.0
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10329-trunk_v0.patch


 Last month, after I introduced multiple AsyncSyncer threads to improve the 
 throughput for lower numbers of client write threads, [~stack] encountered an 
 NPE while testing, where a null writer occurs in AsyncSyncer when doing sync. 
 Since we had run the test many times in a cluster to verify the throughput 
 improvement and never encountered such an NPE, it really confused me. 
 ([~stack] fixed this by adding 'if (writer != null)' to protect the sync 
 operation.)
 These days I wondered from time to time why the writer can be null in 
 AsyncSyncer and whether it's safe to fix it by just adding a null check 
 before doing sync, as [~stack] did. After some digging, I found the case 
 where AsyncSyncer can encounter a null writer; it is as below:
 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with 
 writtenTxid==100
 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with 
 writtenTxid==200
 3. t3: rollWriter starts; it grabs the updateLock to prevent further client 
 writes from entering pendingWrites, and then waits for all items (<= 200) in 
 pendingWrites to append and finally sync to hdfs
 4. t4: AsyncSyncer 2 finishes, now syncedTillHere==200 (it also helps sync 
 <=100 as a whole)
 5. t5: rollWriter now can close the writer, set writer=null...
 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before 
 rollWriter sets writer to the newly rolled Writer
 We can see:
 1. the null writer is possible only when there are multiple AsyncSyncer 
 threads; that's why we never encountered it before introducing multiple 
 AsyncSyncer threads.
 2. since rollWriter can set writer=null only after all items of pendingWrites 
 sync to hdfs, AsyncWriter is in the critical path of this task, and there is 
 only one single AsyncWriter thread, AsyncWriter can't encounter a null 
 writer; that's why we never encounter a null writer in AsyncWriter though it 
 also uses the writer. This is the same reason why a null writer never occurs 
 when there is a single AsyncSyncer thread.
 And we should treat the two cases differently when writer == null in 
 AsyncSyncer:
 1. if txidToSync <= syncedTillHere, this means all writes this AsyncSyncer 
 cares about have already been synced by another AsyncSyncer; we can safely 
 skip the sync (as [~stack] does here);
 2. if txidToSync > syncedTillHere, we need to fail all the writes with 
 txid <= txidToSync to avoid data loss: the user gets a successful write 
 response but can't read the writes back afterwards; from the user's 
 perspective this is data loss (according to the above analysis such a case 
 should not occur, but we should still add such defensive treatment to prevent 
 data loss if it really occurs, e.g. through some bug introduced later)
 also fix the bug where isSyncing needs to be reset to false when writer.sync 
 encounters an IOException: AsyncSyncer swallows such an exception by failing 
 all writes with txid <= txidToSync, and this AsyncSyncer thread is then ready 
 to do later syncs, so its isSyncing needs to be reset to false in the 
 IOException handling block, otherwise it can't be selected by AsyncWriter to 
 do sync
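 A rough sketch of the defensive handling for the two cases above 
 (failWritesUpTo, syncedTillHere and isSyncing are paraphrased names, not 
 necessarily those in the attached patch):
 {code}
 // Sketch only, not HBASE-10329-trunk_v0.patch itself.
 if (writer == null) {
   if (txidToSync <= syncedTillHere) {
     // case 1: everything this syncer cares about was already synced by
     // another AsyncSyncer; skipping the sync is safe
   } else {
     // case 2: should not happen per the analysis above, but fail the
     // writes rather than acking data that was never synced
     failWritesUpTo(txidToSync, new IOException("WAL writer is null"));
   }
 } else {
   try {
     writer.sync();
   } catch (IOException e) {
     // fail all writes with txid <= txidToSync, and reset isSyncing so
     // this thread can be selected by AsyncWriter for later syncs
     failWritesUpTo(txidToSync, e);
     isSyncing = false;
   }
 }
 {code}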



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10324) refactor deferred-log-flush/Durability related interface/code/naming to align with changed semantic of the new write thread model

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869773#comment-13869773
 ] 

Andrew Purtell commented on HBASE-10324:


Nit: Remove the '@Deprecated' tags from the renamed methods. The deprecated 
methods have been effectively removed and replaced with a new API.

 refactor deferred-log-flush/Durability related interface/code/naming to align 
 with changed semantic of the new write thread model
 -

 Key: HBASE-10324
 URL: https://issues.apache.org/jira/browse/HBASE-10324
 Project: HBase
  Issue Type: Improvement
  Components: Client, regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10324-trunk_v0.patch, HBASE-10324-trunk_v1.patch, 
 HBASE-10324-trunk_v2.patch


 With the new write thread model introduced by 
 [HBASE-8755|https://issues.apache.org/jira/browse/HBASE-8755], some 
 deferred-log-flush/Durability API/code/names should be changed accordingly:
 1. there is no timer-triggered deferred-log-flush since flushing is always 
 done by async threads, so the configuration 
 'hbase.regionserver.optionallogflushinterval' is no longer needed
 2. the async writer-syncer-notifier threads will always be triggered 
 implicitly; the semantic is that it always holds that 
 'hbase.regionserver.optionallogflushinterval' > 0, so deferredLogSyncDisabled 
 in HRegion.java, which affects durability behavior, should always be false
 3. what HTableDescriptor.isDeferredLogFlush really means is that the write can 
 return without waiting for the sync to be done, so the interface name should 
 be changed to isAsyncLogFlush/setAsyncLogFlush to reflect its real meaning



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (HBASE-10324) refactor deferred-log-flush/Durability related interface/code/naming to align with changed semantic of the new write thread model

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869773#comment-13869773
 ] 

Andrew Purtell edited comment on HBASE-10324 at 1/13/14 6:10 PM:
-

Nit: Remove the '@Deprecated' tags from the renamed methods. The deprecated 
methods have been effectively removed and replaced with a new API.
Edit: Committer, please also ensure the deprecated methods appear as such in 
the 0.96 branch.


was (Author: apurtell):
Nit: Remove the '@Deprecated' tags from the renamed methods. The deprecated 
methods have been effectively removed and replaced with a new API.

 refactor deferred-log-flush/Durability related interface/code/naming to align 
 with changed semantic of the new write thread model
 -

 Key: HBASE-10324
 URL: https://issues.apache.org/jira/browse/HBASE-10324
 Project: HBase
  Issue Type: Improvement
  Components: Client, regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10324-trunk_v0.patch, HBASE-10324-trunk_v1.patch, 
 HBASE-10324-trunk_v2.patch


 With the new write thread model introduced by 
 [HBASE-8755|https://issues.apache.org/jira/browse/HBASE-8755], some 
 deferred-log-flush/Durability API/code/names should be changed accordingly:
 1. there is no timer-triggered deferred-log-flush since flushing is always 
 done by async threads, so the configuration 
 'hbase.regionserver.optionallogflushinterval' is no longer needed
 2. the async writer-syncer-notifier threads will always be triggered 
 implicitly; the semantic is that it always holds that 
 'hbase.regionserver.optionallogflushinterval' > 0, so deferredLogSyncDisabled 
 in HRegion.java, which affects durability behavior, should always be false
 3. what HTableDescriptor.isDeferredLogFlush really means is that the write can 
 return without waiting for the sync to be done, so the interface name should 
 be changed to isAsyncLogFlush/setAsyncLogFlush to reflect its real meaning



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-6581) Build with hadoop.profile=3.0

2014-01-13 Thread Eric Charles (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869788#comment-13869788
 ] 

Eric Charles commented on HBASE-6581:
-

And now I have looked at the first issue, related to 
TestReplicationHLogReaderManager. The test works but is just slow and is 
probably killed by the build system. Some methods take 4 times longer with the 
usage of the Method object. Strangely, TestHLog takes the same time - I will 
write a small blog post with more details.

Bottom line: I would like to propose committing the changes related to the 
pom.xml (they were not easy to set up and I would prefer not to lose that work: 
it basically introduces a hadoop3 module and nicely excludes hadoop-core 
1.1 and so on...). For the java class, we can think further about another 
solution.

If someone gives the green light for the poms, I will upload a new patch.

 Build with hadoop.profile=3.0
 -

 Key: HBASE-6581
 URL: https://issues.apache.org/jira/browse/HBASE-6581
 Project: HBase
  Issue Type: Bug
Reporter: Eric Charles
Assignee: Eric Charles
 Fix For: 0.98.1

 Attachments: HBASE-6581-1.patch, HBASE-6581-2.patch, 
 HBASE-6581-20130821.patch, HBASE-6581-3.patch, HBASE-6581-4.patch, 
 HBASE-6581-5.patch, HBASE-6581-6.patch, HBASE-6581-7.patch, HBASE-6581.diff, 
 HBASE-6581.diff


 Building trunk with hadoop.profile=3.0 gives exceptions (see [1]) due to a 
 change in the hadoop maven module naming (and also the usage of 3.0-SNAPSHOT 
 instead of 3.0.0-SNAPSHOT in hbase-common).
 I can provide a patch that would move most of the hadoop dependencies into 
 their respective profiles and define the correct hadoop deps in the 3.0 
 profile.
 Please tell me if it's ok to go this way.
 Thx, Eric
 [1]
 $ mvn clean install -Dhadoop.profile=3.0
 [INFO] Scanning for projects...
 [ERROR] The build could not read 3 projects - [Help 1]
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-server:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-server/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 655, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 659, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 663, column 21
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-common:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-common/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 170, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 174, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 178, column 21
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-it:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-it/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 220, column 18
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 224, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 228, column 21
 [ERROR] 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10332) Missing .regioninfo file during daughter open processing

2014-01-13 Thread Andrew Purtell (JIRA)
Andrew Purtell created HBASE-10332:
--

 Summary: Missing .regioninfo file during daughter open processing
 Key: HBASE-10332
 URL: https://issues.apache.org/jira/browse/HBASE-10332
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Purtell


Under cluster stress testing, there are a fair number of warnings like this:

{noformat}
2014-01-12 04:52:29,183 WARN  
[test-1,8120,1389467616661-daughterOpener=490a58c14b14a59e8d303d310684f0b0] 
regionserver.HRegionFileSystem: .regioninfo file not found for region: 
490a58c14b14a59e8d303d310684f0b0
{noformat}

This is from HRegionFileSystem#checkRegionInfoOnFilesystem, which catches a 
FileNotFoundException in this case and calls writeRegionInfoOnFilesystem to fix 
up the issue.

Is this a bug in splitting?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10332) Missing .regioninfo file during daughter open processing

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869811#comment-13869811
 ] 

Andrew Purtell commented on HBASE-10332:


Ping [~mbertozzi], author of the code in question.

 Missing .regioninfo file during daughter open processing
 

 Key: HBASE-10332
 URL: https://issues.apache.org/jira/browse/HBASE-10332
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Purtell

 Under cluster stress testing, there are a fair number of warnings like this:
 {noformat}
 2014-01-12 04:52:29,183 WARN  
 [test-1,8120,1389467616661-daughterOpener=490a58c14b14a59e8d303d310684f0b0] 
 regionserver.HRegionFileSystem: .regioninfo file not found for region: 
 490a58c14b14a59e8d303d310684f0b0
 {noformat}
 This is from HRegionFileSystem#checkRegionInfoOnFilesystem, which catches a 
 FileNotFoundException in this case and calls writeRegionInfoOnFilesystem to 
 fix up the issue.
 Is this a bug in splitting?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-8547) Fix java.lang.RuntimeException: Cached an already cached block

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869817#comment-13869817
 ] 

Andrew Purtell commented on HBASE-8547:
---

{noformat}
2014-01-11 22:22:56,895 WARN  [RpcServer.handler=1,port=8120] 
hfile.LruBlockCache: Cached an already cached block: 
a957fcb9a7474e9b9e4858e74d6a1eec.05bd6ea9e5d83483d35ee6658cc189a5_92811926 
cb:a957fcb9a7474e9b9e4858e74d6a1eec.05bd6ea9e5d83483d35ee6658cc189a5_92811926. 
This is harmless and can happen in rare cases (see HBASE-8547)
{noformat}

14 occurrences writing 1 billion keys with flushes every 30 seconds, indeed 
seems rare by observation. Just FYI.

 Fix java.lang.RuntimeException: Cached an already cached block
 --

 Key: HBASE-8547
 URL: https://issues.apache.org/jira/browse/HBASE-8547
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: hbase-8547_v1-0.94.patch, hbase-8547_v1-0.94.patch, 
 hbase-8547_v1.patch, hbase-8547_v2-0.94-reduced.patch, 
 hbase-8547_v2-addendum2+3-0.94.patch, hbase-8547_v2-addendum2.patch, 
 hbase-8547_v2-addendum2.patch, hbase-8547_v2-addendum3.patch, 
 hbase-8547_v2-trunk.patch


 In one test, one of the region servers received the following on 0.94. 
 Note HalfStoreFileReader in the stack trace. I think the root cause is that 
 after the region is split, the mid point can be in the middle of a block 
 (for store files that the mid point is not chosen from). Each half store file 
 tries to load the half block and put it in the block cache. Since IdLock is 
 instantiated per store file reader, the readers do not share the same IdLock 
 instance and thus do not lock against each other effectively. 
 {code}
 2013-05-12 01:30:37,733 ERROR 
 org.apache.hadoop.hbase.regionserver.HRegionServer:·
 java.lang.RuntimeException: Cached an already cached block
   at 
 org.apache.hadoop.hbase.io.hfile.LruBlockCache.cacheBlock(LruBlockCache.java:279)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:353)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:480)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
   at 
 org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:237)
   at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
   at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
   at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:351)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:354)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:277)
   at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:543)
   at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:411)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:143)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3829)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3896)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3778)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3770)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2643)
   at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:308)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
 {code}
 I can see two possible fixes: 
  # Allow this kind of rare case in LruBlockCache by not throwing an 
 exception. 
  # Move the lock instances to an upper layer (possibly in CacheConfig), and 
 let the half hfile readers share the same IdLock instance (see the sketch 
 below). 
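 A sketch of option 2 (illustrative only; getLockEntry/releaseLockEntry are 
 the existing IdLock methods, while the CacheConfig wiring is hypothetical):
 {code}
 // One IdLock per CacheConfig, shared by both half-file readers of a split
 // parent, so caching the same block serializes properly.
 IdLock offsetLock = cacheConf.getOffsetLock();  // hypothetical accessor
 IdLock.Entry lockEntry = offsetLock.getLockEntry(dataBlockOffset);
 try {
   // re-check the cache under the lock; read and cache only if still absent
 } finally {
   offsetLock.releaseLockEntry(lockEntry);
 }
 {code}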



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (HBASE-8547) Fix java.lang.RuntimeException: Cached an already cached block

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869817#comment-13869817
 ] 

Andrew Purtell edited comment on HBASE-8547 at 1/13/14 6:44 PM:


{noformat}
2014-01-11 22:22:56,895 WARN  [RpcServer.handler=1,port=8120] 
hfile.LruBlockCache: Cached an already cached block: 
a957fcb9a7474e9b9e4858e74d6a1eec.05bd6ea9e5d83483d35ee6658cc189a5_92811926 
cb:a957fcb9a7474e9b9e4858e74d6a1eec.05bd6ea9e5d83483d35ee6658cc189a5_92811926. 
This is harmless and can happen in rare cases (see HBASE-8547)
{noformat}

14 occurrences on one RS writing 1 billion keys with flushes every 30 seconds, 
indeed seems rare by observation. Just FYI.


was (Author: apurtell):
{noformat}
2014-01-11 22:22:56,895 WARN  [RpcServer.handler=1,port=8120] 
hfile.LruBlockCache: Cached an already cached block: 
a957fcb9a7474e9b9e4858e74d6a1eec.05bd6ea9e5d83483d35ee6658cc189a5_92811926 
cb:a957fcb9a7474e9b9e4858e74d6a1eec.05bd6ea9e5d83483d35ee6658cc189a5_92811926. 
This is harmless and can happen in rare cases (see HBASE-8547)
{noformat}

14 occurrences writing 1 billion keys with flushes every 30 seconds, indeed 
seems rare by observation. Just FYI.

 Fix java.lang.RuntimeException: Cached an already cached block
 --

 Key: HBASE-8547
 URL: https://issues.apache.org/jira/browse/HBASE-8547
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: hbase-8547_v1-0.94.patch, hbase-8547_v1-0.94.patch, 
 hbase-8547_v1.patch, hbase-8547_v2-0.94-reduced.patch, 
 hbase-8547_v2-addendum2+3-0.94.patch, hbase-8547_v2-addendum2.patch, 
 hbase-8547_v2-addendum2.patch, hbase-8547_v2-addendum3.patch, 
 hbase-8547_v2-trunk.patch


 In one test, one of the region servers received the following on 0.94. 
 Note HalfStoreFileReader in the stack trace. I think the root cause is that 
 after the region is split, the mid point can be in the middle of a block 
 (for store files that the mid point is not chosen from). Each half store file 
 tries to load the half block and put it in the block cache. Since IdLock is 
 instantiated per store file reader, the readers do not share the same IdLock 
 instance and thus do not lock against each other effectively. 
 {code}
 2013-05-12 01:30:37,733 ERROR 
 org.apache.hadoop.hbase.regionserver.HRegionServer:·
 java.lang.RuntimeException: Cached an already cached block
   at 
 org.apache.hadoop.hbase.io.hfile.LruBlockCache.cacheBlock(LruBlockCache.java:279)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:353)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:480)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
   at 
 org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:237)
   at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
   at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
   at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:351)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:354)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:277)
   at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:543)
   at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:411)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:143)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3829)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3896)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3778)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3770)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2643)
   at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:308)
   at 
 

[jira] [Updated] (HBASE-10324) refactor deferred-log-flush/Durability related interface/code/naming to align with changed semantic of the new write thread model

2014-01-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10324:
---

Fix Version/s: 0.99.0
   0.98.0

 refactor deferred-log-flush/Durability related interface/code/naming to align 
 with changed semantic of the new write thread model
 -

 Key: HBASE-10324
 URL: https://issues.apache.org/jira/browse/HBASE-10324
 Project: HBase
  Issue Type: Improvement
  Components: Client, regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10324-trunk_v0.patch, HBASE-10324-trunk_v1.patch, 
 HBASE-10324-trunk_v2.patch


 With the new write thread model introduced by 
 [HBASE-8755|https://issues.apache.org/jira/browse/HBASE-8755], some 
 deferred-log-flush/Durability API/code/names should be changed accordingly:
 1. there is no timer-triggered deferred-log-flush since flushing is always 
 done by async threads, so the configuration 
 'hbase.regionserver.optionallogflushinterval' is no longer needed
 2. the async writer-syncer-notifier threads will always be triggered 
 implicitly; the semantic is that it always holds that 
 'hbase.regionserver.optionallogflushinterval' > 0, so deferredLogSyncDisabled 
 in HRegion.java, which affects durability behavior, should always be false
 3. what HTableDescriptor.isDeferredLogFlush really means is that the write can 
 return without waiting for the sync to be done, so the interface name should 
 be changed to isAsyncLogFlush/setAsyncLogFlush to reflect its real meaning



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer

2014-01-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869833#comment-13869833
 ] 

Ted Yu commented on HBASE-10329:


With this fix, what should be done in the following catch block (line 1250)?
{code}
  } catch (Exception e) {
    LOG.error(UNEXPECTED, e);
{code}
I assume we won't hit the above anymore.

 Fail the writes rather than proceeding silently to prevent data loss when 
 AsyncSyncer encounters null writer and its writes aren't synced by other 
 Asyncer
 --

 Key: HBASE-10329
 URL: https://issues.apache.org/jira/browse/HBASE-10329
 Project: HBase
  Issue Type: Bug
  Components: regionserver, wal
Affects Versions: 0.98.0
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10329-trunk_v0.patch


 Last month, after I introduced multiple AsyncSyncer threads to improve the 
 throughput for lower numbers of client write threads, [~stack] encountered an 
 NPE while testing, where a null writer occurs in AsyncSyncer when doing sync. 
 Since we had run the test many times in a cluster to verify the throughput 
 improvement and never encountered such an NPE, it really confused me. 
 ([~stack] fixed this by adding 'if (writer != null)' to protect the sync 
 operation.)
 These days I wondered from time to time why the writer can be null in 
 AsyncSyncer and whether it's safe to fix it by just adding a null check 
 before doing sync, as [~stack] did. After some digging, I found the case 
 where AsyncSyncer can encounter a null writer; it is as below:
 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with 
 writtenTxid==100
 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with 
 writtenTxid==200
 3. t3: rollWriter starts; it grabs the updateLock to prevent further client 
 writes from entering pendingWrites, and then waits for all items (<= 200) in 
 pendingWrites to append and finally sync to hdfs
 4. t4: AsyncSyncer 2 finishes, now syncedTillHere==200 (it also helps sync 
 <=100 as a whole)
 5. t5: rollWriter now can close the writer, set writer=null...
 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null... before 
 rollWriter sets writer to the newly rolled Writer
 We can see:
 1. the null writer is possible only when there are multiple AsyncSyncer 
 threads; that's why we never encountered it before introducing multiple 
 AsyncSyncer threads.
 2. since rollWriter can set writer=null only after all items of pendingWrites 
 sync to hdfs, AsyncWriter is in the critical path of this task, and there is 
 only one single AsyncWriter thread, AsyncWriter can't encounter a null 
 writer; that's why we never encounter a null writer in AsyncWriter though it 
 also uses the writer. This is the same reason why a null writer never occurs 
 when there is a single AsyncSyncer thread.
 And we should treat the two cases differently when writer == null in 
 AsyncSyncer:
 1. if txidToSync <= syncedTillHere, this means all writes this AsyncSyncer 
 cares about have already been synced by another AsyncSyncer; we can safely 
 skip the sync (as [~stack] does here);
 2. if txidToSync > syncedTillHere, we need to fail all the writes with 
 txid <= txidToSync to avoid data loss: the user gets a successful write 
 response but can't read the writes back afterwards; from the user's 
 perspective this is data loss (according to the above analysis such a case 
 should not occur, but we should still add such defensive treatment to prevent 
 data loss if it really occurs, e.g. through some bug introduced later)
 also fix the bug where isSyncing needs to be reset to false when writer.sync 
 encounters an IOException: AsyncSyncer swallows such an exception by failing 
 all writes with txid <= txidToSync, and this AsyncSyncer thread is then ready 
 to do later syncs, so its isSyncing needs to be reset to false in the 
 IOException handling block, otherwise it can't be selected by AsyncWriter to 
 do sync



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer

2014-01-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869838#comment-13869838
 ] 

stack commented on HBASE-10329:
---

Thanks [~fenghh].  Looks great.  Thanks for persisting and fixing my hack.

bq. and stack fixed this by adding 'if (writer != null)' to protect the 
sync operation

The check for null writer is actually an old 'problem' done in a few places 
about the code IIRC, so kudos for digging in.

Over the w/e I was working on my HBASE-10156.  Long story short, I ran into the 
same issue.  I need to hold the writer thread while the log is rolled out from 
under it, only I can't hold the writer thread at any arbitrary point; I have to 
hold the writer when it attains the highest outstanding sync point.  Only then 
can I roll the log (patch coming soon).  Having this issue made me wonder how 
the current implementation does this dance.  This issue seems to indicate it 
didn't.

Good on you.

 Fail the writes rather than proceeding silently to prevent data loss when 
 AsyncSyncer encounters null writer and its writes aren't synced by other 
 Asyncer
 --

 Key: HBASE-10329
 URL: https://issues.apache.org/jira/browse/HBASE-10329
 Project: HBase
  Issue Type: Bug
  Components: regionserver, wal
Affects Versions: 0.98.0
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10329-trunk_v0.patch


 Last month, after I introduced multiple AsyncSyncer threads to improve the
 throughput for lower numbers of client write threads, [~stack] encountered
 an NPE while doing the test, where a null writer occurs in AsyncSyncer when
 doing sync. Since we have run the test many times in a cluster to verify the
 throughput improvement and never encountered such an NPE, it really confused
 me. ([~stack] fixed this by adding 'if (writer != null)' to protect the sync
 operation.)
 These days I wondered from time to time why the writer can be null in
 AsyncSyncer, and whether it's safe to fix it by just adding a null check
 before doing sync, as [~stack] did. After some digging, I found the case
 where AsyncSyncer can encounter a null writer; it is as below:
 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with
 writtenTxid==100
 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with
 writtenTxid==200
 3. t3: rollWriter starts; it grabs the updateLock to prevent further client
 writes from entering pendingWrites, and then waits for all items (<= 200) in
 pendingWrites to append and finally sync to hdfs
 4. t4: AsyncSyncer 2 finishes; now syncedTillHere==200 (it also helps sync
 <=100 as a whole)
 5. t5: rollWriter now can close the writer and set writer=null...
 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null...
 before rollWriter sets writer to the newly rolled Writer
 We can see:
 1. the null writer is possible only when there are multiple AsyncSyncer
 threads; that's why we never encountered it before introducing multiple
 AsyncSyncer threads.
 2. since rollWriter can set writer=null only after all items of
 pendingWrites sync to hdfs, and AsyncWriter is on the critical path of this
 task with only a single AsyncWriter thread, AsyncWriter can't encounter a
 null writer; that's why we never encounter a null writer in AsyncWriter
 though it also uses the writer. This is the same reason a null writer never
 occurs when there is a single AsyncSyncer thread.
 And we should treat the writer == null case in AsyncSyncer differently:
 1. if txidToSync <= syncedTillHere, all the writes this AsyncSyncer cares
 about have already been synced by another AsyncSyncer, and we can safely
 skip the sync (as [~stack] does here);
 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid
 <= txidToSync to avoid data loss: the user gets a successful write response
 but can't read the writes back afterwards; from the user's perspective this
 is data loss (according to the above analysis such a case should not occur,
 but we should still add this defensive treatment to prevent data loss if it
 ever does occur, e.g. due to some bug introduced later)
 Also fix the bug where isSyncing needs to be reset to false when writer.sync
 encounters an IOException: AsyncSyncer swallows such an exception by failing
 all writes with txid <= txidToSync, and this AsyncSyncer thread is then
 ready to do later syncs, so its isSyncing needs to be reset to false in the
 IOException handling block; otherwise it can't be selected by AsyncWriter to
 do sync.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10304) Running an hbase job jar: IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString

2014-01-13 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869835#comment-13869835
 ] 

Jimmy Xiang commented on HBASE-10304:
-

Makes sense. The bin/hbase script doesn't accept a 'jar' command. It may need 
some tweaking to work.

 Running an hbase job jar: IllegalAccessError: class 
 com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass 
 com.google.protobuf.LiteralByteString
 

 Key: HBASE-10304
 URL: https://issues.apache.org/jira/browse/HBASE-10304
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.98.0, 0.96.1.1
Reporter: stack
Priority: Blocker
 Fix For: 0.98.0

 Attachments: hbase-10304_not_tested.patch, jobjar.xml


 (Jimmy has been working on this one internally.  I'm just the messenger 
 raising this critical issue upstream).
 So, if you make a job jar and bundle up hbase inside it because you want to 
 access hbase from your mapreduce task, the deploy of the job jar to the 
 cluster fails with:
 {code}
 14/01/05 08:59:19 INFO Configuration.deprecation: 
 topology.node.switch.mapping.impl is deprecated. Instead, use 
 net.topology.node.switch.mapping.impl
 14/01/05 08:59:19 INFO Configuration.deprecation: io.bytes.per.checksum is 
 deprecated. Instead, use dfs.bytes-per-checksum
 Exception in thread "main" java.lang.IllegalAccessError: class 
 com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass 
 com.google.protobuf.LiteralByteString
   at java.lang.ClassLoader.defineClass1(Native Method)
   at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
   at 
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
   at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
   at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:818)
   at 
 org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertScanToString(TableMapReduceUtil.java:433)
   at 
 org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:186)
   at 
 org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:147)
   at 
 org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:270)
   at 
 org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:100)
   at 
 com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:124)
   at 
 com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.run(HBaseMapReduceIndexerTool.java:64)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at 
 com.ngdata.hbaseindexer.mr.HBaseMapReduceIndexerTool.main(HBaseMapReduceIndexerTool.java:51)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {code}
 So, ZCLBS is a hack.  This class is in the hbase-protocol module.  It is in 
 the com.google.protobuf package.  All is well and good usually.
 But when we make a job jar and bundle up hbase inside it, our 'trick' breaks. 
 RunJar makes a new classloader to run the job jar.  This URLClassLoader 
 'attaches' all the jars and classes that are in the job jar so they can be 
 found when it goes to do a lookup, only classloaders work by always 
 delegating to their parent first (unless you are a WAR file in a container, 
 where delegation is 'off' for the most part), and in this case the parent 
 classloader has access to a pb jar since pb is on the hadoop CLASSPATH.  So 
 the parent loads the pb classes.
 We then load ZCLBS, only this is done in the classloader made by RunJar; 
 ZCLBS thus has a different classloader from its superclass, and we get the 
 above IllegalAccessError.
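 To illustrate the delegation split, here is a hedged sketch; the jar path is 
 a hypothetical placeholder, and it assumes the layout described above (the 
 pb jar with LiteralByteString on the parent's classpath, ZCLBS only inside 
 the job jar). Under those assumptions the second loadClass is where the 
 IllegalAccessError surfaces, because the two classes end up in different 
 runtime packages.
 {code}
 import java.net.URL;
 import java.net.URLClassLoader;

 public class DelegationDemo {
   public static void main(String[] args) throws Exception {
     // The parent loader stands in for the hadoop classpath loader that
     // already sees the pb jar.
     ClassLoader parent = DelegationDemo.class.getClassLoader();
     URL jobJar = new URL("file:///tmp/job.jar"); // hypothetical path
     try (URLClassLoader jobLoader =
         new URLClassLoader(new URL[] { jobJar }, parent)) {
       // Parent-first delegation: the superclass resolves from the parent.
       Class<?> sup = jobLoader.loadClass("com.google.protobuf.LiteralByteString");
       System.out.println("superclass loaded by " + sup.getClassLoader());
       // The subclass exists only in the job jar, so jobLoader defines it
       // itself. Same package name + different defining loaders = different
       // runtime packages, so linking against the package-private superclass
       // is expected to throw IllegalAccessError here.
       jobLoader.loadClass("com.google.protobuf.ZeroCopyLiteralByteString");
     }
   }
 }
 {code}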
 Now (Jimmy's work comes in here), this can't be fixed by reflection -- you 
 can't setAccess on a 'Class' -- and though 

[jira] [Updated] (HBASE-10315) Canary shouldn't exit with 3 if there is no master running.

2014-01-13 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-10315:
--

Attachment: HBASE-10315-1.patch

Forgot to attach the newer patch.

 Canary shouldn't exit with 3 if there is no master running.
 ---

 Key: HBASE-10315
 URL: https://issues.apache.org/jira/browse/HBASE-10315
 Project: HBase
  Issue Type: Bug
  Components: util
Affects Versions: 0.98.0, 0.96.1.1
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-10315-0.patch, HBASE-10315-1.patch


 It's possible to time out (when the timeout is below the number of retries 
 to the master) before even initializing if there is no master up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10315) Canary shouldn't exit with 3 if there is no master running.

2014-01-13 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-10315:
--

Attachment: HBASE-10315-0.patch

Here's a patch that exits with init error code if the canary doesn't initialize 
before the timeout.
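
Roughly, the behavior the patch description implies could be sketched as 
below; the exit code and the 30-second bound are illustrative assumptions, 
not the Canary's actual constants.

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class InitTimeoutSketch {
  static final int INIT_ERROR_EXIT_CODE = 2; // hypothetical code

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    // Initialization (e.g. connecting to the master) runs under a bound.
    Future<Void> init = pool.submit(() -> { /* connect to master... */ return null; });
    try {
      init.get(30, TimeUnit.SECONDS);
    } catch (TimeoutException e) {
      // Initialization never completed: report an init failure rather than
      // a monitoring timeout, so operators can tell the two apart.
      System.exit(INIT_ERROR_EXIT_CODE);
    } finally {
      pool.shutdownNow();
    }
    // ...proceed with the actual canary checks after successful init.
  }
}
{code}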

 Canary shouldn't exit with 3 if there is no master running.
 ---

 Key: HBASE-10315
 URL: https://issues.apache.org/jira/browse/HBASE-10315
 Project: HBase
  Issue Type: Bug
  Components: util
Affects Versions: 0.98.0, 0.96.1.1
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-10315-0.patch


 It's possible to time out (when the timeout is below the number of retries 
 to the master) before even initializing if there is no master up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869854#comment-13869854
 ] 

Andrew Purtell commented on HBASE-10321:


Thanks for the fix Anoop!

 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch


 The write/read tags added in CellCodec has broken the 96 client to 98 server 
 compatibility (and 98 client to 96 server)
 When 96 client CellCodec writes cell, it won't write tags part at all. But 
 the server expects a tag part, at least a 0 tag length. This tag length read 
 will make a read of some bytes from next cell!
 I suggest we can remove the tag part from CellCodec. This codec is not used 
 by default and I don't think some one will change to CellCodec from the 
 default KVCodec now. ..
 This makes tags not supported via CellCodec..Tag support can be added to 
 CellCodec once we have Connection negotiation in place (?)
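 To make the framing problem concrete, here is a toy sketch (the field layout 
 is illustrative, not the real CellCodec format): the writer emits cells 
 without a tags length, the reader expects one, and the bytes it consumes are 
 actually the next cell's header, so every following cell decodes misaligned.
 {code}
 import java.io.ByteArrayInputStream;
 import java.io.ByteArrayOutputStream;
 import java.io.DataInputStream;
 import java.io.DataOutputStream;
 import java.io.IOException;

 public class FramingDemo {
   public static void main(String[] args) throws IOException {
     ByteArrayOutputStream buf = new ByteArrayOutputStream();
     DataOutputStream out = new DataOutputStream(buf);
     // "Old" writer: each cell is just [valueLen][value], no tags length.
     for (String v : new String[] { "cell-1", "cell-2" }) {
       byte[] b = v.getBytes("UTF-8");
       out.writeInt(b.length);
       out.write(b);
     }
     // "New" reader: expects [valueLen][value][tagsLen] per cell.
     DataInputStream in = new DataInputStream(
         new ByteArrayInputStream(buf.toByteArray()));
     byte[] v1 = new byte[in.readInt()];
     in.readFully(v1);
     int bogusTagsLen = in.readInt(); // eats the next cell's length header
     int nextCellLen = in.readInt();  // now reads raw value bytes as a length
     System.out.println("bogus tags length = " + bogusTagsLen
         + ", next cell 'length' = " + nextCellLen); // garbage: misaligned
   }
 }
 {code}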



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10331) Insure security tests use SecureTestUtil methods for grants

2014-01-13 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10331:
---

Status: Patch Available  (was: Open)

 Insure security tests use SecureTestUtil methods for grants
 ---

 Key: HBASE-10331
 URL: https://issues.apache.org/jira/browse/HBASE-10331
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10331.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10331) Insure security tests use SecureTestUtil methods for grants

2014-01-13 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10331:
---

Description: SecureTestUtil methods for grants and revokes wait for 
consistent AccessController state before proceeding, eliminating a source of 
race conditions in security unit tests.
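
The general shape of that wait-for-consistent-state pattern, sketched with 
illustrative names (this is not the SecureTestUtil API, just the 
poll-until-condition idea it is described as using): grant, then block until 
every observer reflects the new state before the test proceeds.

{code}
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Illustrative only: poll a condition until it holds or a deadline passes.
public final class WaitFor {
  public static void waitFor(BooleanSupplier condition, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("condition not met within " + timeoutMs + " ms");
      }
      TimeUnit.MILLISECONDS.sleep(100); // re-check until state is consistent
    }
  }
}
{code}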

 Insure security tests use SecureTestUtil methods for grants
 ---

 Key: HBASE-10331
 URL: https://issues.apache.org/jira/browse/HBASE-10331
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10331.patch


 SecureTestUtil methods for grants and revokes wait for consistent 
 AccessController state before proceeding, eliminating a source of race 
 conditions in security unit tests.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10331) Insure security tests use SecureTestUtil methods for grants

2014-01-13 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-10331:
---

Attachment: 10331.patch

Passes all o.a.h.h.security.*.* tests twice.

 Insure security tests use SecureTestUtil methods for grants
 ---

 Key: HBASE-10331
 URL: https://issues.apache.org/jira/browse/HBASE-10331
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10331.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-6581) Build with hadoop.profile=3.0

2014-01-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869875#comment-13869875
 ] 

Andrew Purtell commented on HBASE-6581:
---

Does this issue conflate Hadoop 3.0 issues with JDK 8 issues? 

bq. Some methods take 4 times more with the usage of the Method object. 
Strangely, TestHLog takes the same time - I will write a small blog post with 
more details.

Consider elaborating a bit here. 

 Build with hadoop.profile=3.0
 -

 Key: HBASE-6581
 URL: https://issues.apache.org/jira/browse/HBASE-6581
 Project: HBase
  Issue Type: Bug
Reporter: Eric Charles
Assignee: Eric Charles
 Fix For: 0.98.1

 Attachments: HBASE-6581-1.patch, HBASE-6581-2.patch, 
 HBASE-6581-20130821.patch, HBASE-6581-3.patch, HBASE-6581-4.patch, 
 HBASE-6581-5.patch, HBASE-6581-6.patch, HBASE-6581-7.patch, HBASE-6581.diff, 
 HBASE-6581.diff


 Building trunk with hadoop.profile=3.0 gives exceptions (see [1]) due to a 
 change in the hadoop maven module naming (and also the usage of 3.0-SNAPSHOT 
 instead of 3.0.0-SNAPSHOT in hbase-common).
 I can provide a patch that would move most of the hadoop dependencies into 
 their respective profiles and define the correct hadoop deps in the 3.0 
 profile.
 Please tell me if it's ok to go this way.
 Thx, Eric
 [1]
 $ mvn clean install -Dhadoop.profile=3.0
 [INFO] Scanning for projects...
 [ERROR] The build could not read 3 projects - [Help 1]
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-server:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-server/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 655, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 659, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 663, column 21
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-common:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-common/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 170, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 174, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 178, column 21
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-it:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-it/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 220, column 18
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 224, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 228, column 21
 [ERROR] 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10315) Canary shouldn't exit with 3 if there is no master running.

2014-01-13 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-10315:
--

Status: Patch Available  (was: Open)

 Canary shouldn't exit with 3 if there is no master running.
 ---

 Key: HBASE-10315
 URL: https://issues.apache.org/jira/browse/HBASE-10315
 Project: HBase
  Issue Type: Bug
  Components: util
Affects Versions: 0.96.1.1, 0.98.0
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-10315-0.patch, HBASE-10315-1.patch


 It's possible to time out (when the timeout is below the number of retries 
 to the master) before even initializing if there is no master up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10123) Change default ports; move them out of linux ephemeral port range

2014-01-13 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869893#comment-13869893
 ] 

Jonathan Hsieh commented on HBASE-10123:


tl;dr Based on stack's link I'm going to move the 60xxx ports to 16xxx ports.

Stack's link basically states the common ephemeral port ranges:
BSD - 1-1023 reserved. 1024-4999 ephemeral. Others feel 49152-65535 are 
ephemeral.
AIX - 32768-65535 ephemeral.
HPUX - 49152-65535 ephemeral.
Linux 2.2 - 1024-4999 ephemeral.
Linux 2.4 - 32768-61000 ephemeral.
openBSD - 32786-49151 or 49152-65535 ephemeral.
solaris - 32768-65535 ephemeral.
tru64 unix - 1024-4999 ephemeral.
windows 2k8 - 49152-65535 ephemeral.

That basically means we are safe anywhere between 5000 and 32768.

Looking at my /etc/services (ubuntu 10.04), big blocks that seem untouched 
include 12xxx, 14xxx, 16xxx, 18-19xxx, 21xxx, 23xxx, 26xxx, and 28xxx-32768.
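
A small sketch of how one might check that candidate ports avoid the local 
ephemeral range on Linux; the /proc path is Linux-specific, and the 16xxx 
ports below are hypothetical candidates from the block mentioned above.

{code}
import java.nio.file.Files;
import java.nio.file.Paths;

public class EphemeralPortCheck {
  public static void main(String[] args) throws Exception {
    // Linux publishes its ephemeral range here (assumption: Linux host).
    String[] range = new String(Files.readAllBytes(
        Paths.get("/proc/sys/net/ipv4/ip_local_port_range"))).trim().split("\\s+");
    int lo = Integer.parseInt(range[0]);
    int hi = Integer.parseInt(range[1]);
    for (int port : new int[] { 16000, 16010, 16020, 16030 }) { // hypothetical
      boolean clash = port >= lo && port <= hi;
      System.out.println(port + (clash ? " clashes with" : " is outside")
          + " ephemeral range " + lo + "-" + hi);
    }
  }
}
{code}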

 Change default ports; move them out of linux ephemeral port range
 -

 Key: HBASE-10123
 URL: https://issues.apache.org/jira/browse/HBASE-10123
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.1.1
Reporter: stack
Priority: Critical
 Fix For: 0.98.0


 Our defaults clash w/ the range linux assigns itself for creating come-and-go 
 ephemeral ports; likely in our history we've clashed w/ a random, short-lived 
 process.  While easy to change the defaults, we should just ship w/ defaults 
 that make sense.  We could hoist ourselves up into the 7 or 8k range.
 See http://www.ncftp.com/ncftpd/doc/misc/ephemeral_ports.html



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion

2014-01-13 Thread Samir Ahmic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869915#comment-13869915
 ] 

Samir Ahmic commented on HBASE-7386:


Thanks for the review, [~nkeywal].
I agree about 'PROCESS_STATE_UNKNOWN'. I checked the supervisor source code 
and it looks like that state is used when supervisor is unable to determine 
the state of a process. I will remove it from the event listener since it can 
cause issues.

I was planning to make the mail notification optional, or even to create a 
separate event listener that handles email notifications. '/bin/mail' is the 
simplest solution, and following that example folks could develop their own 
solutions. How do you think this should be handled?

bq. Do we have to use python?
According to the documentation: an event listener can be written in any 
language supported by the platform you’re using to run supervisor. There is 
special library support for Python in the form of the supervisor.childutils 
module, which makes creating event listeners in Python slightly easier than 
in other languages. Any suggestions on what we should use instead of Python? 
Java? (See the sketch below.)
When we complete this work it should probably be documented under '15. Apache 
HBase Operational Management'?
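
For what it's worth, a listener need not be Python: the supervisor event 
protocol is a plain line protocol on stdin/stdout (write READY, read a header 
line carrying a len: field, read that many bytes of payload, write a RESULT). 
A hedged Java sketch, with the actual znode-deletion step left as a stub:

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;

public class ZnodeCleanupListener {
  public static void main(String[] args) throws IOException {
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    PrintStream out = System.out;
    while (true) {
      out.print("READY\n");          // tell supervisord we can take an event
      out.flush();
      String header = in.readLine(); // e.g. "... eventname:PROCESS_STATE_EXITED len:84"
      if (header == null) break;
      int len = 0;
      for (String tok : header.split(" ")) {
        if (tok.startsWith("len:")) len = Integer.parseInt(tok.substring(4));
      }
      char[] payload = new char[len];
      int off = 0;
      while (off < len) {
        int n = in.read(payload, off, len - off);
        if (n < 0) return;
        off += n;
      }
      // payload names the exited process (processname:, from_state:, ...);
      // delete the corresponding znode here (stub).
      out.print("RESULT 2\nOK");     // acknowledge: 2 = length of "OK"
      out.flush();
    }
  }
}
{code}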






 Investigate providing some supervisor support for znode deletion
 

 Key: HBASE-7386
 URL: https://issues.apache.org/jira/browse/HBASE-7386
 Project: HBase
  Issue Type: Task
  Components: master, regionserver, scripts
Reporter: Gregory Chanan
Assignee: stack
Priority: Blocker
 Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, 
 HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, 
 HBASE-7386-v0.patch, supervisordconfigs-v0.patch


 There are a couple of JIRAs for deleting the znode on a process failure:
 HBASE-5844 (RS)
 HBASE-5926 (Master)
 which are pretty neat; on process failure, they delete the znode of the 
 underlying process so HBase can recover faster.
 These JIRAs were implemented via the startup scripts; i.e. the script hangs 
 around and waits for the process to exit, then deletes the znode.
 There are a few problems associated with this approach, as listed in the 
 below JIRAs:
 1) Hides startup output in script
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 2) two hbase processes listed per launched daemon
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 3) Not run by a real supervisor
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
 4) Weird output after kill -9 actual process in standalone mode
 https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
 5) Can kill existing RS if called again
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
 6) Hides stdout/stderr[6]
 https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
 I suspect running via something like supervisord can solve these issues 
 if we provide the right support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869917#comment-13869917
 ] 

Hudson commented on HBASE-10321:


SUCCESS: Integrated in HBase-0.98 #73 (See 
[https://builds.apache.org/job/HBase-0.98/73/])
HBASE-10321 CellCodec has broken the 96 client to 98 server compatibility 
(anoopsamjohn: rev 1557780)
* 
/hbase/branches/0.98/hbase-common/src/main/java/org/apache/hadoop/hbase/codec/CellCodec.java
* 
/hbase/branches/0.98/hbase-common/src/main/java/org/apache/hadoop/hbase/codec/CellCodecV2.java
* 
/hbase/branches/0.98/hbase-common/src/test/java/org/apache/hadoop/hbase/codec/TestCellCodec.java
* 
/hbase/branches/0.98/hbase-common/src/test/java/org/apache/hadoop/hbase/codec/TestCellCodecV2.java


 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch


 The write/read tags added in CellCodec has broken the 96 client to 98 server 
 compatibility (and 98 client to 96 server)
 When 96 client CellCodec writes cell, it won't write tags part at all. But 
 the server expects a tag part, at least a 0 tag length. This tag length read 
 will make a read of some bytes from next cell!
 I suggest we can remove the tag part from CellCodec. This codec is not used 
 by default and I don't think some one will change to CellCodec from the 
 default KVCodec now. ..
 This makes tags not supported via CellCodec..Tag support can be added to 
 CellCodec once we have Connection negotiation in place (?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869919#comment-13869919
 ] 

Hudson commented on HBASE-10321:


SUCCESS: Integrated in HBase-TRUNK #4810 (See 
[https://builds.apache.org/job/HBase-TRUNK/4810/])
HBASE-10321 CellCodec has broken the 96 client to 98 server compatibility 
(anoopsamjohn: rev 1557781)
* 
/hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/codec/CellCodec.java
* 
/hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/codec/CellCodecV2.java
* 
/hbase/trunk/hbase-common/src/test/java/org/apache/hadoop/hbase/codec/TestCellCodec.java
* 
/hbase/trunk/hbase-common/src/test/java/org/apache/hadoop/hbase/codec/TestCellCodecV2.java


 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch


 The write/read tags added in CellCodec has broken the 96 client to 98 server 
 compatibility (and 98 client to 96 server)
 When 96 client CellCodec writes cell, it won't write tags part at all. But 
 the server expects a tag part, at least a 0 tag length. This tag length read 
 will make a read of some bytes from next cell!
 I suggest we can remove the tag part from CellCodec. This codec is not used 
 by default and I don't think some one will change to CellCodec from the 
 default KVCodec now. ..
 This makes tags not supported via CellCodec..Tag support can be added to 
 CellCodec once we have Connection negotiation in place (?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels

2014-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869920#comment-13869920
 ] 

Hudson commented on HBASE-10326:


SUCCESS: Integrated in HBase-TRUNK #4810 (See 
[https://builds.apache.org/job/HBase-TRUNK/4810/])
HBASE-10326 Super user should be able scan all the cells irrespective of the 
visibility labels(Ram) (anoopsamjohn: rev 1557792)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityController.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabels.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithACL.java


 Super user should be able scan all the cells irrespective of the visibility 
 labels
 --

 Key: HBASE-10326
 URL: https://issues.apache.org/jira/browse/HBASE-10326
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
  Labels: security
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10326.patch, HBASE-10326_1.patch


 This issue is in line with HBASE-10322.  In the case of the export tool, 
 when cells with visibility labels are exported using a super user, we should 
 be able to export the data.  But with the current implementation, the super 
 user would be able to view only the cells that have visibility labels 
 associated with the superuser.  The idea of HBASE-10322 is to strip out tags 
 based on the user, and if so this change is necessary for the export tool to 
 work with Visibility.  ACL already has a concept of global admins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels

2014-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869918#comment-13869918
 ] 

Hudson commented on HBASE-10326:


SUCCESS: Integrated in HBase-0.98 #73 (See 
[https://builds.apache.org/job/HBase-0.98/73/])
HBASE-10326 Super user should be able scan all the cells irrespective of the 
visibility labels(Ram) (anoopsamjohn: rev 1557791)
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityController.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabels.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithACL.java


 Super user should be able scan all the cells irrespective of the visibility 
 labels
 --

 Key: HBASE-10326
 URL: https://issues.apache.org/jira/browse/HBASE-10326
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
  Labels: security
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10326.patch, HBASE-10326_1.patch


 This issue is in line with HBASE-10322.  In the case of the export tool, 
 when cells with visibility labels are exported using a super user, we should 
 be able to export the data.  But with the current implementation, the super 
 user would be able to view only the cells that have visibility labels 
 associated with the superuser.  The idea of HBASE-10322 is to strip out tags 
 based on the user, and if so this change is necessary for the export tool to 
 work with Visibility.  ACL already has a concept of global admins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HBASE-10332) Missing .regioninfo file during daughter open processing

2014-01-13 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi reassigned HBASE-10332:
---

Assignee: Matteo Bertozzi

 Missing .regioninfo file during daughter open processing
 

 Key: HBASE-10332
 URL: https://issues.apache.org/jira/browse/HBASE-10332
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Purtell
Assignee: Matteo Bertozzi

 Under cluster stress testing, there are a fair number of warnings like this:
 {noformat}
 2014-01-12 04:52:29,183 WARN  
 [test-1,8120,1389467616661-daughterOpener=490a58c14b14a59e8d303d310684f0b0] 
 regionserver.HRegionFileSystem: .regioninfo file not found for region: 
 490a58c14b14a59e8d303d310684f0b0
 {noformat}
 This is from HRegionFileSystem#checkRegionInfoOnFilesystem, which catches a 
 FileNotFoundException in this case and calls writeRegionInfoOnFilesystem to 
 fix up the issue.
 Is this a bug in splitting?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10332) Missing .regioninfo file during daughter open processing

2014-01-13 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869935#comment-13869935
 ] 

Matteo Bertozzi commented on HBASE-10332:
-

HRegion.createDaughterRegionFromSplits() uses HRegion.newHRegion() instead of 
createHRegion(), so the .regioninfo file is not created on daughter creation 
but on daughter open, by checkRegionInfoOnFilesystem().
This shouldn't be a problem, but let me see if I can change the code to use 
the create, or at least write the .regioninfo file on creation.

 Missing .regioninfo file during daughter open processing
 

 Key: HBASE-10332
 URL: https://issues.apache.org/jira/browse/HBASE-10332
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Purtell

 Under cluster stress testing, there are a fair number of warnings like this:
 {noformat}
 2014-01-12 04:52:29,183 WARN  
 [test-1,8120,1389467616661-daughterOpener=490a58c14b14a59e8d303d310684f0b0] 
 regionserver.HRegionFileSystem: .regioninfo file not found for region: 
 490a58c14b14a59e8d303d310684f0b0
 {noformat}
 This is from HRegionFileSystem#checkRegionInfoOnFilesystem, which catches a 
 FileNotFoundException in this case and calls writeRegionInfoOnFilesystem to 
 fix up the issue.
 Is this a bug in splitting?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer

2014-01-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869943#comment-13869943
 ] 

Ted Yu commented on HBASE-10329:


Integrated to trunk.

Patch for 0.98 coming.

 Fail the writes rather than proceeding silently to prevent data loss when 
 AsyncSyncer encounters null writer and its writes aren't synced by other 
 Asyncer
 --

 Key: HBASE-10329
 URL: https://issues.apache.org/jira/browse/HBASE-10329
 Project: HBase
  Issue Type: Bug
  Components: regionserver, wal
Affects Versions: 0.98.0
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10329-trunk_v0.patch


 Last month, after I introduced multiple AsyncSyncer threads to improve the
 throughput for lower numbers of client write threads, [~stack] encountered
 an NPE while doing the test, where a null writer occurs in AsyncSyncer when
 doing sync. Since we have run the test many times in a cluster to verify the
 throughput improvement and never encountered such an NPE, it really confused
 me. ([~stack] fixed this by adding 'if (writer != null)' to protect the sync
 operation.)
 These days I wondered from time to time why the writer can be null in
 AsyncSyncer, and whether it's safe to fix it by just adding a null check
 before doing sync, as [~stack] did. After some digging, I found the case
 where AsyncSyncer can encounter a null writer; it is as below:
 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with
 writtenTxid==100
 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with
 writtenTxid==200
 3. t3: rollWriter starts; it grabs the updateLock to prevent further client
 writes from entering pendingWrites, and then waits for all items (<= 200) in
 pendingWrites to append and finally sync to hdfs
 4. t4: AsyncSyncer 2 finishes; now syncedTillHere==200 (it also helps sync
 <=100 as a whole)
 5. t5: rollWriter now can close the writer and set writer=null...
 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null...
 before rollWriter sets writer to the newly rolled Writer
 We can see:
 1. the null writer is possible only when there are multiple AsyncSyncer
 threads; that's why we never encountered it before introducing multiple
 AsyncSyncer threads.
 2. since rollWriter can set writer=null only after all items of
 pendingWrites sync to hdfs, and AsyncWriter is on the critical path of this
 task with only a single AsyncWriter thread, AsyncWriter can't encounter a
 null writer; that's why we never encounter a null writer in AsyncWriter
 though it also uses the writer. This is the same reason a null writer never
 occurs when there is a single AsyncSyncer thread.
 And we should treat the writer == null case in AsyncSyncer differently:
 1. if txidToSync <= syncedTillHere, all the writes this AsyncSyncer cares
 about have already been synced by another AsyncSyncer, and we can safely
 skip the sync (as [~stack] does here);
 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid
 <= txidToSync to avoid data loss: the user gets a successful write response
 but can't read the writes back afterwards; from the user's perspective this
 is data loss (according to the above analysis such a case should not occur,
 but we should still add this defensive treatment to prevent data loss if it
 ever does occur, e.g. due to some bug introduced later)
 Also fix the bug where isSyncing needs to be reset to false when writer.sync
 encounters an IOException: AsyncSyncer swallows such an exception by failing
 all writes with txid <= txidToSync, and this AsyncSyncer thread is then
 ready to do later syncs, so its isSyncing needs to be reset to false in the
 IOException handling block; otherwise it can't be selected by AsyncWriter to
 do sync.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869944#comment-13869944
 ] 

Hudson commented on HBASE-10321:


SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #68 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/68/])
HBASE-10321 CellCodec has broken the 96 client to 98 server compatibility 
(anoopsamjohn: rev 1557780)
* 
/hbase/branches/0.98/hbase-common/src/main/java/org/apache/hadoop/hbase/codec/CellCodec.java
* 
/hbase/branches/0.98/hbase-common/src/main/java/org/apache/hadoop/hbase/codec/CellCodecV2.java
* 
/hbase/branches/0.98/hbase-common/src/test/java/org/apache/hadoop/hbase/codec/TestCellCodec.java
* 
/hbase/branches/0.98/hbase-common/src/test/java/org/apache/hadoop/hbase/codec/TestCellCodecV2.java


 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch, HBASE-10321_V2.patch


 The write/read tags added in CellCodec has broken the 96 client to 98 server 
 compatibility (and 98 client to 96 server)
 When 96 client CellCodec writes cell, it won't write tags part at all. But 
 the server expects a tag part, at least a 0 tag length. This tag length read 
 will make a read of some bytes from next cell!
 I suggest we can remove the tag part from CellCodec. This codec is not used 
 by default and I don't think some one will change to CellCodec from the 
 default KVCodec now. ..
 This makes tags not supported via CellCodec..Tag support can be added to 
 CellCodec once we have Connection negotiation in place (?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10326) Super user should be able scan all the cells irrespective of the visibility labels

2014-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869945#comment-13869945
 ] 

Hudson commented on HBASE-10326:


SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #68 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/68/])
HBASE-10326 Super user should be able scan all the cells irrespective of the 
visibility labels(Ram) (anoopsamjohn: rev 1557791)
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityController.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabels.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithACL.java


 Super user should be able scan all the cells irrespective of the visibility 
 labels
 --

 Key: HBASE-10326
 URL: https://issues.apache.org/jira/browse/HBASE-10326
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.98.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
  Labels: security
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10326.patch, HBASE-10326_1.patch


 This issue is in line with HBASE-10322.  In the case of the export tool, 
 when cells with visibility labels are exported using a super user, we should 
 be able to export the data.  But with the current implementation, the super 
 user would be able to view only the cells that have visibility labels 
 associated with the superuser.  The idea of HBASE-10322 is to strip out tags 
 based on the user, and if so this change is necessary for the export tool to 
 work with Visibility.  ACL already has a concept of global admins.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer

2014-01-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10329:
---

Status: Open  (was: Patch Available)

 Fail the writes rather than proceeding silently to prevent data loss when 
 AsyncSyncer encounters null writer and its writes aren't synced by other 
 Asyncer
 --

 Key: HBASE-10329
 URL: https://issues.apache.org/jira/browse/HBASE-10329
 Project: HBase
  Issue Type: Bug
  Components: regionserver, wal
Affects Versions: 0.98.0
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10329-trunk_v0.patch


 Last month, after I introduced multiple AsyncSyncer threads to improve the
 throughput for lower numbers of client write threads, [~stack] encountered
 an NPE while doing the test, where a null writer occurs in AsyncSyncer when
 doing sync. Since we have run the test many times in a cluster to verify the
 throughput improvement and never encountered such an NPE, it really confused
 me. ([~stack] fixed this by adding 'if (writer != null)' to protect the sync
 operation.)
 These days I wondered from time to time why the writer can be null in
 AsyncSyncer, and whether it's safe to fix it by just adding a null check
 before doing sync, as [~stack] did. After some digging, I found the case
 where AsyncSyncer can encounter a null writer; it is as below:
 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with
 writtenTxid==100
 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with
 writtenTxid==200
 3. t3: rollWriter starts; it grabs the updateLock to prevent further client
 writes from entering pendingWrites, and then waits for all items (<= 200) in
 pendingWrites to append and finally sync to hdfs
 4. t4: AsyncSyncer 2 finishes; now syncedTillHere==200 (it also helps sync
 <=100 as a whole)
 5. t5: rollWriter now can close the writer and set writer=null...
 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null...
 before rollWriter sets writer to the newly rolled Writer
 We can see:
 1. the null writer is possible only when there are multiple AsyncSyncer
 threads; that's why we never encountered it before introducing multiple
 AsyncSyncer threads.
 2. since rollWriter can set writer=null only after all items of
 pendingWrites sync to hdfs, and AsyncWriter is on the critical path of this
 task with only a single AsyncWriter thread, AsyncWriter can't encounter a
 null writer; that's why we never encounter a null writer in AsyncWriter
 though it also uses the writer. This is the same reason a null writer never
 occurs when there is a single AsyncSyncer thread.
 And we should treat the writer == null case in AsyncSyncer differently:
 1. if txidToSync <= syncedTillHere, all the writes this AsyncSyncer cares
 about have already been synced by another AsyncSyncer, and we can safely
 skip the sync (as [~stack] does here);
 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid
 <= txidToSync to avoid data loss: the user gets a successful write response
 but can't read the writes back afterwards; from the user's perspective this
 is data loss (according to the above analysis such a case should not occur,
 but we should still add this defensive treatment to prevent data loss if it
 ever does occur, e.g. due to some bug introduced later)
 Also fix the bug where isSyncing needs to be reset to false when writer.sync
 encounters an IOException: AsyncSyncer swallows such an exception by failing
 all writes with txid <= txidToSync, and this AsyncSyncer thread is then
 ready to do later syncs, so its isSyncing needs to be reset to false in the
 IOException handling block; otherwise it can't be selected by AsyncWriter to
 do sync.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10331) Insure security tests use SecureTestUtil methods for grants

2014-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869975#comment-13869975
 ] 

Hadoop QA commented on HBASE-10331:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12622681/10331.patch
  against trunk revision .
  ATTACHMENT ID: 12622681

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.util.TestHBaseFsck

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8405//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8405//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8405//console

This message is automatically generated.

 Insure security tests use SecureTestUtil methods for grants
 ---

 Key: HBASE-10331
 URL: https://issues.apache.org/jira/browse/HBASE-10331
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10331.patch


 SecureTestUtil methods for grants and revokes wait for consistent 
 AccessController state before proceeding, eliminating a source of race 
 conditions in security unit tests.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer

2014-01-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869971#comment-13869971
 ] 

Ted Yu commented on HBASE-10329:


Integrated to 0.98 as well.

Thanks for the patch, Honghua.

Will resolve this after seeing green builds.

 Fail the writes rather than proceeding silently to prevent data loss when 
 AsyncSyncer encounters null writer and its writes aren't synced by other 
 Asyncer
 --

 Key: HBASE-10329
 URL: https://issues.apache.org/jira/browse/HBASE-10329
 Project: HBase
  Issue Type: Bug
  Components: regionserver, wal
Affects Versions: 0.98.0
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10329-0.98.txt, HBASE-10329-trunk_v0.patch


 Last month, after I introduced multiple AsyncSyncer threads to improve the
 throughput for lower numbers of client write threads, [~stack] encountered
 an NPE while doing the test, where a null writer occurs in AsyncSyncer when
 doing sync. Since we have run the test many times in a cluster to verify the
 throughput improvement and never encountered such an NPE, it really confused
 me. ([~stack] fixed this by adding 'if (writer != null)' to protect the sync
 operation.)
 These days I wondered from time to time why the writer can be null in
 AsyncSyncer, and whether it's safe to fix it by just adding a null check
 before doing sync, as [~stack] did. After some digging, I found the case
 where AsyncSyncer can encounter a null writer; it is as below:
 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with
 writtenTxid==100
 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with
 writtenTxid==200
 3. t3: rollWriter starts; it grabs the updateLock to prevent further client
 writes from entering pendingWrites, and then waits for all items (<= 200) in
 pendingWrites to append and finally sync to hdfs
 4. t4: AsyncSyncer 2 finishes; now syncedTillHere==200 (it also helps sync
 <=100 as a whole)
 5. t5: rollWriter now can close the writer and set writer=null...
 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null...
 before rollWriter sets writer to the newly rolled Writer
 We can see:
 1. the null writer is possible only when there are multiple AsyncSyncer
 threads; that's why we never encountered it before introducing multiple
 AsyncSyncer threads.
 2. since rollWriter can set writer=null only after all items of
 pendingWrites sync to hdfs, and AsyncWriter is on the critical path of this
 task with only a single AsyncWriter thread, AsyncWriter can't encounter a
 null writer; that's why we never encounter a null writer in AsyncWriter
 though it also uses the writer. This is the same reason a null writer never
 occurs when there is a single AsyncSyncer thread.
 And we should treat the writer == null case in AsyncSyncer differently:
 1. if txidToSync <= syncedTillHere, all the writes this AsyncSyncer cares
 about have already been synced by another AsyncSyncer, and we can safely
 skip the sync (as [~stack] does here);
 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid
 <= txidToSync to avoid data loss: the user gets a successful write response
 but can't read the writes back afterwards; from the user's perspective this
 is data loss (according to the above analysis such a case should not occur,
 but we should still add this defensive treatment to prevent data loss if it
 ever does occur, e.g. due to some bug introduced later)
 Also fix the bug where isSyncing needs to be reset to false when writer.sync
 encounters an IOException: AsyncSyncer swallows such an exception by failing
 all writes with txid <= txidToSync, and this AsyncSyncer thread is then
 ready to do later syncs, so its isSyncing needs to be reset to false in the
 IOException handling block; otherwise it can't be selected by AsyncWriter to
 do sync.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10329) Fail the writes rather than proceeding silently to prevent data loss when AsyncSyncer encounters null writer and its writes aren't synced by other Asyncer

2014-01-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-10329:
---

Attachment: 10329-0.98.txt

 Fail the writes rather than proceeding silently to prevent data loss when 
 AsyncSyncer encounters null writer and its writes aren't synced by other 
 Asyncer
 --

 Key: HBASE-10329
 URL: https://issues.apache.org/jira/browse/HBASE-10329
 Project: HBase
  Issue Type: Bug
  Components: regionserver, wal
Affects Versions: 0.98.0
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10329-0.98.txt, HBASE-10329-trunk_v0.patch


 Last month, after I introduced multiple AsyncSyncer threads to improve the
 throughput for lower numbers of client write threads, [~stack] encountered
 an NPE while doing the test, where a null writer occurs in AsyncSyncer when
 doing sync. Since we have run the test many times in a cluster to verify the
 throughput improvement and never encountered such an NPE, it really confused
 me. ([~stack] fixed this by adding 'if (writer != null)' to protect the sync
 operation.)
 These days I wondered from time to time why the writer can be null in
 AsyncSyncer, and whether it's safe to fix it by just adding a null check
 before doing sync, as [~stack] did. After some digging, I found the case
 where AsyncSyncer can encounter a null writer; it is as below:
 1. t1: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 1 with
 writtenTxid==100
 2. t2: AsyncWriter appends writes to hdfs, triggers AsyncSyncer 2 with
 writtenTxid==200
 3. t3: rollWriter starts; it grabs the updateLock to prevent further client
 writes from entering pendingWrites, and then waits for all items (<= 200) in
 pendingWrites to append and finally sync to hdfs
 4. t4: AsyncSyncer 2 finishes; now syncedTillHere==200 (it also helps sync
 <=100 as a whole)
 5. t5: rollWriter now can close the writer and set writer=null...
 6. t6: AsyncSyncer 1 starts to do sync and finds the writer is null...
 before rollWriter sets writer to the newly rolled Writer
 We can see:
 1. the null writer is possible only when there are multiple AsyncSyncer
 threads; that's why we never encountered it before introducing multiple
 AsyncSyncer threads.
 2. since rollWriter can set writer=null only after all items of
 pendingWrites sync to hdfs, and AsyncWriter is on the critical path of this
 task with only a single AsyncWriter thread, AsyncWriter can't encounter a
 null writer; that's why we never encounter a null writer in AsyncWriter
 though it also uses the writer. This is the same reason a null writer never
 occurs when there is a single AsyncSyncer thread.
 And we should treat the writer == null case in AsyncSyncer differently:
 1. if txidToSync <= syncedTillHere, all the writes this AsyncSyncer cares
 about have already been synced by another AsyncSyncer, and we can safely
 skip the sync (as [~stack] does here);
 2. if txidToSync > syncedTillHere, we need to fail all the writes with txid
 <= txidToSync to avoid data loss: the user gets a successful write response
 but can't read the writes back afterwards; from the user's perspective this
 is data loss (according to the above analysis such a case should not occur,
 but we should still add this defensive treatment to prevent data loss if it
 ever does occur, e.g. due to some bug introduced later)
 Also fix the bug where isSyncing needs to be reset to false when writer.sync
 encounters an IOException: AsyncSyncer swallows such an exception by failing
 all writes with txid <= txidToSync, and this AsyncSyncer thread is then
 ready to do later syncs, so its isSyncing needs to be reset to false in the
 IOException handling block; otherwise it can't be selected by AsyncWriter to
 do sync.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

