[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets

2010-11-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934123#action_12934123
 ] 

Hudson commented on ZOOKEEPER-900:
--

Integrated in ZooKeeper-trunk #1010 (See 
[https://hudson.apache.org/hudson/job/ZooKeeper-trunk/1010/])


 FLE implementation should be improved to use non-blocking sockets
 -

 Key: ZOOKEEPER-900
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900
 Project: Zookeeper
  Issue Type: Bug
Reporter: Vishal K
Assignee: Vishal K
Priority: Critical
 Fix For: 3.4.0

 Attachments: ZOOKEEPER-900.patch, ZOOKEEPER-900.patch1, 
 ZOOKEEPER-900.patch2


 From earlier email exchanges:
 1. Blocking connects and accepts:
 a) The first problem is in manager.toSend(). This invokes connectOne(), which 
 does a blocking connect. While testing, I changed the code so that 
 connectOne() starts a new thread called AsyncConnct(). AsyncConnect.run() 
 does a socketChannel.connect(). After starting AsyncConnect, connectOne 
 starts a timer. connectOne continues with normal operations if the connection 
 is established before the timer expires, otherwise, when the timer expires it 
 interrupts AsyncConnect() thread and returns. In this way, I can have an 
 upper bound on the amount of time we need to wait for connect to succeed. Of 
 course, this was a quick fix for my testing. Ideally, we should use Selector 
 to do non-blocking connects/accepts. I am planning to do that later once we 
 at least have a quick fix for the problem and consensus from others for the 
 real fix (this problem is big blocker for us). Note that it is OK to do 
 blocking IO in SenderWorker and RecvWorker threads since they block IO to the 
 respective !
 peer.
 b) The blocking IO problem is not just restricted to connectOne(), but also 
 in receiveConnection(). The Listener thread calls receiveConnection() for 
 each incoming connection request. receiveConnection does blocking IO to get 
 peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the 
 peer that had sent the connection request. All of this is happening from the 
 Listener. In short, if a peer fails after initiating a connection, the 
 Listener thread won't be able to accept connections from other peers, because 
 it would be stuck in read() or connetOne(). Also the code has an inherent 
 cycle. initiateConnection() and receiveConnection() will have to be very 
 carefully synchronized otherwise, we could run into deadlocks. This code is 
 going to be difficult to maintain/modify.
 Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.

2010-11-20 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934202#action_12934202
 ] 

Flavio Junqueira commented on ZOOKEEPER-335:


Radu, It sounds like the problem you mention has been resolved in 
ZOOKEEPER-790. I'm not sure which version you're using, but perhaps you should 
consider moving to 3.3.2.

 zookeeper servers should commit the new leader txn to their logs.
 -

 Key: ZOOKEEPER-335
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.1.0
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 3.4.0

 Attachments: faultynode-vishal.txt, zk.log.gz, zklogs.tar.gz, 
 ZOOKEEPER-790.travis.log.bz2


 currently the zookeeper followers do not commit the new leader election. This 
 will cause problems in a failure scenarios with a follower acking to the same 
 leader txn id twice, which might be two different intermittent leaders and 
 allowing them to propose two different txn's of the same zxid.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.