[jira] Created: (ZOOKEEPER-930) Hedwig c++ client uses a non thread safe logging library
Hedwig c++ client uses a non thread safe logging library Key: ZOOKEEPER-930 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-930 Project: Zookeeper Issue Type: Bug Components: contrib-hedwig Affects Versions: 3.3.2 Reporter: Ivan Kelly Assignee: Ivan Kelly -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-930) Hedwig c++ client uses a non thread safe logging library
[ https://issues.apache.org/jira/browse/ZOOKEEPER-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932021#action_12932021 ] Ivan Kelly commented on ZOOKEEPER-930: -- Moved from log4cpp to log4cxx. Hedwig c++ client uses a non thread safe logging library Key: ZOOKEEPER-930 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-930 Project: Zookeeper Issue Type: Bug Components: contrib-hedwig Affects Versions: 3.3.2 Reporter: Ivan Kelly Assignee: Ivan Kelly Attachments: ZOOKEEPER-930.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-930) Hedwig c++ client uses a non thread safe logging library
[ https://issues.apache.org/jira/browse/ZOOKEEPER-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated ZOOKEEPER-930: - Attachment: ZOOKEEPER-930.patch Hedwig c++ client uses a non thread safe logging library Key: ZOOKEEPER-930 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-930 Project: Zookeeper Issue Type: Bug Components: contrib-hedwig Affects Versions: 3.3.2 Reporter: Ivan Kelly Assignee: Ivan Kelly Attachments: ZOOKEEPER-930.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-930) Hedwig c++ client uses a non thread safe logging library
[ https://issues.apache.org/jira/browse/ZOOKEEPER-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated ZOOKEEPER-930: - Status: Patch Available (was: Open) Hedwig c++ client uses a non thread safe logging library Key: ZOOKEEPER-930 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-930 Project: Zookeeper Issue Type: Bug Components: contrib-hedwig Affects Versions: 3.3.2 Reporter: Ivan Kelly Assignee: Ivan Kelly Attachments: ZOOKEEPER-930.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932118#action_12932118 ] Patrick Hunt commented on ZOOKEEPER-900: I'd appreciate if you could fix the findbugs, that would be great. See also ZOOKEEPER-902 -- as part of the fix set the findbugs acceptable back to 0. Thanks! FLE implementation should be improved to use non-blocking sockets - Key: ZOOKEEPER-900 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900 Project: Zookeeper Issue Type: Bug Reporter: Vishal K Assignee: Vishal K Priority: Critical Fix For: 3.4.0 Attachments: ZOOKEEPER-900.patch1, ZOOKEEPER-900.patch2 From earlier email exchanges: 1. Blocking connects and accepts: a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnct(). AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne starts a timer. connectOne continues with normal operations if the connection is established before the timer expires, otherwise, when the timer expires it interrupts AsyncConnect() thread and returns. In this way, I can have an upper bound on the amount of time we need to wait for connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use Selector to do non-blocking connects/accepts. I am planning to do that later once we at least have a quick fix for the problem and consensus from others for the real fix (this problem is big blocker for us). Note that it is OK to do blocking IO in SenderWorker and RecvWorker threads since they block IO to the respective ! peer. b) The blocking IO problem is not just restricted to connectOne(), but also in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection does blocking IO to get peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the peer that had sent the connection request. All of this is happening from the Listener. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connetOne(). Also the code has an inherent cycle. initiateConnection() and receiveConnection() will have to be very carefully synchronized otherwise, we could run into deadlocks. This code is going to be difficult to maintain/modify. Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-930) Hedwig c++ client uses a non thread safe logging library
[ https://issues.apache.org/jira/browse/ZOOKEEPER-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932163#action_12932163 ] Benjamin Reed commented on ZOOKEEPER-930: - looks good ivan. you should probably mention that you are moving to log4cxx for thread safety issues. the one minor thing: you messed up the indentation on a couple of lines. can you fix those? Hedwig c++ client uses a non thread safe logging library Key: ZOOKEEPER-930 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-930 Project: Zookeeper Issue Type: Bug Components: contrib-hedwig Affects Versions: 3.3.2 Reporter: Ivan Kelly Assignee: Ivan Kelly Attachments: ZOOKEEPER-930.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Watcher examples
It would be great to have more examples as part of the release artifact. Would you mind creating a JIRA/patch for this? http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute I'm thinking that we could have a src/contrib/examples or src/examples ... what do you guys think? (mahadev?) Patrick On Thu, Nov 11, 2010 at 12:46 PM, Robert Crocombe rcroc...@gmail.com wrote: On Tue, Nov 9, 2010 at 12:34 PM, Jeremy Hanna jeremy.hanna1...@gmail.comwrote: Anyone know of a good blog post or docs anywhere that gives a simple example of Watchers in action? I saw the one on: http://hadoop.apache.org/zookeeper/docs/current/javaExample.html#ch_Introduction but it seems kind of overly complicated for an intro to Watchers. I appreciate the example but wondered if there were other examples out there. Appended is a Java example of using a Watcher simply to wait for the client to actually be connected to a server. I used it when I was confirming to my satisfaction that there was a bug in the ZooKeeper recipe for WriteLock awhile ago. I think this use is slightly unusual in that it is more interested in KeeperState than the event type. A more conventional Watcher might be like the following sketch (uhm, this is Groovy), though really you'd have to look at both: @Override public void process(WatchedEvent event) { switch (event?.getType()) { case EventType.NodeDeleted: // TODO: what should we do if the node being watched is itself // deleted? LOG.error(The node being watched ' + event.getPath + ' has been deleted: that's not good) break case EventType.NodeChildrenChanged: childrenChanged(event) break default: LOG.debug(Ignoring event type ' + event?.getType() + ') break } } -- Robert Crocombe package derp; import java.io.IOException; import java.util.concurrent.atomic.AtomicBoolean; import java.util.concurrent.locks.Condition; import java.util.concurrent.locks.Lock; import java.util.concurrent.locks.ReentrantLock; import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; import org.apache.zookeeper.KeeperException; import org.apache.zookeeper.WatchedEvent; import org.apache.zookeeper.Watcher; import org.apache.zookeeper.ZooKeeper; import org.apache.zookeeper.Watcher.Event.KeeperState; import org.apache.zookeeper.recipes.lock.WriteLock; public class Test { private static final Log LOG = LogFactory.getLog(Test.class); private static final String ZOO_CONFIG = 10.2.1.54:2181/test; private static final String LOCK_DIR = /locking-test; private static final int TIMEOUT_MILLIS = 1; private static class ConnectWatcher implements Watcher { private final Lock connectedLock = new ReentrantLock(); private final Condition connectedCondition = connectedLock.newCondition(); private final AtomicBoolean connected = new AtomicBoolean(false); @Override public void process(WatchedEvent event) { LOG.debug(Event: + event); KeeperState keeperState = event.getState(); switch (keeperState) { case SyncConnected: if (!connected.get()) { connected.set(true); signal(); } break; case Expired: case Disconnected: if (connected.get()) { connected.set(false); signal(); } } } public void waitForConnection() throws InterruptedException { connectedLock.lock(); try { while (!connected.get()) { LOG.debug(Waiting for condition to be signalled); connectedCondition.await(); LOG.debug(Woken up on condition signalled); } } finally { connectedLock.unlock(); } LOG.debug(After signalling, we are connected); } @Override public String toString() { StringBuilder b = new StringBuilder([); b.append(connectedLock:).append(connectedLock); b.append(,connectedCondition:).append(connectedCondition); b.append(,connected:).append(connected); b.append(]); return b.toString(); } private void signal() { LOG.debug(Signaling after event); connectedLock.lock(); try { connectedCondition.signal(); } finally { connectedLock.unlock(); } } } private static final void fine(ZooKeeper lowerId, ZooKeeper higherId) throws KeeperException, InterruptedException { WriteLock lower = new WriteLock(lowerId, LOCK_DIR, null); WriteLock higher = new WriteLock(higherId, LOCK_DIR, null); boolean lowerAcquired = lower.lock(); assert lowerAcquired; LOG.debug(Lower acquired lock successfully, so higher should fail); boolean higherAcquired = higher.lock(); assert !higherAcquired; LOG.debug(Correct: higher session fails to acquire lock); lower.unlock(); // Now that lower has unlocked, higher will acquire. Really should use // the version of WriteLock with the LockListener, but a short sleep // should do. Thread.sleep(2000); higher.unlock(); // make sure we let go. assert !higher.isOwner(); } /* * Using recipes from ZooKeeper 3.2.1. * * This bug occurs because the sort in ZooKeeperLockOperation.execute * (beginning @ line 221) orders the paths, but the paths contain the * session ID (lines 206-207), so that sorting
[jira] Commented: (ZOOKEEPER-930) Hedwig c++ client uses a non thread safe logging library
[ https://issues.apache.org/jira/browse/ZOOKEEPER-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932196#action_12932196 ] Hadoop QA commented on ZOOKEEPER-930: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12459592/ZOOKEEPER-930.patch against trunk revision 1034003. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 24 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/29//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/29//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/29//console This message is automatically generated. Hedwig c++ client uses a non thread safe logging library Key: ZOOKEEPER-930 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-930 Project: Zookeeper Issue Type: Bug Components: contrib-hedwig Affects Versions: 3.3.2 Reporter: Ivan Kelly Assignee: Ivan Kelly Attachments: ZOOKEEPER-930.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-930) Hedwig c++ client uses a non thread safe logging library
[ https://issues.apache.org/jira/browse/ZOOKEEPER-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932205#action_12932205 ] Michi Mutsuzaki commented on ZOOKEEPER-930: --- +1. Thank you Ivan for the quick patch! --Michi Hedwig c++ client uses a non thread safe logging library Key: ZOOKEEPER-930 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-930 Project: Zookeeper Issue Type: Bug Components: contrib-hedwig Affects Versions: 3.3.2 Reporter: Ivan Kelly Assignee: Ivan Kelly Attachments: ZOOKEEPER-930.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932220#action_12932220 ] Vishal K commented on ZOOKEEPER-900: Hi Flavio, Pat, I will submit a new patch with suggested changes. For 902 do I just add a file ./java/test/bin/test-patch.properties with OK_FINDBUGS_WARNINGS=0 in it? FLE implementation should be improved to use non-blocking sockets - Key: ZOOKEEPER-900 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900 Project: Zookeeper Issue Type: Bug Reporter: Vishal K Assignee: Vishal K Priority: Critical Fix For: 3.4.0 Attachments: ZOOKEEPER-900.patch1, ZOOKEEPER-900.patch2 From earlier email exchanges: 1. Blocking connects and accepts: a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnct(). AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne starts a timer. connectOne continues with normal operations if the connection is established before the timer expires, otherwise, when the timer expires it interrupts AsyncConnect() thread and returns. In this way, I can have an upper bound on the amount of time we need to wait for connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use Selector to do non-blocking connects/accepts. I am planning to do that later once we at least have a quick fix for the problem and consensus from others for the real fix (this problem is big blocker for us). Note that it is OK to do blocking IO in SenderWorker and RecvWorker threads since they block IO to the respective ! peer. b) The blocking IO problem is not just restricted to connectOne(), but also in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection does blocking IO to get peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the peer that had sent the connection request. All of this is happening from the Listener. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connetOne(). Also the code has an inherent cycle. initiateConnection() and receiveConnection() will have to be very carefully synchronized otherwise, we could run into deadlocks. This code is going to be difficult to maintain/modify. Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932228#action_12932228 ] Flavio Junqueira commented on ZOOKEEPER-900: If we fix the findbugs issue here, then we should just close ZOOKEEPER-902 stating that it was resolved in ZOOKEEPER-900. FLE implementation should be improved to use non-blocking sockets - Key: ZOOKEEPER-900 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900 Project: Zookeeper Issue Type: Bug Reporter: Vishal K Assignee: Vishal K Priority: Critical Fix For: 3.4.0 Attachments: ZOOKEEPER-900.patch1, ZOOKEEPER-900.patch2 From earlier email exchanges: 1. Blocking connects and accepts: a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnct(). AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne starts a timer. connectOne continues with normal operations if the connection is established before the timer expires, otherwise, when the timer expires it interrupts AsyncConnect() thread and returns. In this way, I can have an upper bound on the amount of time we need to wait for connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use Selector to do non-blocking connects/accepts. I am planning to do that later once we at least have a quick fix for the problem and consensus from others for the real fix (this problem is big blocker for us). Note that it is OK to do blocking IO in SenderWorker and RecvWorker threads since they block IO to the respective ! peer. b) The blocking IO problem is not just restricted to connectOne(), but also in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection does blocking IO to get peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the peer that had sent the connection request. All of this is happening from the Listener. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connetOne(). Also the code has an inherent cycle. initiateConnection() and receiveConnection() will have to be very carefully synchronized otherwise, we could run into deadlocks. This code is going to be difficult to maintain/modify. Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Watcher examples
Yup. This would be really nice to have. More examples and documentaiton for them, would be really helpful for our users. Thanks mahadev On 11/15/10 12:54 PM, Patrick Hunt ph...@apache.org wrote: It would be great to have more examples as part of the release artifact. Would you mind creating a JIRA/patch for this? http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute I'm thinking that we could have a src/contrib/examples or src/examples ... what do you guys think? (mahadev?) Patrick On Thu, Nov 11, 2010 at 12:46 PM, Robert Crocombe rcroc...@gmail.com wrote: On Tue, Nov 9, 2010 at 12:34 PM, Jeremy Hanna jeremy.hanna1...@gmail.comwrote: Anyone know of a good blog post or docs anywhere that gives a simple example of Watchers in action? I saw the one on: http://hadoop.apache.org/zookeeper/docs/current/javaExample.html#ch_Introduc tion but it seems kind of overly complicated for an intro to Watchers. I appreciate the example but wondered if there were other examples out there. Appended is a Java example of using a Watcher simply to wait for the client to actually be connected to a server. I used it when I was confirming to my satisfaction that there was a bug in the ZooKeeper recipe for WriteLock awhile ago. I think this use is slightly unusual in that it is more interested in KeeperState than the event type. A more conventional Watcher might be like the following sketch (uhm, this is Groovy), though really you'd have to look at both: @Override public void process(WatchedEvent event) { switch (event?.getType()) { case EventType.NodeDeleted: // TODO: what should we do if the node being watched is itself // deleted? LOG.error(The node being watched ' + event.getPath + ' has been deleted: that's not good) break case EventType.NodeChildrenChanged: childrenChanged(event) break default: LOG.debug(Ignoring event type ' + event?.getType() + ') break } } -- Robert Crocombe package derp; import java.io.IOException; import java.util.concurrent.atomic.AtomicBoolean; import java.util.concurrent.locks.Condition; import java.util.concurrent.locks.Lock; import java.util.concurrent.locks.ReentrantLock; import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; import org.apache.zookeeper.KeeperException; import org.apache.zookeeper.WatchedEvent; import org.apache.zookeeper.Watcher; import org.apache.zookeeper.ZooKeeper; import org.apache.zookeeper.Watcher.Event.KeeperState; import org.apache.zookeeper.recipes.lock.WriteLock; public class Test { private static final Log LOG = LogFactory.getLog(Test.class); private static final String ZOO_CONFIG = 10.2.1.54:2181/test; private static final String LOCK_DIR = /locking-test; private static final int TIMEOUT_MILLIS = 1; private static class ConnectWatcher implements Watcher { private final Lock connectedLock = new ReentrantLock(); private final Condition connectedCondition = connectedLock.newCondition(); private final AtomicBoolean connected = new AtomicBoolean(false); @Override public void process(WatchedEvent event) { LOG.debug(Event: + event); KeeperState keeperState = event.getState(); switch (keeperState) { case SyncConnected: if (!connected.get()) { connected.set(true); signal(); } break; case Expired: case Disconnected: if (connected.get()) { connected.set(false); signal(); } } } public void waitForConnection() throws InterruptedException { connectedLock.lock(); try { while (!connected.get()) { LOG.debug(Waiting for condition to be signalled); connectedCondition.await(); LOG.debug(Woken up on condition signalled); } } finally { connectedLock.unlock(); } LOG.debug(After signalling, we are connected); } @Override public String toString() { StringBuilder b = new StringBuilder([); b.append(connectedLock:).append(connectedLock); b.append(,connectedCondition:).append(connectedCondition); b.append(,connected:).append(connected); b.append(]); return b.toString(); } private void signal() { LOG.debug(Signaling after event); connectedLock.lock(); try { connectedCondition.signal(); } finally { connectedLock.unlock(); } } } private static final void fine(ZooKeeper lowerId, ZooKeeper higherId) throws KeeperException, InterruptedException { WriteLock lower = new WriteLock(lowerId, LOCK_DIR, null); WriteLock higher = new WriteLock(higherId, LOCK_DIR, null); boolean lowerAcquired = lower.lock(); assert lowerAcquired; LOG.debug(Lower acquired lock successfully, so higher should fail); boolean higherAcquired = higher.lock(); assert !higherAcquired; LOG.debug(Correct: higher session fails to acquire lock); lower.unlock(); // Now that lower has unlocked, higher will acquire. Really should use // the version of WriteLock with the LockListener, but a short sleep // should do. Thread.sleep(2000); higher.unlock(); // make sure we let go. assert !higher.isOwner(); } /* * Using recipes from