I got an out-of-office auto-reply from Chris, so I went ahead and filed the jira on his behalf.
https://issues.apache.org/jira/browse/ZOOKEEPER-2242

--Chris Nauroth

On 8/6/15, 10:10 AM, "Chris Nauroth" <[email protected]> wrote:

>Hi Chris,
>
>I'm glad to hear this worked out!
>
>Regarding the remaining bug related to OSGi, would you please file an
>Apache jira to track that? Even better, if you're available to code a
>patch for it, the community would appreciate the contribution. It sounds
>like you're a heavy user of OSGi, so you likely have a good test
>environment for validating a patch.
>
>The OSGi manifest headers are coded in the build.xml. You can find them
>by searching for "Import-Package".
>
>Thanks again!
>
>--Chris Nauroth
>
>On 7/31/15, 5:18 PM, "Chris Barlock" <[email protected]> wrote:
>
>>I found the problem -- there were several missing package imports in the
>>MANIFEST.MF that I created for the OSGi bundle I made that wrapped all
>>the Kafka/ZooKeeper JAR files. It took enabling the log4j logging for
>>
>>log4j.logger.org.apache.zookeeper=DEBUG
>>
>>in order to see the ClassNotFoundExceptions that are typical with this
>>problem. It was odd to me because my application's logging has always
>>uncovered these problems, but...
>>
>>However, I did uncover a ZooKeeper bug. After fixing this, I tried
>>"cleaning up" my OSGi bundle by pulling out all the JARs that are already
>>OSGi bundles on their own. zookeeper-3.4.6 was one of these. However,
>>it is missing a package import for
>>
>>org.ietf.jgss
>>
>>which is part of the Java SDK. I added this to my ZK JAR and it resolved
>>the problem.
>>
>>Chris
>>
>>From: Chris Barlock/Raleigh/IBM@IBMUS
>>To: [email protected]
>>Date: 07/27/2015 05:09 PM
>>Subject: Re: ZooKeeper Class Will Not Connect
>>
>>OK, I'm convinced that ZK is not broken. I stripped my code down to a
>>simple stand-alone test case and it works just fine, even with the sleep
>>loop. But, when I run it normally with ZK 3.4.6, I don't see the ZK
>>server logging anything at all.
>>With 3.4.2, it is fine, as previously noted.
>>Full disclosure: we are running the client code in the WebSphere Liberty
>>Profile app server and I have packaged the Kafka and ZK code into an OSGi
>>bundle.
>>
>>I'd like to see what the client is doing, so I took the default
>>log4j.properties file that ships with ZK 3.4.6 and added this to the end:
>>
>>log4j.logger.org.apache.zookeeper=DEBUG, CONSOLE
>>
>>I see this in the console log that Liberty manages:
>>
>>[err] log4j:WARN No appenders could be found for logger
>>(org.apache.zookeeper.ZooKeeper).
>>
>>I'm not that familiar with log4j configuration. Where have I gone wrong
>>here?
>>
>>Thanks!
>>
>>Chris
>>
>>From: Chris Barlock/Raleigh/IBM@IBMUS
>>To: [email protected]
>>Date: 07/27/2015 03:13 PM
>>Subject: Re: ZooKeeper Class Will Not Connect
>>
>>OK, ignore this...embarrassing! I only got State once and tested it over
>>and over again...
>>
>>Chris
>>
>>From: Chris Barlock/Raleigh/IBM@IBMUS
>>To: [email protected]
>>Date: 07/27/2015 02:25 PM
>>Subject: Re: ZooKeeper Class Will Not Connect
>>
>>Chris:
>>
>>Your sample code works for me and I should be able to adapt it, but I
>>still think there is a bug. I made the following change to your sample
>>to test my method:
>>
>>    private Main(String hostPort) throws Exception {
>>        this.connectLatch = new CountDownLatch(1);
>>        waitTime = System.currentTimeMillis();
>>        this.zk = new ZooKeeper(hostPort, 3000, this);
>>
>>        States state = zk.getState();
>>        while (state != States.CONNECTED) {
>>            System.out.println("State " + state);
>>            try {
>>                Thread.sleep(100);
>>            } catch (InterruptedException e) {
>>                System.out.println("Interrupted!");
>>            }
>>        }
>>    }
>>
>>This outputs:
>>
>>State CONNECTING
>>Received event: WatchedEvent state:SyncConnected type:None path:null
>>State CONNECTING
>>State CONNECTING
>>...
>>
>>and zk.getState never returns anything but CONNECTING.
>>It seems that this started with ZK 3.4.3 as 3.4.2 works for me, but
>>3.4.3, 4, 5 and 6 all have this behavior in which the state is always
>>CONNECTING.
>>
>>Chris
>>
>>From: Chris Nauroth <[email protected]>
>>To: "[email protected]" <[email protected]>
>>Date: 07/24/2015 07:28 PM
>>Subject: Re: ZooKeeper Class Will Not Connect
>>
>>ZooKeeper writes INFO-level logging about connection and session
>>establishment. On the client side, these messages would come from the
>>ZooKeeper and ClientCnxn classes. On the server side, these messages
>>would come from the ZooKeeperServer, NIOServerCnxn and NettyServerCnxn
>>classes.
>>
>>It's possible that you could get more detail by turning up to DEBUG level
>>logging by adding these lines to log4j.properties for the client and
>>server respectively:
>>
>>log4j.logger.org.apache.zookeeper=DEBUG
>>log4j.logger.org.apache.zookeeper.server=DEBUG
>>
>>--Chris Nauroth
>>
>>On 7/24/15, 3:42 PM, "Chris Barlock" <[email protected]> wrote:
>>
>>>Chris:
>>>
>>>I have defined:
>>>
>>>    private static final int MAX_ZK_CONNECT_ATTEMPTS = 400;
>>>    private static final long ZK_CONNECT_WAIT = 5; // Milliseconds
>>>
>>>so, two seconds, but I have also tried with ZK_CONNECT_WAIT = 500 (two
>>>hundred seconds). getState always returned CONNECTING. I can play with
>>>the async notification, but it really doesn't fit my application very
>>>well. Is there any additional server or client tracing that can be
>>>enabled to get a better sense of what is going on?
>>>
>>>Chris
>>>
>>>IBM Tivoli Systems
>>>Research Triangle Park, NC
>>>(919) 224-2240
>>>Internet: [email protected]
>>>
>>>From: Chris Nauroth <[email protected]>
>>>To: "[email protected]" <[email protected]>
>>>Date: 07/24/2015 06:19 PM
>>>Subject: Re: ZooKeeper Class Will Not Connect
>>>
>>>Hello Chris,
>>>
>>>Thank you for the detailed repro steps describing which versions work
>>>and which versions don't work.
>>>I tested your code sample against a 3.4.6 build, and it worked
>>>consistently for me. My only thought is that perhaps the
>>>MAX_ZK_CONNECT_ATTEMPTS and ZK_CONNECT_WAIT constants are set such
>>>that the polling loop exits before the connection completes. Perhaps a
>>>subtle timing difference in the newer versions is just now exposing
>>>this.
>>>
>>>The typical pattern for connection establishment is to rely on the
>>>ZooKeeper client's asynchronous event notification instead of a polling
>>>loop. See below for a code sample that initiates the connection and
>>>then waits for the SyncConnected event. The ZooKeeper programmer's guide
>>>and example program docs have a more detailed discussion of this.
>>>
>>>http://zookeeper.apache.org/doc/r3.4.6/zookeeperProgrammers.html
>>>
>>>http://zookeeper.apache.org/doc/r3.4.6/javaExample.html
>>>
>>>Could you please try running this against your 3.4.6 cluster? I'd be
>>>curious to see if the connection completes for you. This would also
>>>give you a sense for how long connection establishment is taking and
>>>whether or not that's in line with your definitions of
>>>MAX_ZK_CONNECT_ATTEMPTS and ZK_CONNECT_WAIT.
>>>
>>>I hope this helps.
>>>
>>>import java.util.concurrent.CountDownLatch;
>>>
>>>import org.apache.zookeeper.WatchedEvent;
>>>import org.apache.zookeeper.Watcher;
>>>import org.apache.zookeeper.ZooKeeper;
>>>
>>>class Main implements Watcher {
>>>
>>>    // Released once the SyncConnected event has been received.
>>>    private final CountDownLatch connectLatch;
>>>    private final ZooKeeper zk;
>>>
>>>    public static void main(final String[] args) throws Exception {
>>>        String hostPort = args[0];
>>>        Main main = new Main(hostPort);
>>>        main.awaitConnection();
>>>        System.out.println("Exiting.");
>>>    }
>>>
>>>    private Main(String hostPort) throws Exception {
>>>        this.connectLatch = new CountDownLatch(1);
>>>        // 3000 ms session timeout; this object receives the events.
>>>        this.zk = new ZooKeeper(hostPort, 3000, this);
>>>    }
>>>
>>>    private void awaitConnection() throws InterruptedException {
>>>        this.connectLatch.await();
>>>        System.out.println("Connection has completed.");
>>>    }
>>>
>>>    @Override
>>>    public void process(WatchedEvent event) {
>>>        System.out.println("Received event: " + event);
>>>        if (event.getType() == Event.EventType.None) {
>>>            switch (event.getState()) {
>>>                case SyncConnected:
>>>                    this.connectLatch.countDown();
>>>                    break;
>>>            }
>>>        }
>>>    }
>>>}
>>>
>>>--Chris Nauroth
>>>
>>>On 7/24/15, 9:16 AM, "Chris Barlock" <[email protected]> wrote:
>>>
>>>>Ran some more tests. My code works fine up through ZK 3.4.2, but then
>>>>fails with 3.4.3. I did have to add the following to the
>>>>Import-Package list in the ZK MANIFEST.MF:
>>>>
>>>>org.slf4j,javax.security.auth.login,javax.security.sasl
>>>>
>>>>I could really use some help here, ZK folks! Is my code incorrect with
>>>>newer versions of ZK, or is ZK broken?
>>>>
>>>>Chris
>>>>
>>>>From: Chris Barlock/Raleigh/IBM@IBMUS
>>>>To: [email protected]
>>>>Date: 07/23/2015 09:34 PM
>>>>Subject: Re: ZooKeeper Class Will Not Connect
>>>>
>>>>I tried the 3.5.0 alpha build to see if it made any difference. It did
>>>>not.
>>>>
>>>>But, I had to hack the MANIFEST.MF file in the JAR because the
>>>>"3.50-alpha" version fails tests in OSGi bundles that import the ZK
>>>>classes which have something like:
>>>>
>>>>version="[3.2,4)"
>>>>
>>>>I suggest that if you want to name the JAR "3.50-alpha" that all the
>>>>internals just use a version of 3.50.
>>>>
>>>>Chris
>>>>
>>>>IBM Tivoli Systems
>>>>Research Triangle Park, NC
>>>>(919) 224-2240
>>>>Internet: [email protected]
>>>>
>>>>From: Chris Barlock/Raleigh/IBM@IBMUS
>>>>To: [email protected]
>>>>Date: 07/23/2015 01:37 PM
>>>>Subject: ZooKeeper Class Will Not Connect
>>>>
>>>>We are attempting to upgrade from Kafka 0.8.0, which includes ZK 3.3.4
>>>>to Kafka 0.8.2.1 with ZK 3.4.6. My code which attempts to connect to ZK
>>>>is pretty straightforward:
>>>>
>>>>    try {
>>>>        ZooKeeper zk = new ZooKeeper(connectString, sessionTimeout, this);
>>>>        int connectAttempts = 0;
>>>>
>>>>        while (!zk.getState().isConnected()
>>>>                && connectAttempts < MAX_ZK_CONNECT_ATTEMPTS) {
>>>>            try {
>>>>                Thread.sleep(ZK_CONNECT_WAIT);
>>>>            } catch (InterruptedException e) {
>>>>                // Ignore
>>>>            }
>>>>            connectAttempts++;
>>>>        }
>>>>    } catch (IOException e) {
>>>>        trace.exception(CLASS_NAME, methodName, e);
>>>>    }
>>>>
>>>>With some additional tracing, States is always CONNECTING. Has
>>>>something changed with 3.4.6 about how I should connect to the server?
>>>>I can connect just fine with the zookeeper-shell.sh that Kafka ships.
>>>>This code always runs on the same system as ZK, so the connectString is
>>>>always "localhost:2181"
>>>>
>>>>Chris
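
For reference, the bounded polling loop from the first message in the thread can be written so that the state is re-read on every iteration, which is the detail the 07/27 test loop missed. The following is only a minimal sketch using the JDK alone: ConnectPoll, awaitConnected and fakeState are hypothetical names, and the BooleanSupplier is a stand-in for a call such as () -> zk.getState().isConnected(), so that the example runs without a ZooKeeper server.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class ConnectPoll {

    // Polls isConnected up to maxAttempts times, sleeping waitMillis between
    // checks. The supplier is invoked on every iteration, so each check sees
    // the current state rather than a value captured before the loop.
    public static boolean awaitConnected(BooleanSupplier isConnected,
                                         int maxAttempts,
                                         long waitMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (isConnected.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(waitMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // One last check after the final sleep.
        return isConnected.getAsBoolean();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a client: reports "connected" from the
        // third check onward. With a real client this would be
        // () -> zk.getState().isConnected().
        AtomicInteger checks = new AtomicInteger();
        BooleanSupplier fakeState = () -> checks.incrementAndGet() >= 3;

        boolean connected = awaitConnected(fakeState, 400, 5);
        System.out.println("connected=" + connected
                + " after " + checks.get() + " checks");
    }
}
```

With a real client, the helper would presumably be called as awaitConnected(() -> zk.getState().isConnected(), MAX_ZK_CONNECT_ATTEMPTS, ZK_CONNECT_WAIT). The event-based latch in Chris Nauroth's sample remains the pattern recommended in the thread. This sketch only covers the case where a polling loop fits the application better.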
