[ https://issues.apache.org/jira/browse/ZOOKEEPER-863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906012#action_12906012 ]
Stephen McCants commented on ZOOKEEPER-863:
-------------------------------------------

Okay, one of my coworkers was able to figure out some more about this.

First off, the zookeeper.log file shows it spinning, trying to request the discovery root over and over again. (I'll attach the log file.)

We are using Eclipse CM (Configuration Manager) to start the service that registers with ZooKeeper. First we start the Eclipse-based application (call it app1) that starts ZooKeeper internally (as described above). ZooKeeper immediately goes into its infinite loop. Then I can start a different Eclipse-based application (call it app2), which has a service (LoadLevelerJobService) that registers with ZooKeeper. If the service was started previously, then it will register with ZK, but ZK will stay in the infinite loop. Here is the output for that:

ZooDiscovery> Service Unpublished: Sep 3, 2010 12:23:48 PM. ServiceInfo[uri=osgiservices://9.53.189.11:30001/svc_47oWIw/Wrs4W/KQiV3f78Ggucu0=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://9.53.189.11:30001/svc_47oWIw/Wrs4W/KQiV3f78Ggucu0=;full=_osgiservices._tcp.default._i...@osgiservices://9.53.189.11:30001/svc_47oWIw/Wrs4W/KQiV3f78Ggucu0=];priority=0;weight=0;props=ServiceProperties[{com.ibm.hdwb.jobs.common.monitor.port=9020, ecf.sp.ect=ecf.generic.server, com.ibm.hdwb.jobs.common.monitor.command=/afs/awd/projects/cte/tools/hdwb/prod/llmonitor/monitor.ksh, component.id=18, com.ibm.hdwb.jobs.common.monitor.host=smccants.austin.ibm.com, component.name=com.ibm.hdwb.ll.server.job_queue_service, ecf.rsvc.id=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@fec0fec, ll_submit_command=, ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@6160616, com.ibm.hdwb.jobs.common.pool.uuid=dd38f018-b15e-4981-b71a-0453ea307634, com.ibm.hdwb.jobs.common.monitor.submitter=smccants, service.factoryPid=com.ibm.hdwb.ll.server.job_queue_service, service.pid=com.ibm.hdwb.ll.server.job_queue_service-1283452175751-0, com.ibm.hdwb.jobs.common.monitor.restlet.port=8080, osgi.remote.service.interfaces=com.ibm.hdwb.jobs.common.IJobQueueService, ecf.rsvc.ns=ecf.namespace.generic.remoteservice, ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, com.ibm.hdwb.jobs.common.monitor.restlet.host=smccants.austin.ibm.com}]]
Activating LoadLevelerJobService
Pool : null
Monitor Command : /afs/awd/projects/cte/tools/hdwb/prod/llmonitor/monitor.ksh
Monitor Host : smccants.austin.ibm.com
Monitor Port : 9020
Submit Command :
Activating LoadLevelerJobLocatorService
12:23:48.838 [1120420...@qtp-1972401552-0 - /system/console/configMgr/com.ibm.hdwb.ll.server.job_queue_service-1283452175751-0] DEBUG org.mortbay.log - RESPONSE /system/console/configMgr/com.ibm.hdwb.ll.server.job_queue_service-1283452175751-0 200
ZooDiscovery> Service Published: Sep 3, 2010 12:23:48 PM. ServiceInfo[uri=osgiservices://9.53.189.11:30001/svc_NgkGKIokGDJC86wytsCHPug3a1E=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://9.53.189.11:30001/svc_NgkGKIokGDJC86wytsCHPug3a1E=;full=_osgiservices._tcp.default._i...@osgiservices://9.53.189.11:30001/svc_NgkGKIokGDJC86wytsCHPug3a1E=];priority=0;weight=0;props=ServiceProperties[{com.ibm.hdwb.jobs.common.monitor.port=9020, ecf.sp.ect=ecf.generic.server, com.ibm.hdwb.jobs.common.monitor.command=/afs/awd/projects/cte/tools/hdwb/prod/llmonitor/monitor.ksh, component.id=19, com.ibm.hdwb.jobs.common.monitor.host=smccants.austin.ibm.com, component.name=com.ibm.hdwb.ll.server.job_queue_service, ecf.rsvc.id=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@314c314c, ll_submit_command=, ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@188b188b, com.ibm.hdwb.jobs.common.pool.uuid=dd38f018-b15e-4981-b71a-0453ea307634, com.ibm.hdwb.jobs.common.monitor.submitter=smccants, service.factoryPid=com.ibm.hdwb.ll.server.job_queue_service, service.pid=com.ibm.hdwb.ll.server.job_queue_service-1283452175751-0, com.ibm.hdwb.jobs.common.monitor.restlet.port=8080, osgi.remote.service.interfaces=com.ibm.hdwb.jobs.common.IJobQueueService, ecf.rsvc.ns=ecf.namespace.generic.remoteservice, ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, com.ibm.hdwb.jobs.common.monitor.restlet.host=smccants.austin.ibm.com}]]

Not sure why it shows an unpublish first... that may be a clue.

If I delete the Configuration for the service in app2, it will unregister:

ZooDiscovery> Service Unpublished: Sep 3, 2010 12:24:00 PM. ServiceInfo[uri=osgiservices://9.53.189.11:30001/svc_NgkGKIokGDJC86wytsCHPug3a1E=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://9.53.189.11:30001/svc_NgkGKIokGDJC86wytsCHPug3a1E=;full=_osgiservices._tcp.default._i...@osgiservices://9.53.189.11:30001/svc_NgkGKIokGDJC86wytsCHPug3a1E=];priority=0;weight=0;props=ServiceProperties[{com.ibm.hdwb.jobs.common.monitor.port=9020, ecf.sp.ect=ecf.generic.server, com.ibm.hdwb.jobs.common.monitor.command=/afs/awd/projects/cte/tools/hdwb/prod/llmonitor/monitor.ksh, component.id=19, com.ibm.hdwb.jobs.common.monitor.host=smccants.austin.ibm.com, component.name=com.ibm.hdwb.ll.server.job_queue_service, ecf.rsvc.id=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@314c314c, ll_submit_command=, ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@188b188b, com.ibm.hdwb.jobs.common.pool.uuid=dd38f018-b15e-4981-b71a-0453ea307634, com.ibm.hdwb.jobs.common.monitor.submitter=smccants, service.factoryPid=com.ibm.hdwb.ll.server.job_queue_service, service.pid=com.ibm.hdwb.ll.server.job_queue_service-1283452175751-0, com.ibm.hdwb.jobs.common.monitor.restlet.port=8080, osgi.remote.service.interfaces=com.ibm.hdwb.jobs.common.IJobQueueService, ecf.rsvc.ns=ecf.namespace.generic.remoteservice, ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, com.ibm.hdwb.jobs.common.monitor.restlet.host=smccants.austin.ibm.com}]]

Then if I recreate the configuration (which recreates the service):

Activating LoadLevelerJobService
Pool : null
Monitor Command : /afs/awd/u/smccants/pub/monitor2/monitor.ksh
Monitor Host : smccants.austin.ibm.com
Monitor Port : 9020
Submit Command :
Activating LoadLevelerJobLocatorService
ZooDiscovery> Service Published: Sep 3, 2010 12:24:35 PM. ServiceInfo[uri=osgiservices://9.53.189.11:30001/svc_WIppgVmdSZTGK0xWz1mQ/OaSqYQ=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://9.53.189.11:30001/svc_WIppgVmdSZTGK0xWz1mQ/OaSqYQ=;full=_osgiservices._tcp.default._i...@osgiservices://9.53.189.11:30001/svc_WIppgVmdSZTGK0xWz1mQ/OaSqYQ=];priority=0;weight=0;props=ServiceProperties[{com.ibm.hdwb.jobs.common.monitor.port=9020, ecf.sp.ect=ecf.generic.server, com.ibm.hdwb.jobs.common.monitor.command=/afs/awd/u/smccants/pub/monitor2/monitor.ksh, component.id=20, com.ibm.hdwb.jobs.common.monitor.host=smccants.austin.ibm.com, component.name=com.ibm.hdwb.ll.server.job_queue_service, ecf.rsvc.id=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@7a847a84, ll_submit_command=, ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@7d277d27, com.ibm.hdwb.jobs.common.pool.uuid=dd38f018-b15e-4981-b71a-0453ea307634, com.ibm.hdwb.jobs.common.monitor.submitter=smccants, service.factoryPid=com.ibm.hdwb.ll.server.job_queue_service, service.pid=com.ibm.hdwb.ll.server.job_queue_service-1283534671864-0, com.ibm.hdwb.jobs.common.monitor.restlet.port=8080, osgi.remote.service.interfaces=com.ibm.hdwb.jobs.common.IJobQueueService, ecf.rsvc.ns=ecf.namespace.generic.remoteservice, ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, com.ibm.hdwb.jobs.common.monitor.restlet.host=smccants.austin.ibm.com}]]

At this point, ZooKeeper gets knocked out of its infinite loop and stops consuming all the CPU. This looks to me like a pretty serious ZK bug.

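
The runaway thread in the original report was identified by suspending threads in the debugger. Where a debugger is not attached, one possible way to confirm which thread is spinning is to sample per-thread CPU time from inside the affected JVM. The following is only a minimal sketch using the standard java.lang.management API; the class name BusyThreadFinder, the method findBusiestThread, and the demo thread name are hypothetical, and it assumes the JVM supports thread CPU time measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of ZooKeeper or ECF.
final class BusyThreadFinder {

    // Returns the name of the live thread that consumed the most CPU time
    // during the sample window, or null if that cannot be determined.
    static String findBusiestThread(long sampleMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported()) {
            return null;
        }
        threads.setThreadCpuTimeEnabled(true);

        // First sample: CPU time in nanoseconds per thread id (-1 if unavailable).
        Map<Long, Long> before = new HashMap<>();
        for (long id : threads.getAllThreadIds()) {
            before.put(id, threads.getThreadCpuTime(id));
        }

        try {
            Thread.sleep(sampleMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }

        // Second sample: keep the thread with the largest increase.
        String busiest = null;
        long maxDelta = 0;
        for (long id : threads.getAllThreadIds()) {
            Long start = before.get(id);
            long end = threads.getThreadCpuTime(id);
            if (start == null || start < 0 || end < 0) {
                continue; // thread started or died during the window
            }
            ThreadInfo info = threads.getThreadInfo(id);
            long delta = end - start;
            if (info != null && delta > maxDelta) {
                maxDelta = delta;
                busiest = info.getThreadName();
            }
        }
        return busiest;
    }

    public static void main(String[] args) {
        // Demo only: a deliberately spinning thread stands in for the runaway one.
        Thread spinner = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // busy loop
            }
        }, "demo-spinner");
        spinner.setDaemon(true);
        spinner.start();
        System.out.println("Busiest thread: " + findBusiestThread(500));
        spinner.interrupt();
    }
}
```

Run inside app1 (for example from a console command or a small diagnostic bundle), a result naming the "NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181" thread would be consistent with the behavior described in the report.
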
> Runaway thread - Zookeeper inside Eclipse
> -----------------------------------------
>
> Key: ZOOKEEPER-863
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-863
> Project: Zookeeper
> Issue Type: Bug
> Affects Versions: 3.3.0
> Environment: Linux; x86
> Reporter: Stephen McCants
> Priority: Critical
>
> I'm running Zookeeper inside an Eclipse application. When I launch the
> application from inside Eclipse I use the following arguments:
>
> -Dzoodiscovery.autoStart=true
> -Dzoodiscovery.flavor=zoodiscovery.flavor.centralized=localhost
>
> This causes the application to start its own ZooKeeper server inside the
> JVM/application. It immediately goes into a runaway state. The name of the
> runaway thread is "NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181". When I
> suspend this thread, the CPU usage returns to 0. Here is a stack trace from
> that thread when it is suspended:
>
> EPollArrayWrapper.epollWait(long, int, long, int) line: not available [native method]
> EPollArrayWrapper.poll(long) line: 215
> EPollSelectorImpl.doSelect(long) line: 77
> EPollSelectorImpl(SelectorImpl).lockAndDoSelect(long) line: 69
> EPollSelectorImpl(SelectorImpl).select(long) line: 80
> NIOServerCnxn$Factory.run() line: 232
>
> Any ideas what might be going wrong?
> Thanks.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.