[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906012#action_12906012
 ] 

Stephen McCants commented on ZOOKEEPER-863:
-------------------------------------------

Okay, one of my coworkers was able to figure out some more about this. First 
off, the zookeeper.log file shows it spinning trying to request the discovery 
root over and over again. (I'll attach the log file).

We are using Eclipse CM (Configuration Manager) to start the service that 
registers with ZooKeeper. First we start the Eclipse-based application (call it 
app1) that starts ZooKeeper internally (as described above). ZooKeeper 
immediately goes into its infinite loop.

Then I can start a different Eclipse-based application (call it app2), which 
has a service (LoadLevelerJobService) that registers with ZooKeeper. If the 
service was started previously, it will register with ZK, but ZK will stay in 
the infinite loop.

Here is the output for that:

ZooDiscovery> Service Unpublished: Sep 3, 2010 12:23:48 PM. 
ServiceInfo[uri=osgiservices://9.53.189.11:30001/svc_47oWIw/Wrs4W/KQiV3f78Ggucu0=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://9.53.189.11:30001/svc_47oWIw/Wrs4W/KQiV3f78Ggucu0=;full=_osgiservices._tcp.default._i...@osgiservices://9.53.189.11:30001/svc_47oWIw/Wrs4W/KQiV3f78Ggucu0=];priority=0;weight=0;props=ServiceProperties[{com.ibm.hdwb.jobs.common.monitor.port=9020,
 ecf.sp.ect=ecf.generic.server, 
com.ibm.hdwb.jobs.common.monitor.command=/afs/awd/projects/cte/tools/hdwb/prod/llmonitor/monitor.ksh,
 component.id=18, 
com.ibm.hdwb.jobs.common.monitor.host=smccants.austin.ibm.com, 
component.name=com.ibm.hdwb.ll.server.job_queue_service, 
ecf.rsvc.id=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@fec0fec,
 ll_submit_command=, 
ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@6160616,
 com.ibm.hdwb.jobs.common.pool.uuid=dd38f018-b15e-4981-b71a-0453ea307634, 
com.ibm.hdwb.jobs.common.monitor.submitter=smccants, 
service.factoryPid=com.ibm.hdwb.ll.server.job_queue_service, 
service.pid=com.ibm.hdwb.ll.server.job_queue_service-1283452175751-0, 
com.ibm.hdwb.jobs.common.monitor.restlet.port=8080, 
osgi.remote.service.interfaces=com.ibm.hdwb.jobs.common.IJobQueueService, 
ecf.rsvc.ns=ecf.namespace.generic.remoteservice, 
ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, 
com.ibm.hdwb.jobs.common.monitor.restlet.host=smccants.austin.ibm.com}]]
Activating LoadLevelerJobService
  Pool : null
  Monitor Command : /afs/awd/projects/cte/tools/hdwb/prod/llmonitor/monitor.ksh
  Monitor Host : smccants.austin.ibm.com
  Monitor Port : 9020
  Submit Command :
Activating LoadLevelerJobLocatorService
12:23:48.838 [1120420...@qtp-1972401552-0 - 
/system/console/configMgr/com.ibm.hdwb.ll.server.job_queue_service-1283452175751-0]
 DEBUG org.mortbay.log - RESPONSE 
/system/console/configMgr/com.ibm.hdwb.ll.server.job_queue_service-1283452175751-0
 200
ZooDiscovery> Service Published: Sep 3, 2010 12:23:48 PM. 
ServiceInfo[uri=osgiservices://9.53.189.11:30001/svc_NgkGKIokGDJC86wytsCHPug3a1E=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://9.53.189.11:30001/svc_NgkGKIokGDJC86wytsCHPug3a1E=;full=_osgiservices._tcp.default._i...@osgiservices://9.53.189.11:30001/svc_NgkGKIokGDJC86wytsCHPug3a1E=];priority=0;weight=0;props=ServiceProperties[{com.ibm.hdwb.jobs.common.monitor.port=9020,
 ecf.sp.ect=ecf.generic.server, 
com.ibm.hdwb.jobs.common.monitor.command=/afs/awd/projects/cte/tools/hdwb/prod/llmonitor/monitor.ksh,
 component.id=19, 
com.ibm.hdwb.jobs.common.monitor.host=smccants.austin.ibm.com, 
component.name=com.ibm.hdwb.ll.server.job_queue_service, 
ecf.rsvc.id=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@314c314c,
 ll_submit_command=, 
ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@188b188b,
 com.ibm.hdwb.jobs.common.pool.uuid=dd38f018-b15e-4981-b71a-0453ea307634, 
com.ibm.hdwb.jobs.common.monitor.submitter=smccants, 
service.factoryPid=com.ibm.hdwb.ll.server.job_queue_service, 
service.pid=com.ibm.hdwb.ll.server.job_queue_service-1283452175751-0, 
com.ibm.hdwb.jobs.common.monitor.restlet.port=8080, 
osgi.remote.service.interfaces=com.ibm.hdwb.jobs.common.IJobQueueService, 
ecf.rsvc.ns=ecf.namespace.generic.remoteservice, 
ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, 
com.ibm.hdwb.jobs.common.monitor.restlet.host=smccants.austin.ibm.com}]]

Not sure why it shows an unpublish first... that may be a clue.

If I delete the Configuration for the service in app2, it will unregister:

ZooDiscovery> Service Unpublished: Sep 3, 2010 12:24:00 PM. 
ServiceInfo[uri=osgiservices://9.53.189.11:30001/svc_NgkGKIokGDJC86wytsCHPug3a1E=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://9.53.189.11:30001/svc_NgkGKIokGDJC86wytsCHPug3a1E=;full=_osgiservices._tcp.default._i...@osgiservices://9.53.189.11:30001/svc_NgkGKIokGDJC86wytsCHPug3a1E=];priority=0;weight=0;props=ServiceProperties[{com.ibm.hdwb.jobs.common.monitor.port=9020,
 ecf.sp.ect=ecf.generic.server, 
com.ibm.hdwb.jobs.common.monitor.command=/afs/awd/projects/cte/tools/hdwb/prod/llmonitor/monitor.ksh,
 component.id=19, 
com.ibm.hdwb.jobs.common.monitor.host=smccants.austin.ibm.com, 
component.name=com.ibm.hdwb.ll.server.job_queue_service, 
ecf.rsvc.id=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@314c314c,
 ll_submit_command=, 
ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@188b188b,
 com.ibm.hdwb.jobs.common.pool.uuid=dd38f018-b15e-4981-b71a-0453ea307634, 
com.ibm.hdwb.jobs.common.monitor.submitter=smccants, 
service.factoryPid=com.ibm.hdwb.ll.server.job_queue_service, 
service.pid=com.ibm.hdwb.ll.server.job_queue_service-1283452175751-0, 
com.ibm.hdwb.jobs.common.monitor.restlet.port=8080, 
osgi.remote.service.interfaces=com.ibm.hdwb.jobs.common.IJobQueueService, 
ecf.rsvc.ns=ecf.namespace.generic.remoteservice, 
ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, 
com.ibm.hdwb.jobs.common.monitor.restlet.host=smccants.austin.ibm.com}]]

Then if I recreate the configuration (which recreates the service):

Activating LoadLevelerJobService
  Pool : null
  Monitor Command : /afs/awd/u/smccants/pub/monitor2/monitor.ksh
  Monitor Host : smccants.austin.ibm.com
  Monitor Port : 9020
  Submit Command :
Activating LoadLevelerJobLocatorService
ZooDiscovery> Service Published: Sep 3, 2010 12:24:35 PM. 
ServiceInfo[uri=osgiservices://9.53.189.11:30001/svc_WIppgVmdSZTGK0xWz1mQ/OaSqYQ=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://9.53.189.11:30001/svc_WIppgVmdSZTGK0xWz1mQ/OaSqYQ=;full=_osgiservices._tcp.default._i...@osgiservices://9.53.189.11:30001/svc_WIppgVmdSZTGK0xWz1mQ/OaSqYQ=];priority=0;weight=0;props=ServiceProperties[{com.ibm.hdwb.jobs.common.monitor.port=9020,
 ecf.sp.ect=ecf.generic.server, 
com.ibm.hdwb.jobs.common.monitor.command=/afs/awd/u/smccants/pub/monitor2/monitor.ksh,
 component.id=20, 
com.ibm.hdwb.jobs.common.monitor.host=smccants.austin.ibm.com, 
component.name=com.ibm.hdwb.ll.server.job_queue_service, 
ecf.rsvc.id=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@7a847a84,
 ll_submit_command=, 
ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@7d277d27,
 com.ibm.hdwb.jobs.common.pool.uuid=dd38f018-b15e-4981-b71a-0453ea307634, 
com.ibm.hdwb.jobs.common.monitor.submitter=smccants, 
service.factoryPid=com.ibm.hdwb.ll.server.job_queue_service, 
service.pid=com.ibm.hdwb.ll.server.job_queue_service-1283534671864-0, 
com.ibm.hdwb.jobs.common.monitor.restlet.port=8080, 
osgi.remote.service.interfaces=com.ibm.hdwb.jobs.common.IJobQueueService, 
ecf.rsvc.ns=ecf.namespace.generic.remoteservice, 
ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, 
com.ibm.hdwb.jobs.common.monitor.restlet.host=smccants.austin.ibm.com}]]

At this point, ZooKeeper gets knocked out of its infinite loop and stops 
consuming all the CPU.

This looks to me like a pretty serious ZK bug.
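
For anyone trying to reproduce this, one rough way to confirm which thread is 
spinning without attaching a debugger is to sample per-thread CPU time through 
the standard java.lang.management API. The following is only a sketch under 
stated assumptions: the class name BusyThreadFinder, the "demo-spinner" thread 
and the 500 ms sample window are made up for illustration and are not part of 
ZooKeeper or ECF.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of ZooKeeper): reports which live thread
// consumed the most CPU time during a short sample window.
public class BusyThreadFinder {

    // Returns the name of the busiest thread, or null if the JVM does not
    // support per-thread CPU time measurement.
    public static String findBusiestThread(long sampleMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            return null;
        }

        // First snapshot: CPU time (nanoseconds) per thread id.
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id));
        }

        Thread.sleep(sampleMillis);

        // Second snapshot: keep the thread with the largest increase.
        long bestDelta = -1;
        String bestName = null;
        for (long id : mx.getAllThreadIds()) {
            long now = mx.getThreadCpuTime(id); // -1 if the thread has died or timing is disabled
            Long then = before.get(id);
            if (now < 0 || then == null || then < 0) {
                continue;
            }
            ThreadInfo info = mx.getThreadInfo(id);
            long delta = now - then;
            if (info != null && delta > bestDelta) {
                bestDelta = delta;
                bestName = info.getThreadName();
            }
        }
        return bestName;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a runaway thread, for demonstration only.
        Thread spinner = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // busy loop
            }
        }, "demo-spinner");
        spinner.setDaemon(true);
        spinner.start();

        System.out.println("Busiest thread: " + findBusiestThread(500));
        spinner.interrupt();
    }
}
```

Called from inside app1, findBusiestThread would be expected to name the 
"NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181" thread if that thread really is the 
one burning the CPU, but I have not verified that in this setup.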

> Runaway thread - Zookeeper inside Eclipse
> -----------------------------------------
>
>                 Key: ZOOKEEPER-863
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-863
>             Project: Zookeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>         Environment: Linux; x86
>            Reporter: Stephen McCants
>            Priority: Critical
>
> I'm running Zookeeper inside an Eclipse application.  When I launch the 
> application from inside Eclipse I use the following arguments:
> -Dzoodiscovery.autoStart=true
> -Dzoodiscovery.flavor=zoodiscovery.flavor.centralized=localhost
> This causes the application to start its own ZooKeeper server inside the 
> JVM/application.  It immediately goes into a runaway state.  The name of the 
> runaway thread is "NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181".  When I 
> suspend this thread, the CPU usage returns to 0.  Here is a stack trace from 
> that thread when it is suspended:
> EPollArrayWrapper.epollWait(long, int, long, int) line: not available [native method] 
> EPollArrayWrapper.poll(long) line: 215        
> EPollSelectorImpl.doSelect(long) line: 77     
> EPollSelectorImpl(SelectorImpl).lockAndDoSelect(long) line: 69        
> EPollSelectorImpl(SelectorImpl).select(long) line: 80 
> NIOServerCnxn$Factory.run() line: 232 
> Any ideas what might be going wrong?
> Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
