Doug, I've filed https://review.openstack.org/213542 to log error messages instead. I'll work with the oslo.messaging folks over the next few days.
Thanks,
Dims

On Fri, Aug 14, 2015 at 6:58 PM, Doug Hellmann <d...@doughellmann.com> wrote:
> All patches to oslo.messaging are currently failing the
> gate-tempest-dsvm-neutron-src-oslo.messaging job because the neutron
> service dies. amuller, kevinbenton, and I spent a bunch of time looking at
> it today, and I think we have an issue introduced by some asymmetric gating
> between the two projects.
>
> Neutron has 2 different modes for starting the RPC service, depending on
> the number of workers requested. The problem comes up with rpc_workers=0,
> which is the new default. In that mode, rather than using the
> ProcessLauncher, the RPC server is started directly in the current process.
> That results in wait() being called in a way that violates the new
> constraints being enforced within oslo.messaging after [1] landed. That
> patch is unreleased, so the only project seeing the problem is
> oslo.messaging. I’ve proposed a revert in [2], which passes the gate tests.
>
> I have also added [3] to neutron to see if we can get the gate job to show
> the same error messages I was seeing locally (part of the trouble we’ve had
> with debugging this is that the process exits quickly enough that some of
> the log messages are never written). I’m using [4], a patch to
> oslo.messaging that was failing before, to trigger the job and get the
> necessary log. That patch should *not* be landed, since I don’t think the
> change it reverts is related to the problem; it was just handy for
> debugging.
>
> The error message I see locally, “start/stop/wait must be called in the
> same thread”, is visible in this log snippet [5].
>
> It’s not clear what the best path forward is. Obviously neutron is doing
> something with the RPC server that oslo.messaging doesn’t expect/want/like,
> but just as obviously we can’t release oslo.messaging in its current state
> and break neutron.
> Someone with a better understanding of both neutron and
> oslo.messaging may be able to fix neutron’s use of the RPC code to avoid
> this case. There may be other users of oslo.messaging with the same
> ‘broken’ pattern, but IIRC neutron is unique in the way it runs both RPC
> and API services in the same process. To be safe, though, it may be better
> to log error messages instead of doing whatever we’re doing now to cause
> the process to exit. We can then set up a logstash search for the error
> message, find other applications that would be broken, fix them, and
> then switch oslo.messaging back to throwing an exception.
>
> I’m going to be at the Ops summit next week, so I need to hand off
> debugging and fixing this issue to someone else on the Oslo team. We
> created an etherpad today to track progress and make notes, and all of
> these links are referenced there, too [6].
>
> Thanks again to amuller and kevinbenton for the time they spent helping
> with debugging today!
>
> Doug
>
> [1] https://review.openstack.org/#/c/209043/
> [2] https://review.openstack.org/#/c/213299/
> [3] https://review.openstack.org/#/c/213360/
> [4] https://review.openstack.org/#/c/213297/
> [5] http://paste.openstack.org/show/415030/
> [6] https://etherpad.openstack.org/p/wm2D6UGZbf

-- 
Davanum Srinivas :: https://twitter.com/dims
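For readers following the thread: the "log instead of exit" idea Doug describes can be sketched roughly as below. This is purely illustrative, not oslo.messaging's actual code; the class name, method names, and log format are invented for the example. The point is that the server remembers which thread first called into it and, on a mismatch, emits the error to the log rather than raising an exception that can kill the process before the log messages are flushed.

```python
import logging
import threading

LOG = logging.getLogger(__name__)


class ThreadCheckedServer:
    """Illustrative stand-in for an RPC server that expects start(),
    stop(), and wait() to all run in the same thread.

    On a thread mismatch it logs an error and carries on, so a
    logstash search can find affected applications without breaking
    them, instead of raising and exiting.
    """

    def __init__(self):
        self._owner_ident = None  # ident of the first calling thread

    def _check_same_thread(self, method):
        current = threading.get_ident()
        if self._owner_ident is None:
            # First call claims ownership for this server instance.
            self._owner_ident = current
            return True
        if self._owner_ident != current:
            LOG.error("start/stop/wait must be called in the same "
                      "thread (owner=%s, current=%s, method=%s)",
                      self._owner_ident, current, method)
            return False
        return True

    def start(self):
        self._check_same_thread("start")

    def stop(self):
        self._check_same_thread("stop")

    def wait(self):
        self._check_same_thread("wait")
```

Calling wait() from a different thread than start() would then produce the "start/stop/wait must be called in the same thread" message in the service log rather than tearing the process down.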
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev