To be clear, I didn't expect my PR to resolve the issue. My point simply
was that it's a large change, and I don't want to have to resolve the
conflicts that may arise from fixing the deadlock before the PR is merged.

In my opinion, all the synchronization around the session state is not
ideal. I'd like to find a way to refactor it completely to employ something
more elegant.


Justin

On Thu, Jan 20, 2022 at 1:32 PM Takeshi Fukushima <takesh...@gmail.com>
wrote:

> Hi Justin
>
> I realize that my fix is not the best, I just don't know the codebase well
> enough to do a better job. That said, the methods that I've marked as
> synchronized (MQTTConnectionManager connect and disconnect) already
> synchronize themselves on the session state, but the disconnect method
> synchronizes on the sessionState, which might get swapped before it has a
> chance to call (synchronized) MQTTSession.stop (which in turn, calls the
> synchronized MQTTSession.clear but on a different instance of the session).
> On top of that, since a freshly created MQTTSession uses a global
> MQTTSessionState to begin with (and some methods synchronizes over the
> state), I think this causes more (unintended) contention and may be another
> source of deadlocks.
>
> I've looked at your PR and, unfortunately, the issue still remains as far
> as I can tell. I will be happy to help if you need anything and I will do
> my best to create a testcase that can at least sometimes reproduce the
> problem (which unfortunately depends all too much on the thread scheduling
> order).
>
> Thanks for the response,
> takeshi
>
> On Thu, Jan 20, 2022 at 4:09 PM Justin Bertram <jbert...@apache.org>
> wrote:
>
> > Making a method synchronized typically isn't the preferred solution to
> > concurrency issues due to the performance penalty it typically imposes.
> > Generally targeted locks are better. However, concurrency issues are
> often
> > quite difficult to diagnose and resolve so sometimes a less-than-ideal
> > solution is better than broken code.
> >
> > I currently have a large PR [1] for MQTT 5 support that is yet to be
> > merged. Once that is merged I'll take a closer look at the deadlock issue
> > from ARTEMIS-3622 that you reported.
> >
> >
> > Justin
> >
> > [1] https://github.com/apache/activemq-artemis/pull/3907
> >
> > On Thu, Jan 20, 2022 at 11:08 AM Takeshi Fukushima <takesh...@gmail.com>
> > wrote:
> >
> > > Hello everyone
> > >
> > > Last month I've opened up a ticket regarding a deadlock that was
> > occurring
> > > on one of our servers.
> > > https://issues.apache.org/jira/browse/ARTEMIS-3622
> > >
> > > After much trial and error, I think I've narrowed it down the cause and
> > > it's in the comments, but being a racing condition I cannot reliably
> > > reproduce in a test case. I can reproduce manually (carefully advancing
> > the
> > > threads in the debugger).
> > >
> > > I also did patch our server with the quickfix indicated in the comments
> > and
> > > I will let you know if that fixes the problem (so far so good, but its
> > been
> > > only about 18 hours). If everything goes well, I will create a PR but I
> > > would like to know if a simple solution like that (marking a couple of
> > > methods as synchronized) would be acceptable, even though I think it
> > > doesn't fixes all the cases (that one would require much in depth
> > changes I
> > > think).
> > >
> > > Thanks for the great software anyway,
> > > takeshi
> > >
> > > --
> > > http://mapsdev.blogspot.com/
> > > Marcelo Takeshi Fukushima
> > >
> >
>
>
> --
> http://mapsdev.blogspot.com/
> Marcelo Takeshi Fukushima
>

Reply via email to