We had to fix a Heisenbug (a bug that never breaks when you look at it).
This was a bug that happened only in production, didn't break every time,
and never broke while debugging.  It also got less frequent the more
logging we added to report on the current state when the bug happened.  We
knew it was a timing issue, but exactly what was happening eluded us.  The
trick, eventually, was logging.  We kept adding and removing logging and
mapping the state during the bug until we understood it on paper.  Then we
created a test that reproduced it, fixed it, and celebrated!  It was really
difficult!  The lesson is that no matter how difficult or infrequent the
bug is, with enough persistence and hard work you can figure it out!

All the best,

Ron Teitelbaum
www.3dicc.com

On Thu, Mar 9, 2017 at 9:38 AM Max Leske <[email protected]> wrote:

> Fixing a race condition in handling open sockets when forking an image. At
> first I had no clue where the problem could come from, then I spent a lot
> of time guessing at the conditions (of course, being a race condition there
> was no means to force the specific condition but I didn't know that yet).
>
> Overall, I spent about 6 weeks on this bug and finally fixed it by
> creating a new primitive to handle that specific case. I'm not sure what
> tools could have helped me as this was a rather specific problem
> (OSProcess). But the hardest problems in my experience are usually
> concurrency / asynchrony (e.g. race conditions) or bugs in libraries (where
> you always assume that you must have made a mistake, never the library).
>
> Max
>
> > On 9 Mar 2017, at 12:36, Stephane Ducasse <[email protected]> wrote:
> >
> > Hi guys
> >
> > During the DSU workshop we were brainstorming about what are the most
> > difficult bugs we faced and what are the conceptual tools that would have
> > helped you.
> >
> > Stef