Re: [Yoshimi-devel] Occasional fault on startup

Ichthyostega Sun, 01 Aug 2021 03:57:20 -0700

Am 01.08.21 um 11:30 schrieb Will Godfrey:

On rare and erratic times I'm seeing an stoi fault when starting, resultingin an abort. A second attempt *always* succeeds.

These days we have a mixture of stoi and the older string2int function thatCal set up. The latter returns zero for invalid strings rather thanfaulting.

The 'correct' thing to do is of course to track down the error, but seeing as
I restart Yoshimi many times a day, and this might take weeks to show up I
don't have any enthusiasm for that!

Would it be acceptable to just do a bulk swap of stoi to string2int?




Hi Will,

so the question is, do we want to identify and fix this bug,
or do we want to sweep it under the carpet? Both /can/ be reasonable.

To get a better base for that decision, let me ask first: what do we know about
that behaviour? Do we have any clue what kind of hidden problem this might be?
Is our control flow logic broken and takes the wrong path sometimes? Do we have
an out-of-bounds write? Or do we have a concurrency problem?

While Yoshimi rather isn't software controlling nuclear power plants, a random
data corruption would still be a nasty beast. Or to put it differently: if we
interpret a garbled string as zero, is this even a valid default at that place,
or will that yield totally weird behaviour downstream?

This leads to the next question: can we somehow narrow down the situation where
it happens? Can we narrow it down to a certain part of the system or a specific
code path?

What you typically do is to put in instrumentation, i.e. improving the error
message in ways that do not cost performance or at least at places where it's
not performance critical. What you also typically do in such situations is to
have a toggle for that instrumentation, so you can activate it only when the
generated messages will also be analysed by someone. That means, we (developers)
would use yoshimi with that instrumentation activated, and just wait for the
bug to go into the trap. For some extended period of time, we'd just collect
incidents and be done with it. If after some reasonable time we still didn't
get any lead, we can still decide to sweep it under the carpet.


What I'd propose thus is not to bulk replace stoi to string2int.
Rather, I'd propose we create a second function alongside string2int, which
does require a valid int in the string, but the error handling has a guard
that can be toggled off for regular (release) builds. We could use that new
function at places where we hope to catch that bug and which are not performance
critical. One simple approach would be a to have a build time switch (similar to
the noteon stuff) and let the instrumented version print out __FILE__ and
__LINE__ and __func__ and the garbled string value.
This would at least tell us where it happens.

I just searched for stoi and got 14 hits, so that proposal would be rather
easy to carry out. Taking a closer look, these 14 hits are all not performance
critical, so we could even put in diagnostics there unconditionally. Especially
those three usages in main.cpp look promising :-D
OTOTH, string2int gives 204 hits. So another approach would be to put the
instrumentation into string2int, hoping that this will increase the likelihood
of catching the beast.


So my bottom line is: we should not put a mad effort into this, but we
should at least gather some more knowledge and then take a better informed
decision if we want to follow up or rather sweep it under the carpet.

-- Hermann



_______________________________________________
Yoshimi-devel mailing list
Yoshimi-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/yoshimi-devel

Re: [Yoshimi-devel] Occasional fault on startup

Reply via email to