On Sun, Jul 9, 2023 at 5:54 PM Chet Ramey <chet.ra...@case.edu> wrote:

> On 7/7/23 6:45 PM, enh wrote:
>
> >      > define "mess up"...
> >
> >     Maybe "mess up" was too strong. I think it's rude to send a fatal
> signal if
> >     you have the function. Either don't provide it or make it return
> -1/ENOSYS.
> >     Android is by no means the only place this is a problem; it happens
> with
> >     docker all the time:
> >
> >     https://lists.gnu.org/archive/html/bug-bash/2022-03/msg00010.html
> >     <https://lists.gnu.org/archive/html/bug-bash/2022-03/msg00010.html>
> >
> >
> > yeah, if it's any consolation we argued about this choice a lot. (which
> is
> > probably also why the choice exists in the first place!)
> >
> > iirc we even _tried_ -1/ENOSYS to test the predictions that (a) most c
> > programmers don't check for failures (b) even those that do don't log
> > enough to be able to debug these problems
>
> It penalizes those who do.
>

yeah, but we optimize for the common case.


> > and (c) even in cases where
> > errors are checked and logged, the ability of folks -- including _end
> > users_ -- to triage such problems is limited.
>
> Killing the process certainly doesn't improve that. How is it better for
> triage when the process dies unexpectedly with a fatal signal -- when
> running a builtin command -- that can't be reproduced anywhere else? If
> Grisha Levit hadn't recognized what was going on, I never would have even
> looked at it. I don't use Android as a development platform.
>

and i think that's the cross-purpose we're talking at here ... the 99% case
is app code, which actually means "JNI library dlopen()ed into a clone of
the zygote --- no main() at all". most app developers have no direct
contact with their users, and rely on clustered groups of
automatically-collected crashes. so a _crash_ is 100x more likely to
actually be something the developer sees. (as opposed to "i seem to be
having more users uninstall my app, but have no idea why and no way to find
out".)

the kind of collaborative command-line debugging over a mailing list that
you're familiar with doesn't exist for the TikToks of this world.


> > i'm not a _fan_ of SIGSYS,
> > but i do still think it was the lesser of the available evils. (but of
> > course it's the folks who _do_ check for failures who're most likely to
> > disagree;
>
> OK, I'm going to disagree.
>

because you have a completely different use case _and_ you're a very
different kind of developer. there's no question it's not a good fit for
you. (but you're running in an app _context_ even if not in an app, so you
get to enjoy "one size fits all" :-( )


> >      > Android deliberately has strict seccomp filters for
> >      > apps, and the syscalls mentioned in that post are on the "no"
> list.
> >     Android
> >      > gives each _app_ a different uid, so there's typically nothing
> >     useful you
> >      > can do here anyway. (things are a bit different if you're actually
> >     part of
> >      > the OS, but bash being GPL makes that unlikely :-( )
> >
> >     Sure. But if you have the function, bash assumes it works as
> documented in
> >     the rest of the Linux world.
> >
> >
> > and so it does, if you're not an app.
> >
> > the tricky case here has always been the conflict between the folks who
> > want to check at build time versus those who want to check at runtime
> ...
> > if we hide things in the headers, people complain "i wasn't going to
> call
> > it unless i know i can anyway". (plus in addition to the general rule
> that
> > "most configure scripts don't handle cross-compilation correctly", you'd
> be
> > surprised how many screw up the simple "is this function available?"
> test
> > --- they don't compile a small program that #includes the right header
> and
> > tries to call the function; they grep the header or nm the library or
> > whatever.)
>
> If AC_CHECK_FUNCS doesn't do the right thing when cross-compiling, maybe
> the autotools folks should hear about it?
>

i don't think autoconf is broken (given that _some_ stuff gets it right).
but a lot of library developers just don't take cross-compilation into
account. (obviously massive selection bias here since i was an embedded
developer before working on Android, so i've mostly _only_ known
cross-compilation, plus i obviously don't have to spend much time on the
libraries that _do_ work.)


> >      > (yes, i agree that it's mildly unfortunate that there's no special
> >     case for
> >      > "i don't actually want to change anything", which i think is the
> case
> >      > they're talking about in that post, and i've wondered about adding
> >     that to
> >      > libc once or twice, but my feeling is that it wouldn't be
> particularly
> >      > useful in practice because _that_ kind of code probably needs a
> rethink
> >      > anyway when porting to Android.)
>
> I'm not sure moving that burden to the application is appropriate, either.
> But bash has no android-specific code; there's no "porting to android" at
> this point.
>

again, if you're an _application_ things are very different --- it's a
reasonable assumption that iOS and Android are your two main targets, and
i'm sure there's plenty of stuff you can't do on iOS for security reasons
too. you just don't see that because you wouldn't be able to run bash on
iOS in the first place :-)


> >
> >     I just added code to bash that doesn't try to call
> setresuid/setresgid if
> >     nothing is actually changing, which will save a couple calls
> everywhere.
> >     But killing the process is rude.
> >
> >
> > you're talking to the guy who added the equivalent of `if (fp == NULL)
> > abort_with_message("you can't pass a null FILE*");` to every stdio
> function
> > because too many developers don't check fopen() succeeded, and think
> libc
> > is broken if fgets() dereferences a null pointer.
>
> Oh, come on, really? What's next? Adding checks in the str* functions
> for NULL pointers? I am sure that catering to bad programming is not the
> way to improve programming practices.
>

no, because there's no causal link with an earlier failure there. again,
imagine you're in an app developer's shoes for a moment --- are you going
to be able to remotely debug (without being able to talk to the affected
users) "crashes all the time; zero stars" or "Cause: null FILE* passed to
fgets()" with a stack trace that shows you where, and lets you work out
which earlier fopen() you didn't check?

we _do_ have memcpy()-is-memmove() and we _do_ have _FORTIFY_SOURCE=2 by
default, for example. so "yes; to the extent that we can be helpful, we
strive to be helpful".

similar to Larry Wall's "make easy things easy, and hard things possible",
we err on the side of "make common screwups easy to debug, and hard ones
[more] possible".


> > killing a process with a
> > "you called syscall %d and you're not allowed to" message is pretty much
> > exactly what you'd expect from me :-)
>
> Yeah, alex didn't get that.
>
>  > bash-5.2$ set +p
>  >
>  > [Process completed (signal 31) - press Enter]
>  >
>
> Bash catches SIGSYS in interactive shells so it can clean up the terminal
> pgrps and save the history, so the user's not going to see that message
> from bash (assuming you print it from the default signal `handler').


yes, Android installs default signal handlers for all the usual crashes
that will ensure a crash ends up in the log, a more detailed tombstone (if
you're an OS developer or otherwise have access to the file system), and --
if you're a Play app -- you'll get anonymized clustered crash reports in
your developer console, looking something like
https://source.android.com/docs/core/tests/debug/native-crash#seccomp but
without the "we can't _prove_ it's not PII you're trying to exfiltrate, so
we just redact it" register values.

(you can also get hold of tombstones [in protobuf format
https://android.googlesource.com/platform/system/core/+/refs/heads/main/debuggerd/proto/tombstone.proto
] from your last crash via
https://developer.android.com/reference/android/app/ApplicationExitInfo#getTraceInputStream()
if you want to go direct to your own mothership, and have your own privacy
policy in place for such things.)

but, again, that's aimed at apps.


> Bash
> will kill itself with SIGSYS after restoring the default signal handler,
> so it's up to the parent to catch that exit status and report it, which
> termux did. It's not that unusual. You can't count on the user seeing it.
>
>
>
> > it might not seem like it from the outsiders (because you only see the
> > stuff that _does_ break), but i'm sure as the maintainer of something
> old
> > and widely-used you're well aware that "not breaking everyone's stuff
> [even
> > when it's terrible]" is a large part of the job. (and, ironically, i've
> > come to believe that breaking it _loudly and clearly_ when you have to
> [for
> > security, usually] is the least worst option.)
>
> Backwards compatibility is important, and something in which I put a lot
> of value, but you're right, sometimes you have to break it. That's why bash
> has its compatibility mode. But it seems like gradually fixing things, or
> adding a compatibility mode setting, sometimes only gives people the
> opportunity to complain simultaneously about both the problem and the
> solution.
>
> Chet
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
>                  ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
>
>
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Reply via email to