On 1/6/23 20:09, enh wrote: > If you include "string.h" to get memcmp() but can't give it a string as a > known > not-matching argument, I personally think somebody missed part of their > mission > briefing. The "optimization" has very obvious side effects. > > at the risk of sounding like rich felker ... no, you're relying on undefined > behavior.
Then define it. That's basically what I did by making my own function to replace the one in libc. (It's a pity you can't -funsigned-char and friends for libc...) I layer LSB and LP64 and man7.org and gnu/dammit manual pages on top of posix all the time, because they all leave big holes. I use gcc extensions like x ? : y long before the trailing standards body picks them up, and both icc and llvm picked up those gcc extensions without regard to standards bodies. I have a portability.[ch] staircase for known environments. I'm happy to deal with these issues as they arise with (unsigned long) typecasts and so on. But "you can't go beyond the standard" means posix forbids me to implement "mount". How is this categorically different than that? "Rigidly defined areas of doubt and uncertainty" is from: https://www.imdb.com/title/tt1113230/characters/nm0571650 > the library function says it compares n bytes of both regions. you > lied to the library by claiming that the first n bytes of _both_ regions are > valid, when they're not. They are when ASAN isn't enabled. This string was in the middle of an array of strings the linker had no reason to reorder (and keeping them in order is an optimization due to cache locality when iterating over the array), and if I'd really wanted to ensure rodata categorically wasn't suddenly gonna end right after a string constant in any command or lib/ function I could have moved main.c to the end of the link order, and if the linux kernel depending on link order to assemble the device probe array for 20 years no longer held true I could horribly annotate toy_list with magic segment info and (if necessary) linker script nonsense to ensure it always went at the end of rodata. (There are bigger and bigger hammers...) I have a TODO item to find a way to say "const" on things like toy_list that puts them in .rodata instead of .data but doesn't the complain that every over access to them does not say "const". Like the char * you assign a string to doesn't need to be const even though it points to rodata. It's probably some __attribute__((nonsense) I just haven't bothered yet. If ASAN suddenly started complaining about toy_list being writeable and made this an ERROR instead of a warning that I MUST DEAL WITH NOW OR YOU CANNOT CONTINUE... I would be equally annoyed. Warning maybe, but not an _error_. There are many ways I COULD have addressed this, but I prefer to solve things by _removing_ conceptual complexity. Hence a well-defined smemcmp() that does not have any "undefined" miasma hanging around it. xstrncpy() and xstrncat() are sort of conceptually adjacent: libc defaults to truncating and these default to erroring if they would truncate, because "I could not copy the data" is often a _problem_ which should not just silently continue with modified results. The behavior I want is not what libc does, so replace the libc function with my own lib/ function. (I was nice and didn't put smemcmp() in portability.c.) > ironically, _you're_ assuming an "optimization" that it > won't look past the first non-matching byte, I was assuming the _result_ won't be influenced by anything past past the first non-matching byte. "Does not return at all when the result is trivially calculable" strikes me as a bad implementation, so I switched to one that didn't need babying. I admit my vague awareness that data objects tend to get padded up to word size was misleading here, which led me to give it less scrutiny, and that's my fault. But an astronomically unlikely page fault due to running off the rodata segment, which would require a change to the linker to reorder string constants it's adding to the string table, and which I could probably prevent even then by modifying the link order in make.sh so main.c goes last, would deterministically show up in "make tests". (There are many things where I wait for somebody to complain and deal with it then. In this case they wouldn't be providing more info/context than I have now, or a real world use case I don't currently have, so...) Half my annoyance here was looking at bionic and seeing a size comparison so this "optimization with side effects" theoretically isn't being applied to the small strings I was using (at least I'm assuming that's what the constant jump means) and thus can't happen here anyway, but ASAN triggering despite all because honey badger don't care. But as I said, "I'm comfortable with" and "random samsung dev is comfortable with" are two different things, so sharp edge filed off because I'm making code OTHER people need to understand. I'm annoyed the sharp edge is _there_, not that I hit it. I work around libc bugs all the time, and have done so here. The disagreement is that you think they're philosophically right, and I'm treating it as yet another build environment glitch to work around. > and you're annoyed that > implementations aimed at chips that work well with larger quanta have chosen a > different equally valid optimization instead. neither is _wrong_, but they're > incompatible, and your mental model is assuming something that the > specification > doesn't guarantee you. I replaced it with a sane implementation and used that instead. I was mildly annoyed ASAN was triggering on an (in this case) purely theoretically issue as an error not a warning, which if it ever did manifest (due to a future compiler change) would be caught by the test suite. ASAN generated a currently-false positive, and demanded that it be dealt with NOW rather than going on the todo list, because it doesn't have "error" vs "warning" categories. It's mild annoyance, and I did what it demanded already, but any solution where MORE understanding of what's going on is _punished_ gets us back to demanding "rigidly defined areas of doubt and uncertainty", and then you may (or may not) be Vroomfondel. Did I make the change? Yes. Did I admit they were "right"? No. > "The memcmp() function shall compare the first n bytes (each interpreted as > unsigned char) of the object pointed to by s1 to the first n bytes of the > object > pointed to by s2." > https://pubs.opengroup.org/onlinepubs/9699919799/functions/memcmp.html > <https://pubs.opengroup.org/onlinepubs/9699919799/functions/memcmp.html> > > (and it is enough to matter to performance in practice, not just for > stupidly-large regions. bionic wouldn't do this otherwise, because i'd have > rejected the patches :-) ) I'll take your word for it, I don't maintain a libc and if I did it would probably be a nolibc/picolibc variant for QCC, meaning roughly as optimized as tinycc's output... Rob _______________________________________________ Toybox mailing list [email protected] http://lists.landley.net/listinfo.cgi/toybox-landley.net
