On Fri, Jan 6, 2023 at 5:57 PM Rob Landley <[email protected]> wrote:

> Note to self: remember to hit "send".
>
> On 1/5/23 19:10, enh wrote:
> > > even though one _could_ write a byte-by-byte memcmp(), the standard does not require that, and i'm aware of no non-C implementation that works that way.
> >
> > A non-C implementation of a C library function?
> >
> > well, "assembler" if you must. my distinction being "regardless of architecture" (so not specifically arm64 or whatever).
>
> Sounds like selection bias to me: no reason to implement an assembly version that does the same thing the C version does.
>
> (All the 1950's cars still on the road today are MUCH more durable than modern cars, most of which wouldn't still run after 70 years.)
>
> > > (musl may have misled you here?
>
> And uClibc, libc5, klibc, the kernel's nolibc, Keith Packard's picolibc...
>
> > > strictly BSD also has a memcmp.c that's byte-by-byte, but all the real architectures have assembler versions they use instead.)
>
> I just checked and current-ish glibc is a horrific nest of #ifdefs in C with .S alternative versions for a half-dozen architectures on top of that, yes. But gnu was already like that when I first tried to read their code 30 years ago...
>
> gnu/newlib has an #ifdef SMALL with the simple one, and an #else with the loop over the long as a prefix to the loop over char... and I think that implementation wouldn't break? It will only do the long loop on two aligned pointers, and only while there are >=sizeof(long) bytes left, meaning it can run off the end of the shorter string constant but can't run off the end of the page. So while it can fetch garbage bytes past the end of the string within the same page, that info won't affect the result nor will it page fault.
>
> It'd still false positive hwsan, of course. Like my ls code did way back when...
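
A minimal sketch of the word-then-byte structure described above, under stated assumptions: `sketch_memcmp` is a hypothetical name, this is not newlib's actual source, and the words are read through memcpy() only to sidestep aliasing questions. The long loop runs only when both pointers are sizeof(long)-aligned and at least sizeof(long) bytes remain; on a mismatch it drops to the byte loop, which locates the differing byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of "loop over long, then loop over char". */
int sketch_memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;

    /* Word loop: only entered when both pointers are long-aligned. */
    if ((((uintptr_t)a | (uintptr_t)b) & (sizeof(long) - 1)) == 0) {
        while (n >= sizeof(long)) {
            unsigned long wa, wb;

            memcpy(&wa, a, sizeof wa);
            memcpy(&wb, b, sizeof wb);
            if (wa != wb)
                break; /* let the byte loop find the differing byte */
            a += sizeof(long);
            b += sizeof(long);
            n -= sizeof(long);
        }
    }

    /* Byte loop: handles the tail, unaligned input, and mismatched words. */
    while (n--) {
        if (*a != *b)
            return *a - *b;
        a++;
        b++;
    }
    return 0;
}
```

Note that the word loop still reads up to sizeof(long)-1 bytes beyond the first differing byte, which is the behavior a sanitizer would flag when n overstates the size of either object.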
>
> This was the rough mental model I had of the "optimization" all along, by the way. It CAN be done without breaking the semantics. The question is whether the constant time check up front and the extra cache line pollution for code you jump over are a net negative in real world use. It's PROBABLY a wash? I suspect your real limiting factor on all this performance is cache line fetches anyway, what the CPU does is mostly "wait for DRAM fetch" when handling nontrivial string anything. Hence the aggressive prefetching and caching leaking security state until "do not run security critical code on the same physical CPU as sandboxed anything" gives us the nightmare that is TPM. Putting "trusted" on a chip is like putting "unsinkable" on a ship.
>
> > I agree that xmemcmp() is not the ideal name. The x prefix means "exits", and this doesn't.
> >
> > memscmp() maybe? (memstrcmp?)
> >
> > safememcmp()?
>
> Nope. Not calling it unsinkablememcmp() either. (I went with smemcmp(), you can decide for yourself what the s means.)
>
> > > for arm64, the SVE memcmp() will load as many bytes as your vector size :-)
> >
> > Which is not optimizing for the common case, but ok...
> >
> > as a libc maintainer, "don't get me started". the number of times i've had optimized memory/string routines that are improvements for the very large cases that mostly only happen in microbenchmarks while regressing the more common short copies/compares... (though given the arm64 SVE context, i should say that i think "arm ltd" themselves might be the sole exception that's never wasted my time with such a thing.)
>
> Perceived improvement vs actual improvement.
>
> If you include "string.h" to get memcmp() but can't give it a string as a known not-matching argument, I personally think somebody missed part of their mission briefing. The "optimization" has very obvious side effects.

at the risk of sounding like rich felker ... no, you're relying on undefined behavior. the library function says it compares n bytes of both regions. you lied to the library by claiming that the first n bytes of _both_ regions are valid, when they're not. ironically, _you're_ assuming an "optimization" that it won't look past the first non-matching byte, and you're annoyed that implementations aimed at chips that work well with larger quanta have chosen a different equally valid optimization instead. neither is _wrong_, but they're incompatible, and your mental model is assuming something that the specification doesn't guarantee you.

"The memcmp() function shall compare the first n bytes (each interpreted as unsigned char) of the object pointed to by s1 to the first n bytes of the object pointed to by s2."

https://pubs.opengroup.org/onlinepubs/9699919799/functions/memcmp.html

(and it is enough to matter to performance in practice, not just for stupidly-large regions. bionic wouldn't do this otherwise, because i'd have rejected the patches :-) )

> > Further increasing complexity to mitigate the fallout from a previous unnecessary optimization is not my preferred approach, I tend to rip OUT stuff with sharp edges and little to no benefit. But to each their own...
> >
> > this kind of thing is what lets you do things like adding fake cat ears to your head live when you're recording stupid videos to clog the intertubes with.
>
> I very vaguely recall meeting the people making reactive cat ears at the first hot chips I attended in tokyo back in... 2015? (There was a pandemic, who knows.) For a definition of "met" that was "saw model wearing cool thing, read the english side of a glossy brochure, everybody at the booth only spoke japanese", but still. If that's the one, it was a tiny microcontroller. Battery powered. Not a bandaid-on-bandaid-on-bandaid situation.
>
> The bolt-more-on approach piles up Pentium 4 and Itanium and eventually gets its legs cut out from under it by a not-doing-that. It can go quite a ways first, of course, but Cortex-M is not "more than armv8", it's a subset.
>
> "Perfection is achieved not when there is nothing left to add, but nothing left to take away." - Antoine De Saint-Exupery. Except he said it in french.
>
> > oddly to you and me, that's an in-demand use case for "real people"...
>
> My grumbling about perceived improvement vs actual improvement is because I question and requestion my approach a lot, and a common fallback is "small and simple examples that work are seldom actually useless". But that's not the world you're in. :)
>
> Rob
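
One way to stay inside the POSIX wording quoted earlier in this message is to make sure n never exceeds either object before calling memcmp(). The sketch below is only an illustration under that assumption: `has_prefix` is a hypothetical helper, not toybox's smemcmp() and not anything from bionic. It clamps the comparison to the length of the NUL-terminated string, so both regions really do have n valid bytes.

```c
#include <string.h>

/*
 * Hypothetical helper: returns nonzero if the first bytes of buf match
 * the NUL-terminated string prefix. buf must have len valid bytes.
 */
int has_prefix(const char *buf, size_t len, const char *prefix)
{
    size_t plen = strlen(prefix);

    /* Only compare when buf is at least as long as prefix, so memcmp()
     * is never told to read more bytes than either object contains. */
    return len >= plen && memcmp(buf, prefix, plen) == 0;
}
```

For example, has_prefix(line, linelen, "ustar") would replace a memcmp(line, "ustar", 8) style call whose length overstates the size of the string constant.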
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
