I was writing this when I got distracted by your new patches, I should finish up and send it...

On 10/18/24 18:08, Ray Gardner wrote:
Is random() not good? Maybe not the best, but what is not remotely about it?
Period about 16*(2**31-1) not enough?

OK, it's not good enough. I don't know how bad random() is aside from
the period, but that is inadequate. I can put in a Bays-Durham shuffle
to fix that. I can also put in my own PRNG in place of random(), but it
seems you would prefer to keep toybox small.

In general if you want real randomness, I ask the kernel for random bits. If you want repeatable prng streams where I can specify a seed to get reproducible behavior, randomly changing the PRNG out from under it is a bad thing.

You seem to be talking about something in between...?

Your patch went from initializing the random number generator from a fraction of
a second to an even second (which is forever in computer timme), and I don't
know why you did that. What's the benefit?

The patch only addresses the seed when no arg is given to srand(). Yes it's
for compatibility,

Compatibility with what?

I should have said compliance -- posix compliance. It's in the spec.
But also compatibility: gawk, nawk, mawk, goawk, busybox awk, all
support it.

To what end? Which of the above two use cases is being supported here? Actual randomness (which the kernel is better at than userspace), or reproducible PRNG sequences from a known seed?

This is the test that tests that srand() is inited to the current
second. srand() returns the previous seed.

And getrandom() doesn't need to be seeded.

(Well, ok, it does but that's not an individual userspace command's problem. That's an init script and hardware config question.)

[...]

Patch 3: You've made your input path pretty complicated to fix an issue that
seems to boil down to "gracefully handle short reads". If getdelim() out of libc
[...]

Eh, applied anyway because it's local to a pending/ command but can't say I'm
happy about it.

I'm not too happy with it either. I'd be happy with some help on the entire
input mechanism. But it's complicated.
[...]
I've lost track of "record separator" vs "field separator" vs "line separator" 
here.

There is no line separator.

So newline is hardwired there?

Records and fields. RS is record
separator, FS is field separator. FS can be a regex. Regex RS
is "unspecified" in posix awk, but every awk supports regex RS.
The RS="" case is multiline records. Runs of blank lines separate
records, with leading and trailing blank lines ignored.

The "interactive" awk input is supported by all awks I test (gawk,
nawk, mawk, goawk, busybox awk).

[...]
But now every awk allows RS to be a regex, which is "unspecified" by POSIX.

Have you found anything using that yet? (Did it break a build script?)

I don't know what uses it, but every awk supports it.

I'm surprised the "one true awk" that Android's been building supports a feature that DIDN'T make it into Posix 2024...

When did busybox add this? Hmmm... (Oh the test cases you'll need: busybox commits 5323af7f5180 af1bd0962527 patched up this logic because it's corner case city...) Looks like it was there when Glenn copied this file into a different directory in 2002 (commit 545106f8db3e) so presumably they copied it from gnu/dammit in the first place. (For all I know their code _was_ from some ancient version of gnu awk, since it was a GPLv2 project at the time...)

I am trying to
be posix compliant (modulo some locale stuff), but also handling
common extensions and non-posix behavior of other awks.

I've completely re-written the wonky record-reading code to be much
cleaner. Was difficult. The regex RS is used to support the multiline
record feature. Is in the next batch of patches.

I deleted my local cleanups to apply your patches. (It's man.c all over again...)

[...]
Anyway, here is a fix and a bunch of tests. BTW only toybox awk and a recent
gawk (5.3.0 in my case) pass all the tests.
>
I'm testing against gawk 5.3.1 now, and it does NOT pass all the
tests any longer. Not sure why it's failing two of the new ones:
"split() utf8" and "split fields utf8". Both do essentially the same
as:

BEGIN{n = split("aβc", a, ""); printf "%d %d", n, length(a);for (e =
1; e <= n; e++) printf " %s %s", e, "(" a[e] ")";print ""}

BEGIN{FS=""; $0 = "aβc"; printf "%d", NF; for (e = 1; e <= NF; e++)
printf " %s %s", e, "(" $e ")"; print ""}

And both of those work in gawk when run directly.

Part of the problem writing tests is figuring out how much to test. I keep having test skew running bash through "TEST_HOST=1 make test_sh" although part of the reason is I keep emailing chet and he FIXES stuff. But if there's no standard and no stable reference implementation the test might not mean anything.

My preferred test case is running as many package builds through it as possible, since that's the real world data. Over in busybox-land alpine linux and friends did that, but toybox wasn't ready when that ecosystem launched and it's not designed to allow drop-in replacement.

I've been vaguely poking at https://wiki.debian.org/HelmutGrohne/rebootstrap and https://forums.debian.net/viewtopic.php?t=66936 and so on, but the debian repo isn't exactly designed to let you do this either. (Erik Andersen did it for busybox+uclibc many moons ago, but that's lost to history and he never went that far...)

My todo.txt says I should watch:

watch (from https://peertube.debian.social/videos/local?s=1)
  Automating architecture bootstrap:
    https://peertube.debian.social/w/45XroN9CnbYLNLKQH3GD9F
  Building a busybox based debian
    https://peertube.debian.social/w/chzkKrMvEczG7qQyjbMKPr
  Android tools BOF
    https://peertube.debian.social/w/n57xBrJnLZQS1L112cV6wi
  spontaneous cross compiling BOF
    https://peertube.debian.social/w/3kJncnbRoXaCS1MSQqRixT
  Running debian desktop in containers
    https://peertube.debian.social/w/gpVkreH3CUXGfLd45c7ULt

And I did watch the "building a busybox based debian" one but his answer was basically "smash the repo dependencies, just tell it to extract these packages into an empty directory with nothing satisfied and then manually fix it up with a text editor and some symlinks"...

(I emailed him before the move and he said he had plans to fix the download links as part of a software upgrade of the site...?)

When gnu adds a new creepy thing nobody else supports yet, I generally wait for
something to break due to its absence rather than jumping through hoops to chase
their taillights. Right now, busybox awk is used to build rather a lot of
packages, and if they haven't made the decision to implement this weirdness than
packages are likely to continue to support NOT doing that...

I'm not doing anything busybox isn't already doing, except I handle utf-8
and have fewer bugs. The tests busybox can't pass are bugs. It has rather
a lot, I think.

Because as far as I can tell busybox's version might be based off a fork of gnu awk from the 1990's? It's hard to tell _where_ it came from (other than the comment at the top of the file which doesn't say how much was original code and how much was repurposed from elsewhere) because the conversion from svn to git seems to have broken this history at commit 545106f8db3e and the old svn is long gone. I could dig through old tarballs, and did for https://busybox.net/~landley/forensics.txt which doesn't mention awk so it probably wasn't there in 0.25...

Ray

Rob
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Reply via email to