Re: (b)make: Outputting a huge variable value

2024-05-16 Thread Mouse
> Following a discussion (make mdi: shell limit exceeded) on tech-pkg@,
> I keep asking myself whether there's a sensible way to output the
> contents of a make variable to a file (or pipe), even if the contents
> exceeds ARG_MAX.

Does "make -V '$VARIABLE'" (or without the $, depending on exactly what
you want) not work?  I must be missing something.

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: MLINKS for hoststat.8 purgestat.8 to mailwrapper.8?

2024-05-03 Thread Mouse
> See src/usr.sbin/mailwrapper
> symlinks for hoststat and purgestat to mailwrapper
> [...]
> mailq and newaliases are mentioned in the manpages but not hoststat
> and purgestat.

My guess is, because the latter are specific to a particular mailer.
mailq and newaliases date back to sendmail (or further); many more
recent MTAs provide similar functionality under those names for the
sake of compatibility (mostly, I suspect, with humans).

hoststat and purgestat are not.  I don't know what they do, and I don't
know whether modern sendmail (if there is such a thing - I'm so out of
touch with MTAs these days that I don't even know that) provides them.
But they definitely are more recent and, probably, less universal.
While I'm not qualified to speak with any authority on the matter, I
would guess that those links are shipped because the shipped mailer
provides them.

> default etc/mailer.conf does not have them

Not surprising.  It would make sense to provide links for only those
things that most common mailers provide.



Re: Using mmap(2) in sort(1) instead of temp files

2024-04-05 Thread Mouse
>> (4) Are there still incoherencies between mmap and read/write
>> access?  At one time there were, [...]
> This bug was fixed nearly a quarter century ago, in November 2000,
> with the merge of the unified buffer cache.

Ah, I recall UBC being brought in.

> I think using any version of NetBSD released in this millennium
> should be good to avoid the bug.

For use cases for which such a thing is appropriate, if such a thing
exists, yes, I daresay it would be.



Re: Using mmap(2) in sort(1) instead of temp files

2024-04-05 Thread Mouse
>> [...]
> Why not stat the input file and decide to use in memory iff the file
> is small enough?  This way sort will handle large sorts on small
> memory machines automatically.

Well, I'm not the one (putatively) doing the work.  But my answers to
that are:

(1) Small sorts are not the issue, IMO.  Even a speedup as great as
halving the time taken is not enough to worry about when it's on a par
with the cost of starting sort(1) at all.

(2) Using mmap versus read provides minimal speedup in this sort of
case: a small file which is being read sequentially.

(3) Code complexity: two paths means twice the testing, twice the
opportunities for bugs, (slightly more than) twice the maintenance,
etc.

(4) Are there still incoherencies between mmap and read/write access?
At one time there were, and I never got a good handle on what needed to
be done to avoid them.



Re: Using mmap(2) in sort(1) instead of temp files

2024-04-04 Thread Mouse
>> Given the issues about using mmap, can anybody suggest how I should
>> proceed with the implementation, or if I should at all?

> There are two potential ways where mmap(2) could help improve the speed
> of sort:

>  - If you know the input file name, use a read-only mmap() of that file
>    and avoid all buffering.  Downside: you can not store \0 at the
>    end of a line anymore and need to deal with char*/size_t pairs for
>    strings.

Actually, if you mmap it PROT_WRITE and MAP_PRIVATE, you could go right
ahead.  But that'll cost RAM or swap space when the COW fault happens.
It also works only when the input file fits into VM; to rephrase part
of what I wrote yesterday on tech-kern, sorting a file bigger than 4G
on a 32-bit port shouldn't break.
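
A minimal sketch of the PROT_WRITE plus MAP_PRIVATE idea, for illustration
only.  The name map_lines_private and its interface are invented here; the
sketch assumes a non-empty regular file that fits in the address space, and
it leaves a final line without a newline unterminated.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: map path copy-on-write and turn each '\n' into
 * '\0' inside the private copy.  The file on disk is never modified.
 * Returns the mapping (the caller munmap()s *lenp bytes), or NULL.
 */
char *
map_lines_private(const char *path, size_t *lenp, size_t *nlinesp)
{
	struct stat st;
	char *base;
	size_t i, len, nlines = 0;
	int fd;

	fd = open(path, O_RDONLY);	/* read access suffices for MAP_PRIVATE */
	if (fd == -1)
		return NULL;
	if (fstat(fd, &st) == -1 || st.st_size <= 0 ||
	    (off_t)(size_t)st.st_size != st.st_size) {
		close(fd);		/* empty, or too big for this address space */
		return NULL;
	}
	len = (size_t)st.st_size;
	base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	close(fd);			/* the mapping outlives the descriptor */
	if (base == MAP_FAILED)
		return NULL;
	for (i = 0; i < len; i++) {
		if (base[i] == '\n') {
			base[i] = '\0';	/* first store to a page forces its COW copy */
			nlines++;
		}
	}
	*lenp = len;
	*nlinesp = nlines;
	return base;
}
```

Every page that contains a newline ends up copied, which is the RAM or swap
cost mentioned above.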

>  - You use "swap space" instead of a temporary file by doing

>   mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_ANNON, -1, 0);

(well, MAP_ANON).  Yes, but that has issues.  The size of an mmap()ped
area is fixed, set at map time, whereas file sizes grow dynamically.  I
suspect that trying to use mmap instead of temp files would amount to
implementing a rudimentary ramfs.

Furthermore, if the dataset fits in RAM, I'd say you shouldn't be using
the temporary-space paradigm at all; just slurp it in and sort it in
core.  And if it fits in VM but not RAM, given the way swap is tuned
for general-purpose use instead of the kind of access patterns sort
exhibits, I suspect temp files might end up being more performant.  And
if the dataset doesn't fit in VM, you'll need temp files regardless.

If this does go in, I really think it needs an option to suppress it.



Re: Issues with lseek(2) on a block device

2024-02-23 Thread Mouse
>> It looks to me like "we didn't bother making it do anything in
>> particular, so you get whatever it happens to give you".
> "bug" ultimately means "failure to conform to expectations".

Well...maybe.  Depends on whose expectations.  If I expect, say, that
typing a tab on a command line puts a tab into the command, and someone
else expects it to do filename completion, which one is the bug?

Usually, I use "bug" to mean something close to what you said, where
the answer to "whose" is "the author's".

> We could debate what expectations might be for stat and block device
> sizes, but it is definitely against expectations that something so
> simple as retrieving the size of a storage device has such a messy
> interface.

Again, whose expectations?

My expectations, formed from decades of experience, are that it _is_ a
messy thing.

At least one person clearly expected that lseek would do the trick.
(Interestingly, I would not have even considered lseek; if I'd had to
pick a stock call that I would expect to return the size of a disk, it
would be stat() or one of its variants - lstat, fstat, etc.)

>> My own method of finding disk device size is [...].  I'd expect it
>> to be highly portable.  The only cases I'd expect it to fail in are
>> disks over 4G (or perhaps 2G) on systems with only 32 bits for
>> off_t.

> ...or tapes.

Or ttys.

Tapes don't really have a size in that sense to obtain.  (Most tapes.
Some DEC tapes do look a lot like disks with sec/trk and trk/cyl both 1 and
what for disks would be _extremely_ long seek times.)



Re: Issues with lseek(2) on a block device

2024-02-22 Thread Mouse
>>> lseek(fd, 0, SEEK_END);
[on a disk device]

>> [...]
> [...]
> This is such a buggy behaviour that [...]

I wouldn't call it buggy, not unless there is a spec that it's supposed
to conform to that says otherwise (even if the "spec" is just an
author's description of intent), which is something I so far haven't
seen reason to think exists.  It looks to me like "we didn't bother
making it do anything in particular, so you get whatever it happens to
give you".

> I stopped using stat for finding out block device size a long time
> ago, and not just on NetBSD.

That sounds sensible to me.  I think your first mail on the subject was
the first time I'd even _considered_ that stat() applied to a disk
device might return drive/partition size.

My own method of finding disk device size is to lseek to offset N and
try to read one sector, for various N: initially N = sector size, then
double N until I get EOF-or-error, then do binary search between the
last working value and the first failing value.  I think that's worked
on everything I've tried it on; admittedly, that's only a few OSes, but
I'd expect it to be highly portable.  The only cases I'd expect it to
fail in are disks over 4G (or perhaps 2G) on systems with only 32 bits
for off_t.
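
The doubling-then-bisecting probe described above might look roughly like
this.  It is a sketch, not the code actually used: the names probe_size,
probe_path_size and MAX_SECTOR are invented, the sector size is supplied by
the caller, and a short read is treated the same as EOF or an error.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

#define MAX_SECTOR 4096		/* assumed upper bound on sector size */

/* Return 1 if one whole sector can be read at byte offset off, else 0. */
static int
sector_readable(int fd, off_t off, size_t secsize)
{
	char buf[MAX_SECTOR];

	if (lseek(fd, off, SEEK_SET) == (off_t)-1)
		return 0;
	return read(fd, buf, secsize) == (ssize_t)secsize;
}

/* Size in bytes of whatever is open on fd, or -1 for a bad secsize. */
off_t
probe_size(int fd, size_t secsize)
{
	off_t ss = (off_t)secsize, good = 0, bad, mid;

	if (secsize == 0 || secsize > MAX_SECTOR)
		return -1;
	if (!sector_readable(fd, 0, secsize))
		return 0;		/* not even the first sector */
	/* Doubling phase: find the first offset that fails. */
	for (bad = ss; sector_readable(fd, bad, secsize); bad *= 2)
		good = bad;
	/* Bisection: the sector at good reads, the sector at bad does not. */
	while (bad - good > ss) {
		mid = good + (bad - good) / ss / 2 * ss;
		if (sector_readable(fd, mid, secsize))
			good = mid;
		else
			bad = mid;
	}
	return good + ss;
}

/* Convenience wrapper taking a path, such as a raw disk device. */
off_t
probe_path_size(const char *path, size_t secsize)
{
	off_t size;
	int fd = open(path, O_RDONLY);

	if (fd == -1)
		return -1;
	size = probe_size(fd, secsize);
	close(fd);
	return size;
}
```

The doubling step is where a 32-bit off_t runs out, which is the failure
case noted above.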



Re: -mdoc: how to handle surrounding types?

2024-02-09 Thread Mouse
>> I'm writing a manpage for a library and want to describe a function
>> which returns a pointer to a function.

>> extern void (*foo(void (*)(const char *, int)))(const char *, int);

[RVP]
> After a bit of fiddling with signal.3:

Doh!  Of course signal(3) is an example.

> +.ds cm \f[R],\f[]
>   .Ft void \*(lp*
> -.Fn signal "int sig" "void \*(lp*func\*(rp\*(lpint\*(rp\*(rp\*(rp\*(lpint"
> +.Fn foo "void \*(lp*\*(rp\*(lpconst char *\*(cm int\*(rp\*(rp\*(rp\*(lpconst char *\*(cm int"

Hm.  I'll have to play with it a bit, look at what gets generated for
the various versions I care about.

This needs thought.

[Rhialto]
> However I would prefer that it would make a typedef for the signal
> handler type, and use that both as argument for signal() and its
> return type: it is much more readable.

But it also requires dumping yet another typedef into the caller's
namespace.

I also dislike such typedefs in general, though I'm having trouble
codifying what it is that bothers me about them as compared to the
typedefs I do use.
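
To make the trade-off concrete, here is a small sketch showing that the raw
declaration and a typedef'd one declare the same function.  The typedef name
foo_handler, the sample handler, and the body of foo are all invented for
illustration; only the first declaration comes from the original question.

```c
#include <stddef.h>

/* The declaration exactly as given in the original question. */
extern void (*foo(void (*)(const char *, int)))(const char *, int);

/* Hypothetical typedef: this is the extra name in the caller's namespace. */
typedef void (*foo_handler)(const char *, int);

/* Same function, same type: a compatible redeclaration of the above. */
extern foo_handler foo(foo_handler);

static foo_handler current;	/* NULL until a handler is installed */

/* Sample handler, for demonstration only. */
void
ignore_handler(const char *msg, int code)
{
	(void)msg;
	(void)code;
}

/* Install handler and return the previously installed one. */
foo_handler
foo(foo_handler handler)
{
	foo_handler prev = current;

	current = handler;
	return prev;
}
```

That both declarations compile in one translation unit shows the typedef
changes only the spelling, not the interface.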

Now I have two things to think about here.  Thank you both!

> \X/ There is no AI. There is just someone else's work.   --I. Rose

Becoming less true, to the extent that weak AI qualifies.

Reminds me of "Don't say `the cloud'.  Say `someone else's computer'."



-mdoc: how to handle surrounding types?

2024-02-08 Thread Mouse
I'm writing a manpage for a library and want to describe a function
which returns a pointer to a function.

Specifically, it takes a handler of type void (*)(const char *, int)
and returns the previous handler.  If we call it foo, then:

extern void (*foo(void (*)(const char *, int)))(const char *, int);

How should this be described in SYNOPSIS?  The .Ft/.Fn paradigm does
not seem to me to have any obvious way to describe function return
types that aren't textually entirely before the function name, that is,
which don't fit the textual paradigm $RETURNTYPE $FUNCTIONNAME($ARGS).

I tried

.Ft void
.Fn ( * foo "void (*)(const char *, int)" ) "const char *" "int"

but that failed rather badly, worse than I was expecting; I got

void
 (*(foo, void (*)(const char *, int));) const char * int

with assorted portions highlighted.

What's the Right way to write such a thing?



Re: strtoi(3) ERANGE vs ENOTSUP

2024-01-20 Thread Mouse
> PR lib/57828 (https://gnats.NetBSD.org/57828) proposes changing an
> edge case of the strtoi(3) function so that if ERANGE and ENOTSUP are
> both applicable then it should fail with ERANGE rather than ENOTSUP.

Has it ever been part of strtoi's contract which error it fails with if
multiple errors are applicable?  What you wrote doesn't, to me, sound
as though there is any such spec.

If there is, that's the place to start, it seems to me.

If not, then anything this change affects is already broken and
deserves to be rendered _obviously_ broken, so there's some hope of
getting it fixed.



Re: [patch] cat -n bug from 33 years ago

2023-11-18 Thread Mouse
>>> Numbering currently starts over at 1 for each input file [...]
>> If you want to have continuing (non-restarting) numbering for
>> multiple input files, one could use "cat file1 file2 | cat -n".
> True, that would be a workaround.

> But shouldn't the current behaviour still be fixed?

First, I think, we should decide whether "fixed" is an appropriate
word.  Perhaps it's just me, but I don't consider Linux cat to be a
reference implementation.  (Indeed, I don't consider the Linux
implementation of pretty much _anything_ to be a reference, except for
Linux-specific things.)

> The restarting for each file has never been mentioned in the manual
> as a feature,

The manual is semi-silent on it.  It's not specifically described, but
-n is documented as numbering "the output lines", which I think
indicates the numbering should continue.

> and it isn't what most people would expect.

I'm not so sure.  Without having seen this thread or the exact wording
in the manpage, I'm not sure which behaviour I would have expected.
(Not that I'm `most people', of course, but, as bad as a sample of
size 1 is, it's better than a sample of size 0.)

There _is_ the point that, with the restarting behaviour, there is a
simple workaround if you want the other behaviour; working around in
the other direction is harder - instead of two cat processes you need
one process per input source, plus one shell (or other overseer) to
reap finished ones and start new ones.
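
The two behaviours differ only in whether the counter is reset between
inputs, which a sketch makes easy to see.  This is not the src/bin/cat code;
struct numberer and both function names are invented, and the "%6lu" plus tab
prefix is an assumption about the output format.

```c
#include <stdio.h>

struct numberer {
	unsigned long next;	/* number for the next output line */
	int at_bol;		/* nonzero at the beginning of an output line */
};

/* Start (or restart) numbering at 1. */
void
numberer_init(struct numberer *n)
{
	n->next = 1;
	n->at_bol = 1;
}

/* Copy in to out, prefixing each output line with its number. */
int
number_stream(struct numberer *n, FILE *in, FILE *out)
{
	int c;

	while ((c = getc(in)) != EOF) {
		if (n->at_bol)
			fprintf(out, "%6lu\t", n->next++);
		putc(c, out);
		n->at_bol = (c == '\n');
	}
	return ferror(in) || ferror(out) ? -1 : 0;
}
```

Calling numberer_init() once before the first file gives continuous
numbering; calling it again before each file gives the restarting behaviour.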



Re: [patch] cat -n bug from 33 years ago

2023-11-15 Thread Mouse
> The attached patch fixes a bug in /bin/cat when using -n with
> multiple input files.

While someone's meddling with src/bin/cat...

The manpage language does support this, but it could be clearer; I'd
prefer to see an explicit statement added that the numbering continues
across multiple input files.  As possible wording:

.It Fl n
Number the output lines, starting from 1.  Note that it does not matter
whether the output lines come from multiple files or not.

Two other things occur to me, on reading the manpage:

-b is documented as not numbering `blank lines'.  It is not clear from
the manpage whether this means "lines containing nothing at all" or
"lines containing nothing except possibly whitespace".  The code does
the former; assuming that's what you want, I'd suggest s/blank/empty/.

-s is documented as "Squeeze multiple adjacent empty lines, causing the
output to be single spaced.".  The part before the comma is fine, but
the part after seems wrong to me; `single spaced', to me, means `no
blank lines except between paragraphs', which is not what cat -s does.
Again, a possible wording:

.It Fl s
Squeeze multiple adjacent empty lines into a single empty line.



Re: tar vs device special files

2023-10-29 Thread Mouse
>> It appears to be specifying pax's behaviour, not tar's.  [...]
> POSIX used to specify tar, long ago, but there were (as I understand
> it) too many incompat variants, so it was dropped.

Not entirely surprising.

> You should have been expecting that as the link you were given ended
> in pax.html#some-tag-or-other

Yes, I noticed that...after the fact.

I got the file, thank you!



Re: tar vs device special files

2023-10-28 Thread Mouse
>> So there _is_ a POSIX spec for tarchives?  [...]
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_06

I got that fetched and have been going through it.

It appears to be specifying pax's behaviour, not tar's.  Is tar
specified to use the same format by reference, or is tar not specified
but everyone just implements it to use pax's ustar format, or what?

It also seems to me that significant fractions of it are
unimplementable on NetBSD because they demand recoding to or from UTF-8
for things that NetBSD handles as octet strings, not character strings
(which therefore cannot be recoded to or from UTF-8 even in principle),
such as user names in the system user database (/etc/{master.,}passwd
for NetBSD).  Is there a canonical way of handling such things?

passwd(5) on 9.0 and on 5.2 specify that /etc/passwd contains ASCII
records, but 5.2 vipw does not complain when I put a 0xe5 octet in a
record's username and homedir fields - and it appears to work just
fine, so any such restriction is not enforced.  This means software has
to do _something_ when faced with such things.  Is there some kind of
system-wide locale setting used for non-user-specific things like
usernames, or what?  I'm moderately sure there isn't any such thing on
5.2 and earlier.



Re: tar vs device special files

2023-10-28 Thread Mouse
>> So there _is_ a POSIX spec for tarchives?  [...]
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_06

I'll have to scare up a work machine to fetch that from, since
apparently pubs.opengroup.org is not interested in serving content over
HTTP.  But that should be doable; work these days tends to inflict
recent Linux on me, and, as unpleasant as I find that for most
purposes, it does mean things like curl with HTTPS support.

Thank you!



Re: tar vs device special files

2023-10-28 Thread Mouse
> I don't think any one else cares about pre-ustar.  Pretty much any
> reader and writer around uses at least ustar and generally wants to
> have extended POSIX as well when caring about large files.

So there _is_ a POSIX spec for tarchives?  Is the spec available, or is
this yet another pay-to-play "standard"?  I've gone looking for specs
for tar before, but each time I have, I've been unable to find anything
that isn't behind a paywall of one sort or another (and thus a total
nonstarter for me).

Admittedly, I haven't looked recently.



Re: tar vs device special files

2023-10-28 Thread Mouse
>> (It doesn't help that I haven't managed to find a clear spec for tar
>> format; the closest I've found so far is a description of what pax,
>> in its (supposedly-)tar-compatible mode, is supposed to read/write.)
> All of this can be found in:
> src/external/bsd/libarchive/dist/libarchive/archive_read_support_format_tar.c

Thank you!  I'll have a look.

> If the libarchive tar doesn't see a "ustar  \0" (GNU tar) or "ustar"
> (POSIX tar) magic at 0x101 (see: tar_read_header()), it takes the file
> to be a non-POSIX old-style tar archive which (according to
> libarchive) doesn't store maj./min. nos. (see: struct
> archive_entry_header_ustar)

That is ... a significant deviation from historical practice, to the
extent that I would call it a bug in libarchive's tar support.  (I
don't think I've ever stumbled across any other tar that didn't
understand mtar's archives, though admittedly I don't pass archives
including device special files between implementations very often, so
if the incompatibility is limited to them I might well not notice.)
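
For orientation, a sketch of the kind of check the quoted text describes.
This is not libarchive's code and the function names are invented; the
offsets are those of the ustar header layout (magic at 0x101, devmajor at
0x149, devminor at 0x151).

```c
#include <stddef.h>
#include <string.h>

enum {
	OFF_MAGIC    = 257,	/* 0x101: "ustar" when present */
	OFF_DEVMAJOR = 329,	/* 0x149: octal, 8 bytes */
	OFF_DEVMINOR = 337,	/* 0x151: octal, 8 bytes */
	LEN_DEV      = 8
};

/* Parse an octal field that ends at a space, a NUL, or the field end. */
unsigned long
tar_octal(const unsigned char *p, size_t len)
{
	unsigned long v = 0;
	size_t i = 0;

	while (i < len && p[i] == ' ')
		i++;			/* tolerate leading padding */
	for (; i < len && p[i] >= '0' && p[i] <= '7'; i++)
		v = (v << 3) | (unsigned long)(p[i] - '0');
	return v;
}

/* Nonzero if the 512-byte header block carries a "ustar" magic. */
int
tar_has_ustar_magic(const unsigned char *hdr)
{
	return memcmp(hdr + OFF_MAGIC, "ustar", 5) == 0;
}

/* Read the device numbers regardless of what the magic field holds. */
void
tar_device(const unsigned char *hdr, unsigned long *maj, unsigned long *min)
{
	*maj = tar_octal(hdr + OFF_DEVMAJOR, LEN_DEV);
	*min = tar_octal(hdr + OFF_DEVMINOR, LEN_DEV);
}
```

A reader that consults tar_device() only when tar_has_ustar_magic() is true
would report 0,0 for a header that has the device fields but no magic.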

> Maybe your tar could supply a "ustar" magic char. seq. at 0x101 for
> libarchive.  (see: header_ustar() vs. header_old_tar())

I'll read the file you pointed at (though the path makes it sound like
a description of what libarchive chooses to do rather than anything
authoritative, though admittedly I don't know whether there _is_
anything authoritative when it comes to tar in general, as opposed to
specific tar implementations).

> Or, fix libarchive like this: [...]

If this isn't just a NetBSD oddity, I'd prefer to generate archives
that are more widely compatible.  Maybe even if it is.  Either way,
fixing libarchive is contraindicated (unless NetBSD is willing to take
up the changes, which strikes me as unlikely).



tar vs device special files

2023-10-28 Thread Mouse
I ran into an issue with tar, recently, on a NetBSD 9.1 system.  I
created a tarball, containing device special files, with my tar and
then extracted it with the system tar.  The device special files all
ended up with major and minor numbers 0,0 on extract.

Looking at the tarball, I'm having trouble seeing what could be
responsible.

I created two test tarballs, containing just rwd0d from /dev, one with
my tar and one with the 9.1 tar.  Each one, of course, reads its own
tarball just fine, though the output format differs.  (tar listings
below are generated with $PROGRAM tvf -, with the tarball on stdin.)

Mine:

rw-r-----  0/50 Sep  2 12:58 2019 rwd0d character special device (3,3)

9.1 /bin/tar:

crw-r-----  0 root   operator  3,3 Sep  2  2019 rwd0d

Mine reads the OS's tarball fine as well

rw-r-----  0/50 Sep  2 12:58 2019 rwd0d character special device (3,3)

but the OS tar doesn't read my tarball correctly:

crw-r-----  0 0  5 0,0 Sep  2  2019 rwd0d

I'm having trouble seeing what's responsible, and in particular am
wondering whether this is my bug or /bin/tar's bug or what.  (It
doesn't help that I haven't managed to find a clear spec for tar
format; the closest I've found so far is a description of what pax, in
its (supposedly-)tar-compatible mode, is supposed to read/write.)

The 9.1 /bin/tar tarball (hexdump -C) is

00000000  72 77 64 30 64 00 00 00  00 00 00 00 00 00 00 00  |rwd0d...........|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000060  00 00 00 00 30 30 30 36  34 30 20 00 30 30 30 30  |....000640 .0000|
00000070  30 30 20 00 30 30 30 30  30 35 20 00 30 30 30 30  |00 .000005 .0000|
00000080  30 30 30 30 30 30 30 20  31 33 35 33 33 32 34 35  |0000000 13533245|
00000090  30 37 34 20 30 31 32 36  35 35 00 20 33 00 00 00  |074 012655. 3...|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100  00 75 73 74 61 72 00 30  30 72 6f 6f 74 00 00 00  |.ustar.00root...|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000120  00 00 00 00 00 00 00 00  00 6f 70 65 72 61 74 6f  |.........operato|
00000130  72 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |r...............|
00000140  00 00 00 00 00 00 00 00  00 30 30 30 30 30 33 20  |.........000003 |
00000150  00 30 30 30 30 30 33 20  00 00 00 00 00 00 00 00  |.000003 ........|
00000160  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002800

whereas mine is

00000000  72 77 64 30 64 00 00 00  00 00 00 00 00 00 00 00  |rwd0d...........|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000060  00 00 00 00 30 30 30 36  34 30 20 00 30 30 30 30  |....000640 .0000|
00000070  30 30 20 00 30 30 30 30  30 35 20 00 30 30 30 30  |00 .000005 .0000|
00000080  30 30 30 30 30 30 30 20  31 33 35 33 33 32 34 35  |0000000 13533245|
00000090  30 37 34 20 30 30 36 37  35 36 00 20 33 00 00 00  |074 006756. 3...|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000140  00 00 00 00 00 00 00 00  00 30 30 30 30 30 33 20  |.........000003 |
00000150  00 30 30 30 30 30 33 20  00 00 00 00 00 00 00 00  |.000003 ........|
00000160  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002800

Except for the stuff at offsets 0x100-0x131, they look pretty close to
identical to me (the value at 0x94 is the header checksum), and that
stuff is, as far as I can tell, owner name strings (which I'm not
supplying, just using the numeric uid and gid values).  But the stock
9.1 tar seems to be taking the 000003 major and minor numbers as zero
for reasons I don't understand, since it understands its own,
apparently identical, major and minor numbers just fine.

Any ideas?



Re: lseek on tty

2023-09-18 Thread Mouse
>>> On NetBSD and macOS, lseek(2) on a tty succeeds:
>>> if (lseek(STDIN_FILENO, 0, SEEK_CUR) == -1 )

>>  Some devices are incapable of seeking.  The value of the
>>  pointer associated with such a device is undefined.

>> So I'm guessing it depends on whether you think adding something
>> to an undefined value makes sense.

> Kinda feel this makes a successful return value of lseek(2) quite
> unreliable: it could either mean that lseek(2) was successful, and
> has updated the offset, or it could mean ":shrug: can't seek, you are
> wherever you were before calling me".

Well, I suspect that, as наб suggested upthread, it
_did_ update the offset, but the offset doesn't affect anything on a
tty.  I ran into a related case back in the day on SunOS, with 32-bit
offsets: after writing 2G of data to a tape, it would abruptly start
failing, apparently because the offset wrapped around and went
negative.  Adding an explicit lseek(,0,SEEK_SET) every once in a while
cured it.  Because tapes, like ttys, don't do anything with the offset,
all this did was shut off the spurious error.

> Wouldn't it be more helpful to the user to have lseek(2) return -1
> here?

Would it?  I'm not sure.  What user would be helped by getting an error
(and which error?) instead of success?  You say Linux fails, but I
don't see compatibility with Linux as enough of a Good Thing for that
alone to be grounds for changing, especially since you say macOS also
accepts it and thus reasonably portable code has to be prepared for
either behaviour.

> Since it's undefined, doing so is equally as valid (as far as the
> standard is concerned).

The value of the offset is undefined.  That is not to say that erroring
versus not erroring is undefined.  (It may be, but your unqualified
"it's undefined" appears to be referring back to my quote, in which
case it's talking about just "the value of the [seek] pointer", not
whether or not lseek itself fails.)

As for standards...which standard?  The only one I know of which I
think is likely to say anything about this is POSIX, which as far as I
can recall I don't have a copy of.



Re: lseek on tty

2023-09-18 Thread Mouse
> On NetBSD and macOS, lseek(2) on a tty succeeds:

> if (lseek(STDIN_FILENO, 0, SEEK_CUR) == -1 )

> On Linux, this fails.

> I'm trying to think of why seeking on a terminal would make sense.
> Anybody have an idea?

Quoting from lseek(2) on a handy machine:

Some devices are incapable of seeking.  The value of the
pointer associated with such a device is undefined.

So I'm guessing it depends on whether you think adding something to
an undefined value makes sense.  To me, it does; the resulting value is
undefined, but the seek pointer is undefined anyway.

Also, I'd argue that lseek(,0,SEEK_CUR) makes sense on any descriptor,
since it's not trying to change anything.
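
Since the call may either succeed or fail depending on the system, portable
code has to accept both outcomes.  A small sketch of that, with the function
name seek_status invented for illustration:

```c
#include <sys/types.h>
#include <errno.h>
#include <unistd.h>

/*
 * Classify lseek(fd, 0, SEEK_CUR):
 *   1  the call succeeded (the offset may still be meaningless)
 *   0  the system refuses to seek on this descriptor (ESPIPE)
 *  -1  some other error, such as EBADF
 */
int
seek_status(int fd)
{
	if (lseek(fd, 0, SEEK_CUR) != (off_t)-1)
		return 1;
	return errno == ESPIPE ? 0 : -1;
}
```

A return of 1 says only that the call did not fail; as the manual text quoted
above puts it, the pointer value for a non-seeking device is undefined.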



Re: short reads on unix stream on netbsd-10

2023-09-06 Thread Mouse
> [...] for perfused(8) a few years ago.  See section 3.3 of my
> EuroBSDcon paper if you are curious of the details.
> http://hcpnet.free.fr/pubz/fuse.pdf

Heh.  Reading that reminds me of my own run-ins with puffs.  I
implemented a gitfs (present git repos as a filesystem) using puffs,
only to find it deadlocking the machine.  It wasn't quite your
deadlock, but was related; my deadlock looked more like, puffs request
leads daemon to fork git, git tries to access (local FFS) filesystem,
FFS code wants a new vnode, VFS code tries to clean a vnode and picks a
puffs vnode (which the daemon can't do anything with because it's
waiting for git).  I then made the daemon more asynchronous; then, it
wound up locking up the machine in a more obscure way I still don't
really understand.

I've given up on puffs (though it sounds as though your work improved
it significantly in these regards - I was/am using the 5.2 version);
one of my back-burner projects is to build a library that, essentially,
reimplements parts of the filesystem machinery (notably mount points)
in userspace, so my HTTP and FTP daemons can present something that
looks like a gitfs mount point to their clients.



Re: short reads on unix stream on netbsd-10

2023-09-06 Thread Mouse
> serv_input: read@3 need 468 got 396 (and disconnects).

> The code is:
> if (i != (nchars = read(fd, T.c, i))) {
>         fprintf(stderr, "serv_input: read@%d need %d got %d\n",
>             fd, i, nchars);
>         server_died();
> }

> fd 3 is a unix stream socket
> I didn't find any place where this file descriptor could have been
> put to nonblocking.

> I could wrap the read in a loop to work around this, but I wonder if
> it's expected behavior to get short reads in this case ?

That's a good question.  In general, SOCK_STREAM socket connections
never promise atomicity of anything larger than one byte.  It's
possible AF_LOCAL provides a stronger promise, but, if so, I'm not sure
where that promise is documented, or how widespread it is.

As for it being _expected_ behaviour...I wouldn't call it "expected",
exactly, but I would call it "not particularly surprising".

So, I'd say the code you quote is at fault here.
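
The loop the original poster mentions could be sketched as below.  The name
read_full is invented; the sketch assumes a blocking descriptor and retries
on EINTR.

```c
#include <sys/types.h>
#include <errno.h>
#include <unistd.h>

/*
 * Read until len bytes have arrived, EOF, or an error.
 * Returns the number of bytes read (less than len only at EOF), or -1.
 */
ssize_t
read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t got = 0;
	ssize_t n;

	while (got < len) {
		n = read(fd, p + got, len - got);
		if (n == 0)
			break;		/* EOF before the buffer filled */
		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;
		}
		got += (size_t)n;
	}
	return (ssize_t)got;
}
```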

If the fd is _always_ an AF_LOCAL/SOCK_STREAM socket, instead of
writing a loop, you could use recv() with MSG_WAITALL.  I use that in a
few places where I want "fill my whole buffer, looping as needed"
semantics; indeed, one relatively common idiom in my code uses an
AF_LOCAL/SOCK_STREAM socketpair instead of a pipe specifically so it
_can_ use MSG_WAITALL.  (In that idiom, it's transferring very little
data - zero bytes except in rare error cases - so performance is not an
issue, and it simplifies the code significantly.)
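
A minimal sketch of the MSG_WAITALL variant, with the name recv_full invented
for illustration.  It works only on sockets, and even with MSG_WAITALL the
call can return short on EOF, error, or a signal, so the count is still
checked.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Return 0 if exactly len bytes were received into buf, else -1. */
int
recv_full(int fd, void *buf, size_t len)
{
	ssize_t n = recv(fd, buf, len, MSG_WAITALL);

	return (n >= 0 && (size_t)n == len) ? 0 : -1;
}
```

With a socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) in place of a pipe, a call
such as recv_full(sv[1], buf, sizeof buf) replaces the explicit loop.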



Re: bin/57544: sed(1) and regex(3) problem with encoding

2023-08-30 Thread Mouse
> This whole "i18n" and "l10n" is a nightmare---and this is a not
> english native speaker who writes it...

And as a native anglophone - who knows a smattering of assorted other
languages - I agree.

I just recently ran into an occasion where something actually got me to
send mail to a domain whose mail was hosted by Google.  I sent it as
8859-14, because it involved a small amount of text in one of the
Gaelic dialects and I prefer to use seanċló when I can.

The text included a ċ.  But apparently, despite my marking it as
8859-14, by the time it got displayed (in their webmail interface, I
think), it had been converted into U+0104, LATIN CAPITAL LETTER A WITH
OGONEK, rather than the correct mapping, U+010B, LATIN SMALL LETTER C
WITH DOT ABOVE.

So I sent a test mail, containing each of the accented vowels and each
of the dotted consonants (well, most of them; I forgot Ṫ and ṫ, but
that's minor).

That mail, for all that it was also marked as being 8859-14, got
displayed as if it were 8859-1.

Not even Google, apparently, can get it even vaguely right.



Re: epoll exposure

2023-08-16 Thread Mouse
>> It also is a wrong way to build self-configuration; such a test is
>> vulnerable to both false positives and false negatives.  It should
>> be reported upstream as a bug.  Much righter is to test whether
>> epoll, if present, produces the behaviour the program expects in the
>> uses it makes of it.

> As Linux introduced epoll (or so I think) I think it's appropriate --
> absent a SUS specification -- to assume it works as under Linux?

Probably.  But that wasn't what was being suggested, at least not as I
read it.  Here's fuller context:

>>>>> The problem is third-party software assumes epoll == Linux,
>>>> Software that makes stupid assumptions will never go away.
>>>> Is it better to work around it (not ship epoll.h), or to get it
>>>> fixed (report it upstream as the bug it is)?  [...]
>>> I don't really see it as a bug.  You'd have to have all those
>>> problems have configure logic that says

>>>   if we find an epoll implementation, then we have a list of
>>>   operating systems that have implemented an epoll that has
>>>   different semantics and we have to reject it

>>> It seems far more reasonable to say that if an OS implements a
>>> different epoll, then it should call it something else.

>> [...]  It also is a wrong way to build self-configuration; [...]

What I was arguing is not "NetBSD should have epoll with different
semantics" but "the problematic programs would have to have a configure
test with a blacklist of OS/version pairs".  Blacklisting by OS/version
is what I was arguing against.

> How would you argue if some other OS was to introduce something
> called kqueue with semantics different from FreeBSD?

I would still say that a configure test that blacklisted them by
OS/version is a broken test.  I say it should either blindly assume the
semantics it expects or it should test for the semantics it cares
about, depending on the philosophical stance its authors prefer.



Re: epoll exposure

2023-08-14 Thread Mouse
>>> The problem is third-party software assumes epoll == Linux,

>> Software that makes stupid assumptions will never go away.

>> Is it better to work around it (not ship epoll.h), or to get it
>> fixed (report it upstream as the bug it is)?  I could argue that
>> either way.

> I don't really see it as a bug.  You'd have to have all those
> problems have configure logic that says

>   if we find an epoll implementation, then we have a list of
>   operating systems that have implemented an epoll that has different
>   semantics and we have to reject it

That's not about assuming "epoll == Linux".  That's about assuming
"epoll, if present, has exactly Linux's epoll semantics".  While it's
possible that either would break on the epoll under discussion, they
are not equivalent assumptions in general.

It also is a wrong way to build self-configuration; such a test is
vulnerable to both false positives and false negatives.  It should be
reported upstream as a bug.  Much righter is to test whether epoll, if
present, produces the behaviour the program expects in the uses it
makes of it.  (Also, "Linux" is not a single thing, so "epoll == Linux"
cannot be a correct thing to assume even conceptually.)
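
As a rough sketch of what a behaviour-based configure probe could look
like (the function name and the particular behaviour checked are
assumptions made for illustration; a real program would probe whichever
epoll semantics it actually relies on):

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical configure-time probe.  Returns 1 if epoll reports the
 * read end of a pipe as not ready while the pipe is empty and as
 * EPOLLIN-ready once a byte has been written; returns 0 if it behaves
 * differently or if any call fails.  If <sys/epoll.h> is missing
 * altogether, the probe fails to compile, which is also an answer.
 */
int epoll_behaves_as_expected(void)
{
    int p[2], ep, ok = 0;
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event got;

    if (pipe(p) < 0)
        return 0;
    ep = epoll_create1(0);
    if (ep >= 0) {
        ev.data.fd = p[0];
        if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
            epoll_wait(ep, &got, 1, 0) == 0 &&      /* empty: not ready */
            write(p[1], "x", 1) == 1 &&
            epoll_wait(ep, &got, 1, 1000) == 1 &&   /* now readable */
            (got.events & EPOLLIN) != 0 &&
            got.data.fd == p[0])
            ok = 1;
        close(ep);
    }
    close(p[0]);
    close(p[1]);
    return ok;
}
```

A probe of this kind consults no list of operating systems or versions;
it accepts any epoll that does what the program needs and rejects any
that does not.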



Re: epoll exposure

2023-08-14 Thread Mouse
> The problem is third-party software assumes epoll == Linux,

Software that makes stupid assumptions will never go away.

Is it better to work around it (not ship epoll.h), or to get it fixed
(report it upstream as the bug it is)?  I could argue that either way.



Re: system(3) semantics when SIGCHLD is SIG_IGN'd

2023-08-12 Thread Mouse
> What should system(3) do when the signal action for SIGCHLD is
> SIG_IGN, or has SA_NOCLDWAIT set?

Exec GNU emacs running the Towers of Hanoi? :-)

Figuring out whether you (=NetBSD) want it to "work" is, it seems to
me, the first priority.  I'm not sure where I fall on that question.
On the one hand, having system(3) work in more programs is arguably a
win.  On the other, having nonportable code rendered obviously broken
is _also_ arguably a win.

> If the calling process has SA_NOCLDWAIT set or has SIGCHLD set to
> SIG_IGN, [...]

Check my understanding: this applies to wait(2), but not alternatives
like waitpid(2) or wait4(2), right?

Then, if you want system(3) to "work", maybe it would be better to
have it use something like one of those to wait for exactly and only
the child it cares about, and, if that does the wrong thing when
SIGCHLD is ignored or SA_NOCLDWAIT is set (I haven't checked), either
change it or provide a flag bit in the options argument to change it?
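
A minimal sketch of the "wait for exactly and only the child it cares
about" shape, under stated assumptions: run_and_wait is a hypothetical
name, and the signal handling a real system(3) performs (blocking
SIGCHLD, ignoring SIGINT and SIGQUIT) is left out.  It does not settle
what waitpid(2) does when SIGCHLD is ignored or SA_NOCLDWAIT is set; if
no status can be collected in that case, the -1 return is where that
would show up.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the core of system(3): run cmd via /bin/sh
 * and wait for that one child only.  Returns the wait status, or -1 if
 * the child could not be created or its status could not be collected.
 */
int run_and_wait(const char *cmd)
{
    pid_t pid, r;
    int status;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                 /* exec failed */
    }
    do {
        r = waitpid(pid, &status, 0);   /* this child, no other */
    } while (r < 0 && errno == EINTR);
    return (r == pid) ? status : -1;
}
```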

> So, should we do anything about this in system(3)?

> Cons:
> [...]
> - Changing signal actions has the side effect of clearing the signal
>   queue, and I don't see a way around that.

But is changing the signal action the only way to make system(3)
"work"?  I'm not convinced it is.



Re: /etc/services losses

2023-08-03 Thread Mouse
>>> Hm, but especially :25 was traditionally used by MUAs, no?
>>> Who used the submission port ~a decade ago?
>> User agents.  [...]
> Well ok submission (not -s) seems to have been invented (RFCd) in
> 1998.  BUt i am very sure i have submitted mails via MUA to port 25
> some time in the past.

Oh, certainly.  So have I.

But when doing that either it's a full-fledged message or the SMTP
server in question is being nice and waving off any violations
(possibly for everyone, possibly just for configured-as-local clients).

Anyhow, this is getting off-topic for /etc/services.



Re: /etc/services losses

2023-08-03 Thread Mouse
> Hm, but especially :25 was traditionally used by MUAs, no?
> Who used the submission port ~a decade ago?

User agents.  I'm pretty sure submission was a thing _two_ decades ago,
though IIRC not all that big a thing.  (Mind you, it may not have been
on the port it's on today, but there was something SMTPish that was
designed for MUA submissions, on a separate port.)



Re: /etc/services losses

2023-08-03 Thread Mouse
> I'm also not sure it matters if a TLS session is preceded by the ten
> bytes `STARTTLS\r\n' on the wire or not.

I would say it does.

In theory, it matters because the conversation is not conformant to the
protocol otherwise; a receiver-SMTP would be entirely justified in
dropping a connection which attempts to start a TLS session without
STARTTLS, and, while I don't have specific knowledge of any (I don't
use TLS), it would surprise me if there weren't implementations that
did.  (Playing fast and loose with standards conformance is in large
part how email became the disaster it currently is; doing so more just
makes it worse.)

In practice, I would say it does because the probable failure modes
when talking to a non-TLS-ready MTA are substantially better.



Re: sed(1) and LC_CTYPE

2023-07-26 Thread Mouse
> $ export LC_CTYPE=fr_FR.ISO8859-15

> $ echo "éé" | sed 's/é/\/g'
> sed: 1: "s/é/\/g": RE error: trailing backslash (\)

I agree that's broken.

> Since, to my knowledge, we do not support anything via iconv or
> whatever, shouldn't we assume simply a string of bytes \`a la C, that
> is:

Seems to me there's a deeper problem.  Even if something like iconv
_were_ available, fr_FR.ISO8859-15 is a single-octet character set, so

> - (void) setlocale(LC_ALL, "");
> + (void) setlocale(LC_ALL, "POSIX");

should, it seems to me, make no difference.  Am I misunderstanding?



Re: DRM/KMS

2023-07-06 Thread Mouse
> 4. And of course, GPU is hard.  [...]

This is true.

I know someone who works on GPU support _with_ the benefit of NDAed
documentation.  He says typical GPUs have documentation errata lists
substantially longer than the documentation itself (as in 400 pages of
doc, 900 pages of errata).

That said, if tlaronde@ is willing to try, I say go for it.  In case of
failure, NetBSD is no worse off; in case of success, substantially
better.  And there's enough track record there that I don't consider it
nearly as unlikely to succeed as I would for most people (myself
included).



Re: Trivial program size inflation

2023-07-04 Thread Mouse
> No, ld reads the archive in its sequence in the arg list [...] then
> when it reaches the end of the .a file, it is done, and nothing will
> ever return to it again (the .a can be included on the link command
> line more than once - about 1 in 10,000,000 cases [statistic with 0
> supporting evidence] that might do something useful).

Even as long ago as 1.4T, cc runs ld (well, collect2) with -lgcc -lc
-lgcc, which presumably it wouldn't do without a reason.  (5.2 uses
-lgcc -lgcc_eh -lc -lgcc -lgcc_eh, at least in a simple test I just
did.)

How necessary this is, that's a different question.  Not when linking
null.o into an executable on 5.2; I tried running the ld command with
the second copies of -lgcc -lgcc_eh deleted and it linked just fine.
I've failed to find where the second -lgcc -lgcc_eh is specified,
though (I thought there might be explanatory comments); I searched
/usr/src for -lgcc_eh and found only three hits, none of which appear
to specify the duplication.  I may take a closer look sometime.

Looking at libgcc.a's symbol tables, I see something called _eprintf.o
which refers to stdio things (__sF, abort, fflush, fprintf), and
numerous things refer to abort.  libgcc also provides things like
__fixunssfti which look to me like the backend for compiler-internal
references.  Those might between them explain -lgcc -lc -lgcc.

At least one linker has a way to specify multiple .a libraries which
are, as a set, searched iteratively until they no longer resolve
anything more.  I speculate that the reason cc doesn't use that is that
it wants to be able to work with linkers that don't have any such
mechanism.



Re: Trivial program size inflation

2023-07-04 Thread Mouse
> Can't say I understand why ld(1) behaves this way, though.  (I'm
> pretty sure I ran ranlib too.)  It should've noticed _all_ the
> functions in the supplied archive, right?

Not if there were no unsatisfied references to them at the time it gets
to that .a in the list.  With .a archives, the unit of inclusion in the
link is the object file within the archive, in contrast to .so
archives, where it's all or nothing.  Once it's done bringing in
everything from libgnumalloc.a (to use the example I cited) that was
undefined at that point in the link (given how libgnumalloc keeps
calloc off in a file of its own, that did not include any .o defining
calloc), libgnumalloc.a is no longer available to resolve undefined
references.  Then, when a .o file from libc.a refers to calloc, a
symbol that isn't yet present, the only version available to satisfy
the reference is libc's own implementation.

However, depending on the details of how the dynamic linker and the .so
builder work, it may not matter.  If we build it dynamic and use
-lgnumalloc, yes, it will bring in all of libgnumalloc.so, but it will
also bring in all of libc.so.  Whether it will actually work depends on
whether libc referring to a symbol defined elsewhere in libc but also
defined in a different .so already included resolves to the libc
version or the other library's version.  (Or, if that condition is an
error, it may error instead of even trying to work.)



Re: inetd(8): security considerations

2023-07-03 Thread Mouse
> [...], putting the file only under root writability is a safety
> precaution too (against one's own blunders).

> There are pros and cons either way---meaning that, you are right, it
> has to be configurable; remains the question of: what should be the
> default?  Strict or not?

Hmm.  For NetBSD's purposes, I think I would default to strict.  The
default should be appropriate for naïve admins, and that's the strict
form.  The less naïve admins can check the manpage and add a flag, or
add a declaration before including, or whatever it is, to support
whatever config they want.



Re: inetd(8): security considerations

2023-07-03 Thread Mouse
> There is one more thing I'd be inclined to add: when _serving_ a
> config as root[*], error if the configuration (including sourced
> chunks) is writable by someone else than root.

> What do you think?

A reasonable thing if it's an overridable default.  An extremely
annoying thing (albeit only occasionally) if it's non-overridable.

Also, I'm not sure how I'd modify that if the UID it's serving as is
someone other than root.



Re: Trivial program size inflation

2023-07-02 Thread Mouse
> I'm sorta curious why linking against one of our small malloc
> implementations still pulls in jemalloc:

Using my shell on ftp.n.o (9.0_STABLE):

cc -v says the ld line for cc -o null null.c -lgnumalloc is

ld -plugin /usr/libexec/liblto_plugin.so -plugin-opt=/usr/libexec/lto-wrapper 
-plugin-opt=-fresolution=/tmp//ccCNGMXD.res -plugin-opt=-pass-through=-lgcc 
-plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -dc -dp -e _start 
-static -o null /usr/lib/crt0.o /usr/lib/crti.o /usr/lib/crtbeginT.o 
/tmp//ccU1uPbn.o -lgnumalloc -lgcc -lc -lgcc /usr/lib/crtend.o /usr/lib/crtn.o

Expecting that LTO would be unnecessary here, I scrapped all that and
manually ran it with -Map= to generate a link map, which among other
things describes why each archive member is brought in:

% ld -Map=null.map -dc -dp -e _start -static -o null /usr/lib/crt0.o 
/usr/lib/crti.o /usr/lib/crtbeginT.o null.o -lgnumalloc -lgcc -lc -lgcc 
/usr/lib/crtend.o /usr/lib/crtn.o

Looking at the resulting null.map, I see, among many other lines,

/usr/lib/libc.a(jemalloc.o)   /usr/lib/libc.a(tls.o) (calloc)

which makes sense: if nothing in null.o, crt0.o, crti.o, or crtbeginT.o
refers to anything in the libgnumalloc file containing calloc, nothing
will have brought it in from libgnumalloc.  Then, when libc refers to
it internally, -lgnumalloc is past and thus unavailable for resolving
it, so it comes from the default malloc in libc.

So I tried, instead,

% ld -Map=null.map -dc -dp -e _start -static -o null /usr/lib/crt0.o 
/usr/lib/crti.o /usr/lib/crtbeginT.o null.o --whole-archive -lgnumalloc 
--no-whole-archive -lgcc -lc -lgcc /usr/lib/crtend.o /usr/lib/crtn.o

to force all of libgnumalloc to be brought in.  Sure enough, this time,
"je" does not appear in the link map, and the executable is
significantly smaller.



Re: Trivial program size inflation

2023-07-02 Thread Mouse
> The CSU code has pretty much no idea on what the rest of the world is
> going to do.  It does: [...]

> There is no way with ELF to decide at link time which of those
> features are used by the program and therefore no way to remove any
> of them.

I don't think that's true.

If the program dynamically loads code, yes, it's true (or close
enough).  But, for a statically linked program (as pointed out
upthread, dlopen doesn't work when linked static), it can be done, and
with quite low otherwise-useless overhead.

Using pthreads as an example, consider this.  All pthreads .o modules
refer to (not "call") a specific function.  For concreteness, let's
call it _pthreads_csu_init.  _pthreads_csu_init is off in its own file;
that routine does the CSU initialization for pthreads.  The CSU code
makes a weak reference to _pthreads_csu_init, calling it only if the
weak reference was satisfied by a real definition.
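
The CSU side of that arrangement can be sketched as follows.  This is
an illustration under assumptions, not NetBSD's actual startup code:
it relies on the GCC/Clang __attribute__((weak)) extension and ELF
weak-symbol semantics, csu_run_optional_init is a hypothetical name,
and _pthreads_csu_init is the placeholder name used in the paragraph
above.

```c
#include <stddef.h>

/*
 * Weak reference: if no object in the link defines this symbol, the
 * link still succeeds and the symbol's address resolves to NULL.
 */
extern void _pthreads_csu_init(void) __attribute__((weak));

/*
 * Stand-in for the relevant part of the startup code.  Calls the
 * pthreads initializer only if some linked-in object defined it.
 * Returns 1 if the initializer was called, 0 if it was absent.
 */
int csu_run_optional_init(void)
{
    if (_pthreads_csu_init != NULL) {
        _pthreads_csu_init();
        return 1;
    }
    return 0;
}
```

The other half is that each pthreads .o carries an ordinary (strong)
reference to _pthreads_csu_init, so that pulling any of them out of
the archive also pulls in the file that defines it.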

Of course the dynamically-linked form of pthreads would always define
_pthreads_csu_init - or, alternatively, dynamic linking would use a
different CSU, one which unconditionally calls _pthreads_csu_init as an
ordinary reference.

Is it worth doing?  That's a separate question.  But I have no real
doubt that it could be done if desired.



Re: Trivial program size inflation

2023-07-02 Thread Mouse
> With most real world programs (hopefully) nearly 100% of what you see
> as overhead now is actually needed - and it still may be bigger than
> what we hope for due to suboptimal modularization.

True.  But this is not always as fixable as that wording implies.

For example, a program that calls printf but never uses any
floating-point values at all will not, in theory, need floating point
support.  But we do not have any mechanism by which anything can
discover that no floating-point printf formats are used and thus bring
in a printf variant that doesn't actually support floating point; this
means that a bunch of floating-point stuff will be brought in even
though it will never actually get used.

I'm not sure how fixable this is.  (I'm also not sure whether it's
worth fixing, but for the purposes of this discussion I'm inclined to
say that doesn't matter.)



Re: Trivial program size inflation

2023-07-02 Thread Mouse
> Note that a "do nothing" binary is a useless tech demo

I chose it because, to a good approximation, the resulting size is
nothing but overhead.  It is useless as far as its own functionality
goes; it is not useless in that it very clearly measures pure overhead.

I actually started out on Ubuntu, with a program

#include <stdio.h>
int main(void) { printf("Hello\n"); return(0); }

but, when my investigations took me to NetBSD, I started cutting it
back (largely because the 1.4T/sparc version of that weighs in at
39836/440/3000 and I wanted to see how much of that was printf - about
two-thirds, as it turns out).  I first cut it back to #include
<unistd.h> and write(1,"Hello\n",6); and then finally all the way back
to the one I first posted here about.

> If you really want it, you could avoid libc and csu, and be down to a
> few bytes.

Yes.  Writing it in assembly I got it down to 12/0/0 on 1.4T/sparc
(clr %o0; mov 1,%g1; ta 0); I don't know amd64 assembly enough to write
the corresponding code there, but I would expect it to be comparable.

> Way more interesting than useless tech demo sizes would size
> inflation of a real world minimal program, when linked statically.

Why?  If I'm looking at overhead size, I am most interested in just the
overhead size, which is exactly what a no-op program gives.  If I want
to look at the overhead of printf, or malloc, or something, I'd use a
program that just calls those.  That might be interesting, but it's not
what I was doing here.

Still, easy enough.  I took 1.4T's /usr/src/games/random.c (picked by
looking at executable file sizes in /usr/games and eliminating the
shellscripts) - but I did remove the #ifdef lint / __COPYRIGHT / #endif
lines, because 1.4T's version of them produces an assembler-time error
on 5.2/amd64 and 9.0_STABLE/amd64 (the \ns turn into literal newlines
in the .s, causing the assembler to complain about unterminated
strings).  I'm using the resulting .c for all these tests, even on
other OS versions.  I don't know how close that is to your "real world
minimal program", but it seems reasonable to me.

1.4T/sparc:

[Sparkle] 252> cc -o main main.c; size main
   text    data     bss     dec     hex filename
   4565     248     280    5093    13e5 main
[Sparkle] 253> cc -static -o main main.c; size main
   text    data     bss     dec     hex filename
  49084     616    4068   53768    d208 main
[Sparkle] 254> 

5.2/amd64:

[Backstop] 133> cc -o main main.c; size main
   text    data     bss     dec     hex filename
   4069     632     496    5197    144d main
[Backstop] 134> cc -static -o main main.c; size main
   text    data     bss     dec     hex filename
 175741    4760   19064  199565   30b8d main
[Backstop] 135> 

9.0_STABLE/amd64 (ftp.n.o):

% cc -o main main.c; size main
   text    data     bss     dec     hex filename
   4419     666     520    5605    15e5 main
% cc -static -o main main.c; size main
   text    data     bss     dec     hex filename
 590110   29416 2178840 2798366  2ab31e main
% 

> The other things that we *might* look into (if someone volunteers) is
> to better modularize the CSU code, but it is not immediately clear
> how that could/should be done.

I don't know.  I'm going to be looking at it on my 1.4T and 5.2; if
anyone is interested in my results, if-and-when I have any, I can post.
But I don't know how applicable they will be to -current.

> However, I personally disagree with Jason on the static linking
> support

Me too.  In my opinion, there is a very important place for executables
that depend on nothing outside themselves except the kernel.  I've got
two chroots where it is very nice to have executables that _don't_ need
any other files to be present in order to work.

All my own libraries, all the stuff in /local/src/lib*, I build .a
libraries and no .so libraries.  The only things I routinely link in
dynamically are the ones that come with the system, and not always even
those.

> It does not come for free though.

Nothing of value does.



Re: Trivial program size inflation

2023-07-01 Thread Mouse
>> Oh, I didn't mean the program needing to call dlopen() directly.
>> libc itself may load shared objects to support things like i18n and
>> NSS on an as-needed basis.

But that support shouldn't be brought in if the program doesn't use
i18n, NSS, or whatever.

As in, sure, let's say i18n_something.o refers to dlopen.  But if
i18n_something.o itself is not brought in, its reference to dlopen
won't be brought in either.

> And only old guys like me still care about the size of the
> executable...

Yeah, the hip new kids never use anything with fewer than 4
multi-gigahertz cores and RAM load most easily measured in tens of
gigs (cf NetBSD's "industrially relevant" wording).  If that's a
low-end machine, yes, half a meg of overhead _is_ ignorable.

...and, these days, you have to look for specialty channels
(embedded-hardware vendors and the like, or retrocomputing geeks like
me) to find anything much smaller than that.

Even the "tiny" machines these days are ridiculously overpowered.  I
have a machine on my desk, for example, which runs off one USB cable,
meaning it can't be drawing more than, IIRC, five watts.  The box is
maybe 2"x2.5"x4".

It's a quad-core ARMv7 with a bit under a gig of RAM - I suspect 1G but
with various things stealing pieces of it.  (I haven't found a way to
get it to cough up the CPU clock rate; it's stuck running a Linux
variant, and I am not that familiar with Linux.  It's a Pi of some
stripe, I believe.)  And work, which is why I have it at all, is
talking about it being underpowered enough to replace it with something
even more ridiculous.



Re: Trivial program size inflation

2023-07-01 Thread Mouse
>>>> amd64, 9.0_STABLE (ftp.n.o):

>>>>   text    data     bss     dec     hex filename
>>>> 562318   29064 2176416 2767798  2a3bb6 main

> amd64, 9.0_STABLE:
>   text    data     bss     dec     hex filename
>   2873     186      72    3131     c3b a.out

Is that linked dynamic, or did you stub out all the miscellaneous junk?
Is that what the code you quote (below, which I'm cutting) is?

> crt0 pulls in
> - atexit
> - environment
> - static TLS
> - stack guard

Why TLS??  Is there some networking going on under the hood?

Still, I would have expected all that to be done via weak symbols, so
if they aren't needed they aren't brought in.  For example, there's no
need for the atexit support if the program never uses it, so with
suitable use of weak symbols and the like it should be possible for
crt0 to call on it only if it's used.

Or, at least, so I would expect.  Am I missing something?



Re: Trivial program size inflation

2023-07-01 Thread Mouse
>> [...size of a do-nothing program...]

> [...]
> Not to forget the code for C++ support.  And, of course even static
> binaries may call dlopen() and friends.  So that dl*() and the ELF
> bits right there.

dlopen, that doesn't make sense to me.  For a statically linked
program, the linker can tell whether it calls dlopen et al.

Or, to put it another way, even static binaries may call, say, nfssvc
or _hes_error, so why aren't they brought in too?



Re: Trivial program size inflation

2023-07-01 Thread Mouse
>> and built it -static.  size on the resulting binary:

>> sparc, my mutant 1.4T:

>>    text    data     bss     dec     hex filename
>>   12616     124     288   13028    32e4 main

>> amd64, my mutant 5.2:

>>    text    data     bss     dec     hex filename
>>  152613    4416   16792  173821   2a6fd main

>> amd64, 9.0_STABLE (ftp.n.o):

>>    text    data     bss     dec     hex filename
>>  562318   29064 2176416 2767798  2a3bb6 main

> What are the compiler (especially the linker) and compiler version on
> your 1.4T vs. 9.0?

1.4T (sparc):

[Sparkle] 188> /usr/bin/gcc --version
egcs-1.1.2
[Sparkle] 189> /usr/bin/ld --version 
GNU ld 2.9.1
Copyright 1997 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License.  This program has absolutely no warranty.
  Supported emulations:
   sparcnbsd
   elf32_sparc
   sun4
[Sparkle] 190> 

5.2 (amd64):

[Backstop] 83> /usr/bin/gcc --version
gcc (GCC) 4.1.3 20080704 prerelease (NetBSD nb3 2007)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Backstop] 84> /usr/bin/ld --version
GNU ld version 2.16.1
Copyright 2005 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License.  This program has absolutely no warranty.
[Backstop] 85> 

9.0_STABLE (amd64, assuming nobody changed anything between my previous
runs and now):

% /usr/bin/gcc --version
gcc (nb4 20200810) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

% /usr/bin/ld --version
GNU ld (NetBSD Binutils nb1) 2.31.1
Copyright (C) 2018 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
% 

The 1.4T and 5.2 compilers are slightly mutant, but nothing I would
expect to affect this - I can describe my tweaks if you like.

> Having the obligation to support a myriad of systems for kerTeX, I
> have seen that, unfortunately, static linking is considered nowadays
> a second rate feature if not a deprecated one,

Curious, then, that it seems to be the preferred/only way for the
"cool" new languages, like Rust and, I'm told, Go.

> Other thing to look at: the so called i18n.  On NetBSD, iconv doesn't
> work with static linking, but could it be that its object are
> nonetheless added too?  As well as threading and so on?

Perhaps so, but why would iconv or threading be brought in for a
program that does nothing?  I'm sure _something_ is clearly being
brought in; I suppose my report is actually along the lines of
"something needs fixing; half a meg of libc (libgcc, whatever) to
support a program that does nothing seems excessive".

I did have a previous version which called printf, which I would expect
would pull in a bunch of stuff.  That version was about three times the
size on 1.4T; I don't recall trying it on 5.2 or 9.0_STABLE.  I then
cut it down to write(), then finally all the way to nothing at all.



Re: old style tail(1) options and bin/57483

2023-06-30 Thread Mouse
>> Do we want to support postfix options in something like old style
>> +qF ?

Personally, I curse every time I run into a tail that doesn't support
"tail +0f" or "tail -f".  I think that and "tail -%d" are the only
forms I use enough for it to be any kind of issue for me for them to
change.



Trivial program size inflation

2023-06-30 Thread Mouse
Based on something at work, I was looking at executable sizes.  I
eventually tried a program stripped about as far down as I could:

int main(void);
int main(void)
{
 return(0);
}

and built it -static.  size on the resulting binary:

sparc, my mutant 1.4T:

   text    data     bss     dec     hex filename
  12616     124     288   13028    32e4 main

amd64, my mutant 5.2:

   text    data     bss     dec     hex filename
 152613    4416   16792  173821   2a6fd main

amd64, 9.0_STABLE (ftp.n.o):

   text    data     bss     dec     hex filename
 562318   29064 2176416 2767798  2a3bb6 main

12K to do nothing is bad enough (I'm going to be looking at why it's
that big).  149K is even more disturbing (I'll be looking at that too).
But over half a meg of text and two megs of BSS?  To do nothing?
Surely something is wrong somewhere.

Not that NetBSD is alone in this.  On an Ubuntu machine at work, I see

   text    data     bss     dec     hex filename
 761750   20804    6016  788570   c085a main

but I hardly think Ubuntu's sins are relevant to NetBSD. :-)



Re: printf(1), sh(1), POSIX.2 and octal escape sequences

2023-06-28 Thread Mouse
>>> "\ddd", where ddd is a one, two, or three-digit octal number, shall
>>> be written as a byte with the numeric value specified by the octal
>>> number."
>> [...]
> I beg to differ: since due to this very unfortunate "variable length"
> feature, your scanner has to read char by char, it can reject the
> third digit since it would yield an out of range byte value.

Would it?  Only if your bytes are smaller than nine bits - or if
they're signed and smaller than ten bits.

Is the size of a `byte' specified anywhere?  If not, then the \777 ->
\77 7 interpretation would parse \777 as one byte with >=10-bit bytes,
two with (6-to-)8-bit bytes (9-bit bytes, it depends on whether bytes
are signed).  That strikes me as a relatively serious potential
problem.

Certainly I have heard of C implementations where `char' is the same
size as short, int, and long (typically word-addressed DSPs).  But I
don't know whether that bears any relation to a `byte' in this sense.
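To make the byte-size dependence concrete, here is a minimal sketch of a char-by-char scanner that declines a digit which would overflow a byte.  It is not taken from any printf(1) or sh(1) source; the function name and the max_value parameter (the largest value a byte can hold, 255 for unsigned 8-bit bytes) are assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Parse up to three octal digits (the part after the backslash).
 * Digits are consumed one at a time; a digit that would push the
 * value past max_value is left unread for the caller.
 * Returns the number of digits consumed and stores the value in *out.
 */
size_t
parse_octal_escape(const char *s, unsigned long max_value,
    unsigned long *out)
{
    unsigned long v = 0;
    size_t n = 0;

    while (n < 3 && s[n] >= '0' && s[n] <= '7') {
        unsigned long next = v * 8 + (unsigned long)(s[n] - '0');

        if (next > max_value)
            break;      /* this digit would not fit in a byte */
        v = next;
        n++;
    }
    *out = v;
    return n;
}
```

With max_value 255, "777" consumes two digits (value 077) and leaves the last 7 as an ordinary character; with max_value 511 (unsigned 9-bit bytes) the same input is a single byte.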

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: inetd(8): cmdif as builtin

2023-06-12 Thread Mouse
>> In any case, the major issue I would have with it is the lack of
>> authentication.  But that's so obvious that I assume you would be
>> doing something like requiring a password - or doing it only for
>> AF_LOCAL sockets and using LOCAL_PEEREID.  [...]
> When this was done in last year's GSoC it was an AF_LOCAL socket to
> control inetd.  I am not sure that inetd having a configuration
> service listening on the network is a really good idea -

I'm not sure either.  But I definitely am not sure enough that it never
could be that I would specifically forbid a configuration that, for
example, tries to activate cmdif on 10.4.77.186 or some such.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: inetd(8): cmdif as builtin

2023-06-09 Thread Mouse
> Le Fri, Jun 09, 2023 at 08:47:10AM -0400, Mouse a écrit :

I find it amusing that it's "Fri, Jun 09, 2023 at 08:47:10AM -0400"
rather than something like "ven, 09 jui 2023, a 08:47:10 -0400", when
the surrounding text _is_ en français.

>>> BTW; just an idea: in the case of inetd(8), wouldn't it be more
>>> simple and logical, in this very case, to add a "cmdif" (cmd
>>> interface) builtin?
>> Simpler and more logical than what?
> Emphasis: in the inetd(8) context.  [...]

Sure.  But what is it you're comparing a cmdif builtin to, what is it
that it's simpler and more logical than?  Signals?  My (hypothetical as
far as inetd is concerned) pidconn?  Something else?

Not that I disagree.  If you want to build a command line into inetd,
you'll need _some_ way to access it; short of an auxiliary tool that
plays games with ptrace() or its ilk, or pidconn, something like what
you suggest would be not only sensible but pretty close to necessary.
It doesn't _have_ to be an entry in the config file, but that is the
sanest way to design it I can think of offhand.

>> The biggest difference I see between this and using signals to
>> provoke these actions is the target namespace: filesystem names for
>> AF_LOCAL or process IDs for signals.
> More than this: you can pass parameters with signals.

I assume there's a negation missing there.  Yes, this is part of the
reason I built pidconn: signals use PIDs for their destination
namespace, but are an extremely restricted communication channel,
suitable for little more than a very few seldom-used commands.  I've
got at least two programs that already use both SIGUSR1 and SIGUSR2
(designed before I had pidconn).

> I'm still thinking (background process) about the subject you have
> started in another thread (about a way to pass commands to a process
> in a more broad way than what is allowed by signals).

I'd be interested in hearing any thoughts you may come up with.  I'm
hardly wedded to pidconn; it's just the best alternative I came up with
that looked sanely implementable.

In the particular case of inetd, I agree: its raison d'être is, as you
say, to listen for network connections and do things in response to
them, so, if a network connection is suitable for the purpose, an
internal service is a good model.

> All in all, why daemon(3) or a variation of daemon(3) would not
> change stdin and stdout to not be linked to the controlling terminal
> but precisely to another interface that allows sending commands and
> receiving results, and deferring error or logs via stderr?

That would certainly be possible.  But:

(1) I would not want this to be restricted to daemonized processes.

I have at least two programs which export pidconn interfaces which have
proven useful even when the processes are not daemonized (and neither
of them is anything for which daemon() would even be appropriate,
though one of them has an option to auto-background itself by forking and
the parent exiting).

In the particular case of inetd, I would want the management interface
to be available even when running non-daemonized, such as for
debugging.

(2) You'd still have to design the rendezvous mechanism, the thing you
describe as just "another interface".

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: inetd(8): cmdif as builtin

2023-06-09 Thread Mouse
> BTW; just an idea: in the case of inetd(8), wouldn't it be more
> simple and logical, in this very case, to add a "cmdif" (cmd
> interface) builtin?

Simpler and more logical than what?

In any case, the major issue I would have with it is the lack of
authentication.  But that's so obvious that I assume you would be doing
something like requiring a password - or doing it only for AF_LOCAL
sockets and using LOCAL_PEEREID.  (This is pretty close to what most of
my pidconn servers do - they use the pidconn analog of LOCAL_PEEREID to
verify that the client is either root or the same UID the server is
running as.)

The biggest difference I see between this and using signals to provoke
these actions is the target namespace: filesystem names for AF_LOCAL or
process IDs for signals.

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: style, sysexits(3), and man RETURN VALUES for sys programs

2023-06-03 Thread Mouse
>> Rhialto pointed me to sysexits(3) that was exactly what I was
>> looking for (for inetd(8) revision). So kudos to him!
> I deliberately didn't mention sysexits.h (or sysexits(3)) as I don't
> think it is really appropriate here.

I agree and I disagree.

I agree that it certainly isn't the same sort of compelling use case
that sendmail was back in the day.

But, also, just because you or I can't think of a use case for it
immediately doesn't mean one won't appear.  And, as far as I can see,
it does no harm.

The rc script might even be able to use it; that it doesn't now could
mean nothing more than that there has historically been no such
information for it to use, so why try?  Certainly I can see a boot-time
script caring about, for example, the difference between EX_CONFIG
(fall back to a minimal configuration) versus EX_SOFTWARE (give up),
possibly even retrying a few times in case of EX_TEMPFAIL.
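As a rough illustration of that kind of differential action, here is a sketch in C (an rc script would do the same in sh).  The enum and function names are made up for the example; only the EX_* constants come from <sysexits.h>, and nothing here reflects what inetd or any rc script actually does.

```c
#include <sysexits.h>

/* Hypothetical actions a startup wrapper might take. */
enum start_action {
    START_OK,           /* daemon started normally */
    START_FALLBACK,     /* bad config: fall back to a minimal one */
    START_RETRY,        /* transient failure: try again later */
    START_GIVE_UP       /* anything else: stop trying */
};

/* Map a child's exit status to an action, keyed on sysexits codes. */
enum start_action
classify_exit(int status)
{
    switch (status) {
    case EX_OK:
        return START_OK;
    case EX_CONFIG:
        return START_FALLBACK;
    case EX_TEMPFAIL:
        return START_RETRY;
    default:
        /* EX_SOFTWARE, plain exit(1), and everything else */
        return START_GIVE_UP;
    }
}
```

A caller that only cares about success versus failure can still test for zero and ignore the rest, which is the point: the extra information costs such callers nothing.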

> Without that, all the calling program sees is "program exited with
> status 78" so something went wrong, but what?  Having to go back to
> the manual (or system include file, which would be worse) to find out
> what that error number means, is horrid - that's the way systems used
> to work in the dark ages, when actually putting error messages, as
> strings, in programs was too costly - and believe me, you don't want
> to be in that environment.

exit(EX_CONFIG) is no less informative than exit(1), for sure.
Certainly, producing a message is good, but, if it's easy to do (which
it appears to be, here), producing a message and exiting with a
sysexits exit code is better.  As a putative script writer, I'd far
rather use a sysexits exit code than have to try to scrape stderr.

> If there's also a message that explains the error, then you don't
> need much in the way of specifics in the exit status, as all anyone
> will look at is the error message [...]

All any _human_ will, perhaps.  Maybe _you_ are confident nobody will
ever want a script to take differential action based on the failure
class; _I_ am not.

> Unless there is a really good reason - which would amount to there
> actually being something which will interpret the exit status from
> inetd for more than ok/not ok then I wouldn't be going near sysexits.

I would say that, if it's easy to produce a useful exit code, why not?
It's no less informative than a single-bit exit code; if a script just
cares about success/failure, then sysexits exit codes work exactly as
well as exit(1), and they do make a little more information available
in case anyone wants to build a script that uses it.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: [RFC] inetd(8) changes proposal

2023-05-31 Thread Mouse
>>>> So I added a new file descriptor type (well, semi-new; they're
>>>> DTYPE_MISC) and a new syscall (pidconn) [...]
>>> In this area (as others, in fact), the Plan9 solution is
>>> [/proc/$PID/ctl]
>> That is unidirectional communication.  pidconn is bidirectional.
>> Aside from that, how does the process (a) notice that this has been
>> done and (b) get the string ($cmd in the above) to act on it?
> Controlling the process can be made by writing to the "ctl" file (this
> does what some Unix signals do).

Okay, so it's not really comparable to pidconn.

> And if bi-directional IPC is wanted, it can be done at user-level
> with the plumber that allows to "receive, examine, rewrite and
> dispatch messages between programs" as long as the program allows to
> be plumbed (prepared to read and write from conventionally named file
> in the namespace).  (It does multiplexing too; it is not only 1<->1.)

> In fact, one can do the same thing probably with Unix.

One can do similar things.  I could have, for example, decided that I
would use AF_LOCAL sockets in /IPC/ with names matching participating
process's PIDs.  Or I could have used AF_INET6 sockets in host-local
space (I think there's some host-local space defined).  Or probably
various other alternatives.

However, none of those was a good-enough match to the semantics I
wanted for me to be content with it.  (Most particularly, none of them
have anything stopping process X from using the endpoint appropriate to
process Y and none of them seamlessly handle forking processes.)

It's possible that Plan 9 has something else that would work better.
But switching to Plan 9 just to get this was a _much_ higher cost than
just implementing pidconn() - if it had occurred to me to think Plan 9
might have had something suitable, which it didn't.

> But the concept of having well-known file names to communicate with a
> process is appealing and to not have to set the communication
> channels in every program but just to prepare to use them if the
> program wants too, is a bonus.

Except for the "file names" part of that, I agree.  pidconn does
require that programs interested in accepting connections do need to do
a little preparation; I could have implicitly created a listening
endpoint for every process, but I have no good answer to the question
of what happens if you try to connect to a process that isn't
interested in such connections.  Either that question needs answering
or participating processes have to declare their interest in such
connections _somehow_.  I chose the latter.

I might have been able to do it using mount_portal.  I haven't
investigated that; perhaps I should have.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: [RFC] inetd(8) changes proposal

2023-05-31 Thread Mouse
> The inclusion directive is a dot i.e. a here script:

Well, as I think someone else pointed out, I wouldn't call that a here
script.  sh spells it ".", csh spells it "source", but here script is
more often used for "<<" style input, input that's inline, which an
included file by definition is not.

> This means that all definitions have a global scope and that the
> feature of the default address, if not specified, is not limited to
> the file that is dot'ed there.

Then there is definitely a use for including the same file more than
once:

10.7.44.184:
. private-services
172.18.9.1:
. private-services
*:
. everyone-services

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: [RFC] inetd(8) changes proposal

2023-05-31 Thread Mouse
>> So I think when using inetd in check mode, it is an utility and the
>> config will go to stdout, and the errors about the parsing to stderr
>> [...]

> IMHO, if check + -f foreground; errors to stdout/stderr,
> and if check + (background default); output to syslog.

I see no particular use for dump-to-syslog, so I would do that only
with a specific particular option to do so.  Perhaps I'm
misinterpreting what check mode would print.

> If check mode is used in an rc.d script (for example), dumping a lot
> of errors to stderr can ruin "clean" rc.d output,

If your inetd config has syntax errors, you've got bigger problems than
the aesthetics of your rc output.  I see no use for check mode on
routine boot-time startup; if check mode would fail, so would normal
startup, so the check run brings no additional value.

> and if the stderr errors occur at startup, you may not have easy
> access to errors printed to stderr on boot, whereas you generally
> have the syslog.

I would leave that sort of diagnosis to a manually-run check after
startup.  A report of the presence of a fatal syntax error to syslog,
that's reasonable.  Perhaps even with the error.  But full check
diagnostic output seems inappropriate for syslog, to me.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: [RFC] inetd(8) changes proposal

2023-05-31 Thread Mouse
>> So I added a new file descriptor type (well, semi-new; they're
>> DTYPE_MISC) and a new syscall (pidconn) [...]

> In this area (as others, in fact), the Plan9 solution is quite
> elegant (and consistent with the whole design): in the /proc/
> directory, the process has its directory /proc/$pid/, under which
> there is "ctl" file to which one can write textual messages to
> control the process:

> echo $cmd >/proc/$pid/ctl

That is unidirectional communication.  pidconn is bidirectional.  Aside
from that, how does the process (a) notice that this has been done and
(b) get the string ($cmd in the above) to act on it?

The procfs I have has a ctl file, but it is ptrace-style control, not
communication a la pidconn (which is optional, more like sockets).  It
also appears to have vanished by 9.1.  Besides those, I'm not sure how
I feel about depending on procfs.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: [RFC] inetd(8) changes proposal

2023-05-30 Thread Mouse
> But that's independent of the signal used to make it happen, [...]

> Personally I'd probably pick a different signal for that, but I am
> not sure which.

I have long felt that having only two uncommitted signals (SIGUSR1 and
SIGUSR2) is way too few.

I've also long wanted a way to contact processes by PID.  Something
like sockets where addresses are process IDs.

A while ago, I finally sat down to do something about that.  I
eventually came to the conclusion that sockets would not do, not
because of any conceptual problem but simply because the internal
infrastructure involved was incompatible with the design goals I had.
(It's possible the socket API design _is_ incompatible with my design
goals, but I found the internal issues before I became convinced of
that.)  So I added a new file descriptor type (well, semi-new; they're
DTYPE_MISC) and a new syscall (pidconn) to do the analogs of
socket+listen and socket+connect and a few other relevant things.

The relevance to this thread is that the way I'd handle this is to give
inetd a(n optional) pidconn listener via which it could be told to dump
its config, either to the pidconn connection or to a file whose name is
specified in the command.

Assuming NetBSD isn't interested in going that far, perhaps it would be
reasonable to have a config-file syntax which specifies a listening
point (AF_LOCAL, AF_INET, AF_INET6, whatever) via which it could be
similarly commanded?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: inetd(8): continue or exit on error?

2023-05-29 Thread Mouse
>> I'm not sure inetd(8) has any business calling realpath in the first
>> place.

I agree.

> It has to call realpath(3) since in order to not include several
> times the same file, it makes string comparisons about names.

If I as an admin write a config that tries to include the same file
twice, whether via the same or different paths, I expect it to include
the same file twice.  If that leads to errors, that's on me.  (It might
or might not _always_ lead to an error, depending on how the include
file syntax is defined to interact with the stateful aspects of the
config language and what's in the config files.)

If you really want to avoid including the same file twice even if
that's what the config says to do, I'd say it should do so with dev/ino
comparisons, not pathname comparisons.
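A dev/ino comparison along those lines could look something like the following.  This is only a sketch; same_file() is a name invented here, not anything in inetd.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Report whether two pathnames refer to the same underlying file.
 * Returns 1 if they do, 0 if they do not, -1 if either cannot be
 * stat()ed.  stat() follows symlinks, so no realpath() is needed.
 */
int
same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Unlike string comparison of canonicalized names, this also catches hard links and differently spelled paths to the same file.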

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Sanitizing (canonicalising) the block device name in mount_ffs ??

2023-05-27 Thread Mouse
> Today I am wondering why we need to [...] canonicalise the block
> device name for mount.

> Currently I am seeing 4 reasonable choices...

> 1) just omit the pathadj() of the block device name, and just use
> whatever the user says to use, unchanged.  I doubt anything would
> really be affected by this, but it does make a difference if some
> user were to use
>   /./dev/../dev/wd0e
> or wd0e
> where the latter is either a symlink to /dev/wd0e, or a copy of /dev/wd0e
> or $PWD==/dev and it is simply a relative path.

If using the single-argument form of mount, as tlaronde@ pointed out,
it can affect the lookup in fstab.

I'm not sure whether I think that's good, bad, both, or neither.

It could also matter in that it would lead to a possibly-confusing
mount-from string in the mount table.  I'm not sure whether that would
matter, either, but it's something that needs to be thought about.

Also consider what happens if, say, a mount of /dev/foo is done in a
chroot.  The mount-on directory path is mapped between chroot and host,
but I think the mount-from path is not, and /dev/foo in the chroot may
not be the same as /dev/foo in the host.  I'm not sure whether I think
that needs fixing either.

> 2) we could prohibit relative paths, or paths containing '.' or '..'
> components - simply check the path and refuse the mount.

We could.  I don't see why we'd want to, though, especially since in
the presence of symlinks .. is not equivalent to stripping off the
preceding pathname component (that is, /foo/bar/.. may not be /foo).

> 3) we could apply pathadj() (as it currently is) to the paths which
> choice 2 would prohibit (which won't affect the rump using ATF tests,
> which don't do that).

And do what if pathadj fails?

> 4) we could change the pathadj() of the block device name to instead
> simply call realpath(3), use the result if that succeeds (which is
> what happens now, and in the past), but simply use the user's arg if
> it fails [...].

I'm not entirely sure, but at the moment I think I would probably go
for (1), pending finding something it breaks.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Sanitizing (canonicalising) the block device name in mount_ffs ??

2023-05-27 Thread Mouse
>> Today I am wondering why we need to [...] canonicalise the block
>> device name for mount.
> Isn't it because in order to be able to compare strings, the path has
> to have a unique (canonical) form, independent from the way the user
> enters it?  For example, at the user level, how mount(8) could
> compare, given only one argument, to what is in /etc/fstab without
> trying first to give the pathname given some normal form?

That's a decent point; perhaps mount should do this when, but only
when, it's given only one argument.

But maybe not.  Why would that be a good thing for it to do?  Do you
_want_ mount to turn "mount symlink-to-wd0e" into "mount /dev/wd0e"?
I'm not convinced that is a good thing.  (I'm also not convinced it's a
bad thing, mind you.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: debugging/tracing a setuid program

2023-05-08 Thread Mouse
> [...] openat() [...].  It's detected by autoconf on the -6 chroot on
> -8 while -6 doesn't implement it.

As I wrote back on 2009-11-20 [%],

such configuration scripts are [] very hard to sandbox (at best
you end up configuring the software to run in the sandbox),

which is exactly what happened here.  The configure script worked as it
was designed to: it configured the software for the environment it was
running in.

It's one of the reasons I dislike the whole ./configure paradigm.

In the long term, the best solution I have is to stop using the damn
things (which unfortunately is difficult unless you're big enough for
you alone to make large-program authors sit up and take note or
quixotic enough to roll your own, either programs or build
infrastructure); in the short term, all I can say is, self-host,
self-host, self-host. :-(

Yet another of the problems I ascribe to the GNU people.

[%] http://ftp.rodents-montreal.org/mouse/blah/2009-11-20-1.html in
case anyone wants to read the whole piece.

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: debugging/tracing a setuid program

2023-05-05 Thread Mouse
>> (a) I'd say it shouldn't stop ktracing
> I suspect it stops as soon as sudo calls setuid.

(a) If it does that when the trace was set by root, I call that a bug.

(b) Even if so, it shouldn't stop partway through an operation.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: debugging/tracing a setuid program

2023-05-05 Thread Mouse
>> As root, ktrace -i the shell (or other process) it's started by.
> That gives me a ktrace that stops in the middle of the GIO where sudo
> is reading the sudoers file.

That...well, I have trouble seeing that as anything less than a bug.
(a) I'd say it shouldn't stop ktracing and (b) I *definitely* would say
it shouldn't stop partway through an operation.

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: debugging/tracing a setuid program

2023-05-05 Thread Mouse
> I have an interesting problem: How do you debug or ktrace a setuid
> binary that exhibits the problem only when run as non-root?

As root, ktrace -i the shell (or other process) it's started by.

If you can change its code, have it ktrace itself on startup.  (And if
that changes the behaviour, good luck - you're probably dealing with a
heisenbug, dependent on stack trash or some such.)

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: flock(2): locking against itself?

2023-04-18 Thread Mouse
>> While writing this mail, it occurs to me that [...]

> Try libtickit--that uses callbacks (I think):

> https://www.leonerd.org.uk/code/libtickit/

It looks notably sane.  (So far, the only things I see that annoy me
are (1) there's no git:// repo to clone, (2) the Makefile blindly
assumes GNU make.  But I've got a lot of investigating yet to do; I
fully expect there will be lots of good and bad things (from my POV)
to discover.)

Thank you!

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: flock(2): locking against itself?

2023-04-18 Thread Mouse
>> Here, we have software that's crippled by lacking support for
>> producing its output in a generalized fashion - so, rather than
>> fixing the limitation, we depend on a heavyweight OS feature to work
>> around it?  [...]
> Or we have a library that is specifically designed and implemented to
> talk to a tty and optimise the output so that it outputs the minimum
> amount of data to that tty and now someone complains that it doesn't
> work on something that is not a tty?

Fair point.

I'm actually complaining that it doesn't work connected to something
that _is_ a tty...when the tty is nonblocking.  Wanting the refresh
guts to be separate is just one way to address that.

Initially, it seemed to me that, given the usual intent of
nonblocking, something like that was necessary to fix it without
calling in heavyweight OS support like ptys and additional processes -
or, for versions sufficiently recent that nonblocking is an open file
table entry thing instead of a backing-object thing, depending on the
output tty being reopenable.

While writing this mail, it occurs to me that there could be a mode
wherein everything operates the same as now, except that output,
instead of being pushed to the tty with fwrite() or write() or writev()
or whatever, is just passed to a callback.  That would address my use
cases with a much less intrusive change.
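To sketch what I mean by a callback mode (every name here is hypothetical; none of it is existing libcurses API):

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* The updater hands finished output to one of these. */
typedef int (*output_fn)(void *cookie, const void *buf, size_t len);

struct output_sink {
    output_fn fn;
    void *cookie;
};

/* Default sink: push to a descriptor, as now.  cookie is an int *. */
int
sink_write_fd(void *cookie, const void *buf, size_t len)
{
    int fd = *(int *)cookie;
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n <= 0)
            return -1;  /* error (or no progress): report failure */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Alternative sink: append to a caller-owned buffer. */
struct capture {
    char data[256];
    size_t used;
};

int
sink_capture(void *cookie, const void *buf, size_t len)
{
    struct capture *c = cookie;

    if (len > sizeof(c->data) - c->used)
        return -1;      /* would overflow the buffer */
    memcpy(c->data + c->used, buf, len);
    c->used += len;
    return 0;
}

/* What the updater would call instead of fwrite()/write(). */
int
emit(struct output_sink *s, const void *buf, size_t len)
{
    return s->fn(s->cookie, buf, len);
}
```

A program with a nonblocking tty would install its own sink and queue the bytes into its poll() loop; a unit test would install something like sink_capture and compare the bytes against an expected update.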

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: flock(2): locking against itself?

2023-04-18 Thread Mouse
>> (a) Does anyone have any other ideas, not mentioned above, for how
>> to address these issues?
> Not really, I would suggest using a pty even though you have
> dismissed it.

Well, FSVO "dismissed".  It seems ridiculously heavyweight for the
task; it also feels wrong.  Here, we have software that's crippled by
lacking support for producing its output in a generalized fashion - so,
rather than fixing the limitation, we depend on a heavyweight OS
feature to work around it?  That's like, oh, say, we have a language
that lacks support for decimal numbers, so, instead of fixing that, we
add a front end which converts all numbers to octal or hex.

Most briefly, I would prefer to fix the problem instead of just tacking
on a workaround.

For testing libcurses, the pty method is probably necessary, since you
want to test the tty interface code as well.  But for testing just the
screen updater (which I suspect is where most of the hair lies), why
should you have to go through the kernel at all?  Make it a
well-isolated unit and you can then unit-test it.

> The libcurses atf test does this to test curses functions, capturing
> the output and comparing it to an expected output.  It uses poll(2)
> to check for i/o so as to prevent blocking on reads and writes.

But it needs an additional process, or at least thread, or it's
vulnerable to blocking when dealing with a large update.  Even
heavierweight.

>> (b) Does anyone know of any work done towards pulling the screen
>> updater out of curses, so it can be used without all the baggage
>> tied to the OS that libcurses imposes?
> No.  I am not really sure what general use it would be.  Sure, you
> have a very specific need, that I can understand but struggle with a
> generalised concept.

I have two use cases already, and I have two more which are vulnerable
to the same issue but which I haven't actually run into it with...yet.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: style: clean up mentions of old-style C

2023-04-17 Thread Mouse
>>> [...] a main() that expects the env argument.
>> I don't think I've *ever* seen that used.
> Neither have I.

I think I have, but not often and long enough ago I'm not sure of
details.

> I see no problem in disallowing the env arg to main, it's not in any
> (modern?) C standard anyway.

It's sort-of in C99, though I don't know whether you count that as
"modern".  5.1.2.2.1 #1 says that "The function called at program
startup is named main.  The implementation declares no prototype for
this function.  It shall be defined with a return type of int and with
no parameters [...example...] or with two parameters [...example...] or
equivalent; or in some other implementation-defined manner.".  The part
after the last semicolon looks to me as though it allows _any_
declaration for main, from the traditional argc,argv,envp to
int main(struct stat *, pid_t, pid_t, void *, const double * const *),
though clause #2 may require it to have parameters that operate like
argc and argv.  I'm not sure about a return type other than int.

Given C's tendency to codify existing practice, I suspect you'll have a
hard time finding a C standard that forbids argc,argv,envp.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: flock(2): locking against itself?

2023-04-16 Thread Mouse
erve perfectly well.  I imagine there
> are some extant implementations already, and if not, writing one will
> probably be less work than grinding the curses we have.

Hm.

I'll have to think about that.  At the very least, that's another
assumption I'd rather avoid making, the assumption of high bandwidth
between the program and the output screen.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: flock(2): locking against itself?

2023-04-16 Thread Mouse
>>> [...].  I have a program that uses curses for output but wants to
>>> do non-blocking input.  So I set stdin nonblocking and threw fd 0
>>> into the poll() loop.

>>> But, in normal use, stdin and stdout come from the same open on the
>>> session's tty, so setting stdin nonblocking also sets stdout
>>> nonblocking, which curses is not prepared to handle, [...]

> I think O_NONBLOCK on an input FD would be a problem too for curses.

It doesn't seem to be in my experience - but I read my input directly
from fd 0 (I turn off echo) rather than through curses.

> Otherwise, if only a non-blocking output were the problem, then you
> could just re-open the current tty twice and juggle the FDs like
> this:

I doubt this will work, since non-blocking is state on the tty rather
than on the open file table entry or the descriptor.  And you say it
doesn't actually work for, at least, vi.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: flock(2): locking against itself?

2023-04-15 Thread Mouse
Back on 2023-03-30, I posted to tech-kern on a thread that had drifted
off-topic for tech-kern, into tech-userlevel territory.  I was writing
about bad interactions between libcurses and nonblocking I/O.

In particular, I wrote

> [...].  I have a program that uses curses for output but wants to do
> non-blocking input.  So I set stdin nonblocking and threw fd 0 into
> the poll() loop.

> But, in normal use, stdin and stdout come from the same open on the
> session's tty, so setting stdin nonblocking also sets stdout
> nonblocking, which curses is not prepared to handle, leading to large
> screen updates getting partially lost.

Today, I tried to fix this.

So far, I've tried setting up an output copier process, creating a pipe
which the main process uses for output, leaving that pipe blocking and
creating a copier process which copies from the pipe to the original
stdout in a way that is prepared to handle nonblocking output.

It doesn't work.  libcurses insists on getting the tty size from the
output side, and of course the pipe has no size setting.  (It might
well work if the tty size equaled the default size from the termcap
database for the terminal type in use, but that shouldn't be necessary,
and in my tests it isn't so.)
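The "pipe has no size setting" part is easy to check in isolation.  A small sketch, assuming the size query in question is the usual TIOCGWINSZ ioctl (I have not confirmed which call libcurses makes); both function names are hypothetical:

```c
#include <sys/ioctl.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd reports a window size, 0 if it does not. */
int has_window_size(int fd)
{
    struct winsize ws;

    return ioctl(fd, TIOCGWINSZ, &ws) == 0;
}

/* has_window_size() for the write end of a fresh pipe, -1 on error. */
int pipe_has_window_size(void)
{
    int p[2], r;

    if (pipe(p) < 0)
        return -1;
    r = has_window_size(p[1]);
    close(p[0]);
    close(p[1]);
    return r;
}
```

On a tty descriptor has_window_size() would be expected to return 1; on the pipe the ioctl fails, so anything that insists on asking the output side has nothing to go on.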

I considered using an input copier, instead, but (a) that won't fix it
all, because libcurses also pokes at its input (the code is lousy with
tcsetattr and tcgetattr on ->infd), and (b) it is significantly harder
to make an input copier process die when the main process does than it
is for an output copier.

I mentioned, in the tech-kern thread, the possibility of using
libcurses's newterm() with a funopen()ed stream for output.  That won't
work either, for basically the same reason an output copier process
doesn't, though with different details: curses uses fileno() on its
output stream and tries to get the size from the result.  With a pipe,
fileno() works but it doesn't have a window size; with funopen(),
fileno() will return -1, which of course isn't a valid descriptor and
thus doesn't support getting its window size.

It seems to me that there are multiple issues with libcurses here.

(1) It does not get along with input and/or output being set
non-blocking.

(2) newterm() takes FILE *s rather than file descriptors, but requires
for correct operation that they be connected to file descriptors anyway
(it uses fileno() all over the place).

(3) Even if the FILE *s _are_ connected to file descriptors, it does
not work correctly unless those descriptors are connected directly to
tty devices.

(4) There's no way, without going under stdio's hood, to create a stdio
stream that operates like a funopen()ed stream but allows the caller to
arrange for fileno() to return something useful.

The only solution I've come up with that stands a real chance of
working right without getting into the guts of libcurses is to allocate
a new pty and use that for output, watching for size changes on the
"real" tty and pushing them to the pty.  That seems outrageously
heavyweight for the purpose.

At this point, I'm tempted to pull the screen-updating smarts out of
libcurses (that's the only part I want to use, here) and make it
available under an API that has absolutely nothing OS-dependent about
it, possibly excepting terminal type capability lookup.  So, I'm
writing to ask:

(a) Does anyone have any other ideas, not mentioned above, for how to
address these issues?

(b) Does anyone know of any work done towards pulling the screen
updater out of curses, so it can be used without all the baggage tied
to the OS that libcurses imposes?

(c) If not, would there be any interest in such a thing?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Rationale for some rules in style guide

2023-04-11 Thread Mouse
> Which compiler from this century doesn't allocate stack space
> independent from the source order?

At a minimum, the one that shipped with 5.2.  According to --version,
it is

gcc (GCC) 4.1.3 20080704 prerelease (NetBSD nb3 2007)

which is well within this century, and, on amd64, this program

#include <stdio.h>

int main(void);
int main(void)
{
 volatile char beg;
 volatile char a;
 volatile int b;
 volatile char c;
 volatile char end;

 a = 1;
 b = 2;
 c = 3;
 printf("%d\n",(int)(&beg-&end));
 return(0);
}

prints 9, but if I move the "volatile int b" line up or down one line,
the output changes to 8 or 10.  (Yes, I got the beg and end names
backwards.)  The volatiles and assignments are to keep the compiler
from optimizing unused variables out of existence - the first version,
which had neither, printed 1.  Compiling with -save-temps and looking
at the assembly, it's clear that the variable order on the stack is the
source-code order.

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Rationale for some rules in style guide

2023-04-11 Thread Mouse
> The style guide says:
>> Avoid initializing variables in the declarations
> Why?

Well, I didn't write the style guide, so I'm not authoritative on
questions of why something is there.

And, it's a _guide_, not _mandates_.  Every style rule I've ever seen,
except really vague ones like "write readable code", I think is better
broken at least occasionally.  I've worked on projects where style
guides have been elevated to the status of mandates; I've always
thought it a mistake.

Those said

> In functions that consist of a few small paragraphs, I like
> paragraphs of this form, which are compact yet readable:

>   command cmd = xmalloc(sizeof(cmd));
>   cmd->name = "name";
>   cmd->args = build_arguments();
>   cmd->env = build_environment();

There is a bug in that code.  Given how you are using cmd, command must
be a pointer-to-struct type (or pointer-to-union, but then the code,
while perhaps valid C, is semantic nonsense), but you are allocating
enough space for the pointer, not enough space for the struct.
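To make the size mismatch concrete, here is a sketch.  The struct layout and the typedef are guesses, since the quoted snippet does not show them, and plain malloc() stands in for xmalloc():

```c
#include <stdlib.h>

/* Hypothetical definitions; the quoted snippet did not include them. */
struct command {
    const char *name;
    char **args;
    char **env;
};
typedef struct command *command;

/* What the quoted code asks for: the size of the pointer itself. */
size_t size_as_written(void)
{
    command cmd = NULL;

    return sizeof(cmd);
}

/* What it needs: the size of the struct the pointer points to. */
size_t size_needed(void)
{
    command cmd = NULL;

    return sizeof(*cmd);
}

/* Allocation with the size corrected; returns NULL on failure. */
command command_new(void)
{
    command cmd;

    cmd = malloc(sizeof(*cmd));
    if (cmd != NULL) {
        cmd->name = "name";
        cmd->args = NULL;
        cmd->env = NULL;
    }
    return cmd;
}
```

With three pointer members, size_needed() is three times size_as_written(), so the original would write past the end of its allocation on the second assignment.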

Readability is subjective.  There is, strictly, no such thing as
"readable code" or "unreadable code"; code is readable or unreadable
only with respect to particular (putative) readers.  (While admittedly
there is often at least rough consensus about (un)readability in extreme
cases, this is not the IOCCC.)

Initializing variables in their declarations tends to textually mix
executable code and variable declarations.  I find that impairs
readability.

As for your snippet, I find its readability impaired in multiple ways:

(1) The type name "command" is not visually distinctive.  I would
uppercase that.  (This admittedly is not relevant to the particular
point you started out talking about.)

(2) command is apparently a pointer type, but there is no * in the
declaration.  (This too is independent of your initial point.)

(3) There is no visual separation between variable declarations and
code.  I would add a blank line between lines 1 and 2 of your example.

(4) Given the initialization of cmd in its declaration, there cannot be
visual separation between its declaration and the executable code of
its initializer...not unless you insert a blank line into that
declaration, which (a) would be even more visually confusing and (b)
can work for at most one variable per block.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: mount(8) enhancement

2023-04-01 Thread Mouse
>>  mount /dev/dk0:bart /mnt
> I would much prefer to use the existing mechanisms to specify
> filesystem-specific parameters more generally.  [...]
>   mount -t foo -o -x=bart /mnt

(a) Oops, that should be
mount -t foo -o -x=bart /dev/dk0 /mnt

(b) This is what I did back in the 1.4T days when I added an option to
mount_ffs that allowed making the root point of the mount something
further down in the on-disk filesystem than its root.  I added an
option to mount_ffs, -p PATH, to specify the path within the filesystem
to find the root of the mount point.  So, in a tiny way, there is prior
art even aside from mount_mfs.

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: mount(8) enhancement

2023-04-01 Thread Mouse
>   mount /dev/dk0:bart /mnt
>   mount /dev/dk0%lisa /mnt
>   mount /dev/dk0@maggy /mnt

> Our current mount(8) and all other file systems uses pathadj() from
> sbin/mount/pathadj.c that is really a wrapper around realpath(3).

> My proposal is to adjust
>   pathadj(const char *input, char *adjusted)
> into
>   pathadj(const char *input, char *adjusted, char *tail)

> It first copies the tail from the input with the tail defined as
> anything after one of [$%@:&].

Well...I have no skin in this game; if this goes in, it will be in a
NetBSD version I don't run.  But, that said, I dislike it.  It
conflicts with traditional NFS syntax, as you note, and, more
generally, what you are doing here is taking a parameter to the
filesystem ("bart", "lisa", "maggy") and shoehorning it into the device
pathname.

I would much prefer to use the existing mechanisms to specify
filesystem-specific parameters more generally.  There is prior art here
in the -s option to mount_mfs, leading to mount commands such as (to
quote one manpage) "mount -t mfs -o nosuid,-N,-s=32m swap /tmp".  I
don't see any reason why what you give above couldn't be done as

mount -t foo -o -x=bart /mnt
mount -t foo -o -x=lisa /mnt
mount -t foo -o -x=maggy /mnt

or whatever option mount_foo uses instead of -x.

On a separate note, as long ago as 5.2, mount(8) says

 In addition, if special contains a colon (`:') or at sign (`@'),
 then the nfs type is inferred, but this behaviour is deprecated,
 and will be removed in a future version of mount.

You're now coming up on NetBSD 10.  If you decide you really do want to
overload the device path for this, I'd suggest getting rid of the : and
@ NFS inference; it's been over a decade and coming up on five
releases, which strikes me as long enough.  But, as I say, I'd rather
do it via mount_whatever-specific options.
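To show where the overlap bites, here is a restatement of the manpage rule quoted above as a predicate.  This is only a sketch of the documented behaviour, not mount(8)'s actual code, and the function name is made up:

```c
#include <string.h>

/*
 * Hypothetical restatement of the quoted manpage text: a special
 * containing ':' or '@' is taken to mean NFS.
 */
int looks_like_nfs(const char *special)
{
    return strchr(special, ':') != NULL || strchr(special, '@') != NULL;
}
```

Under that rule "/dev/dk0:bart" and "/dev/dk0@maggy" are both inferred to be NFS, while "/dev/dk0%lisa" is not; that is the conflict with the proposed tail syntax.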

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: style(5) proposal: forbid extern in .c

2023-03-18 Thread Mouse
>>> extern struct netif_stats   le_stats[];
>>> ...
>>> struct netif_stats le_stats[NLE_IFS];

>> [riastradh@ and I say to s/extern/static/ and add static to the
>> later declaration, if le_stats can be file-local]

> Thanks, using 'static' as a forward declaration just works.

I may be misreading here, but this sounds as though you're converting
the `extern' to `static' but _not_ adding `static' to the later
declaration.

If so, well, this is not valid C, at least not as of C99.  If it "just
works", that is because the compiler in use is willing to wave off the
violation, not because it is correct.  (Or, potentially, it is adhering
to a different standard, though I'd be surprised if any other standard
differs in this respect.)

C99 6.2.2, "Linkages of identifiers"

   [#3]  If  the  declaration of a file scope identifier for an
   object or a function contains  the  storage-class  specifier
   static, the identifier has internal linkage.22)

(the 22 is a reference to a footnote that's about functions and thus
not relevant here).  So, the first declaration gives it internal
linkage.

   [#5] [...text about functions...
  ...]  If the declaration of an identifier for
   an object has file scope and no storage-class specifier, its
   linkage is external.

And the second declaration gives it external linkage.

   [#7]  If,  within  a  translation  unit, the same identifier
   appears  with  both  internal  and  external  linkage,   the
   behavior is undefined.

Hence, undefined.
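For contrast, a sketch of the form that keeps the linkage consistent, with static on both declarations.  The names are invented, and it uses a plain int rather than the array from the thread to keep it small:

```c
/* Forward declaration: a tentative definition with internal linkage. */
static int le_count;

/* Code between the two declarations may already refer to the object. */
static int *le_count_ptr = &le_count;

/* The later declaration repeats static, so both agree on the linkage. */
static int le_count = 3;

int read_le_count(void)
{
    return *le_count_ptr;
}
```

Dropping static from the third declaration would give the same identifier internal linkage in one place and external in the other, which is the case [#7] calls undefined.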

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: style(5) proposal: forbid extern in .c

2023-03-15 Thread Mouse
> [...] I wonder how we can resolve this case:

> extern struct netif_stats le_stats[];
> 
> static struct netif_dif le_ifs[] = {
> /*  dif_unit    dif_nsel    dif_stats       dif_private */
> { 0,  NLE0CONF,   &le_stats[0],   le0conf,},
> };
> #define NLE_IFS (sizeof(le_ifs) / sizeof(le_ifs[0]))
> 
> struct netif_stats le_stats[NLE_IFS];

That is not an extern declaration from a conceptual point of view; it's
just a forward declaration which happens to be implemented with a
(redundant) "extern" keyword.  (Even strict def/ref systems must
implement the tentative-definition paradigm for multiple declarations
within a single translation unit - C99 6.9.2, paragraph 2 in
particular.)

Still, I think it could be improved.

I would say that:

- If le_stats is - or might be, eg, depending on #defines - used from
   another file, the extern declaration should be moved into a .h used
   by both files.

- If not, the extern should be converted to static and the later
   declaration should have static added to it.

(I really wish C had chosen a different keyword for linkage control of
file-scope declarations; overloading static for storage duration
control at block scope and linkage control at file scope is ugly and
confusing.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: make(1): is this a bug or PEBKAC?

2023-02-26 Thread Mouse
>>> To make the makefile easier readable and at the same time avoiding
>>> line noise, you can use a multi-variable .for loop, which are
>>> available since 2000:
>> Maybe.  I'd have to see if that's present in all the make versions I
>> care about.  [...]
> It won't be in 1.4T's make but it should be in everything else you're
> likely to run into these days.

I no longer run 1.4T's make.  I got fed up with it and backported the
5.2 make to 1.4T on 2018-08-10.  It even works for the 1.4T build
infrastructure, though it generates more noise messages than the 1.4T
make did.

Mouse


Re: make(1): is this a bug or PEBKAC?

2023-02-22 Thread Mouse
>>> '$(x:...(...)$$)'
>> I don't see $$) anywhere in my line above - nor anywhere else in
>> local-prog, for that matter.
> I extracted the '$$)' from your line 121 and minified it to the parts
> necessary to understand the parentheses counting parts.

It looks as though you also deleted parts, since line 121 is

$(x:C;^.*\.([0-9].*)$$;$(INSTMANDIR)/cat\1;):

and the only ways I see to get $$) out of that are to delete
";$(INSTMANDIR" or to delete ";$(INSTMANDIR)/cat\1;", neither of which
seems to me to match what you wrote

>> It counts the '$(' as starting a subexpression, then ignores the
>> following '(', and when it sees the first ')', it considers the
>> expression completed, leaving the '$$)' for a syntax error.

The only places I see $( there are at the very beginning and the
$(INSTMANDIR) in the replacement string, neither of which makes much
sense to me (and only the first of which even _has_ a following '(').
Thus my remark that I'd have to trace through the code to understand
what you mean.

> In case you want to build old versions of make,

On my own machines, I use the 5.2 and 4.0.1 makes, with 64-bit time_t
changes and, for the 4.0.1 make, a fix for := variable expansion.  (For
use on 1.4T, I backported the 5.2 make after one run-in too many with
the make variant it shipped with.)  I considered copying that to the
9.1 machine, to get consistency between that and what I run at home,
but figured I might be running into a bug, in which case it would
probably be worth alerting someone to.

> https://ftp.netbsd.org/pub/pkgsrc/misc/rillig/make-archive/

I may have a look at that at some point.  Thanks for the pointer!

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: make(1): is this a bug or PEBKAC?

2023-02-21 Thread Mouse
Thank you for the quick and very helpful response!

>> 121  $(x:C;^.*\.([0-9].*)$$;$(INSTMANDIR)/cat\1;):

>> So, I'm wondering, is this a bug in 9.1's make, or am I doing
>> something wrong here?

> TL;DR: There are several cases in which make silently ignores parse
> errors, such as '$$' in variable modifiers.  In other cases, it has
> become stricter over time, or the accepted syntax just changed.

:-(

> The crucial part in your example is the '$$'.

Actually, they were initially single $s.  I changed them in an attempt
to fix it after seeing the 9.1 manpage note that the pattern and
replacement were subject to variable expansion; I wasn't sure what a
trailing single $ would do, so I doubled it.

> In normal strings, '$$' is used to escape a '$'.  But in variable
> modifiers such as ':C', '\$' is used instead to escape a '$'.

Hm, is that documented somewhere?  The only place I see \$ in the 9.1
manpage is in an example in the description of :q.

> In netbsd-9:usr.bin/make/var.c, the parser looks at
> '$(x:...(...)$$)', trying to figure out which parentheses match, see
> the comment "Attempt to avoid ';' inside substitution patterns" in
> parse.c.  In this case, the parser fails.  It counts the '$(' as
> starting a subexpression, then ignores the following '(', and when it
> sees the first ')', it considers the expression completed, leaving
> the '$$)' for a syntax error.

I don't see $$) anywhere in my line above - nor anywhere else in
local-prog, for that matter.  I'd have to go read the code, because
your description doesn't seem to match the results I see.

> To avoid all these edge cases, you should rewrite your makefile:

> * In the ':C' modifier, replace '$$' with '$', as there is no need to
> double it.

Okay.

> * In dependency lines, avoid modifiers containing ';'.

So my use of ; as the delimiter character is confusing things?  Okay, I
can change that.

> To make the makefile easier readable and at the same time avoiding
> line noise, you can use a multi-variable .for loop, which are
> available since 2000:

Maybe.  I'd have to see if that's present in all the make versions I
care about.  I use  on more than just work machines

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


make(1): is this a bug or PEBKAC?

2023-02-21 Thread Mouse
I'm trying to work on 9.1, for work.  Of course, one of the first
things I tried to do was install various of my tools.

make(1) appears to be misbehaving.  But I'm not sure enough it isn't
PEBKAC to just file a PR.

Specifically, I run my "bootstrap the tools" script and everything goes
normally until it tries to build a particular program, at which point

make: Unclosed substitution for  (; missing)
make: Unclosed substitution for  (; missing)
make: "/home/mouse/..local/lib/make/local-prog" line 115: Need an operator
make: Unclosed substitution for  (; missing)
make: Unclosed substitution for  (; missing)
make: "/home/mouse/..local/lib/make/local-prog" line 121: Need an operator
make: Unclosed substitution for  (; missing)
make: "/home/mouse/..local/lib/make/local-prog" line 125: Need an operator
make: Fatal errors encountered -- cannot continue

There are exactly two spaces between "for" and "(" on the "Unclosed
substitution" lines (verified with hexdump -C).  The Makefile in
question is

CC_ADD = -g
BUILDBINS = ccwrapper
OBJ_ccwrapper = ccwrapper.o
INSTALLMAN = ccwrapper.1

.MAIN: ccwrapper ccwrapper.cat1

.include <local-prog>

and make is being run as...well, to let ktrace -t aceinsu tell it:

  3662  1 make NAMI  "/usr/bin/make"
  3662  1 make NAMI  "/usr/libexec/ld.elf_so"
  3662  1 make ARG   "/usr/bin/make"
  3662  1 make ARG   "-m"
  3662  1 make ARG   "/home/mouse/..local/lib/make"
  3662  1 make ARG   "-m"
  3662  1 make ARG   "/usr/share/mk"

(local-prog is the only thing in /home/mouse/..local/lib/make.)

The relevant part of local-prog is (line numbers added, of course - the
whole file is on ftp.rodents-montreal.org in
/pub/mouse/local/src/makefiles/makefiles-20230221/local-prog in case
anyone wants the whole thing):

   109  .PHONY: install_dirs_MAN
   110  .for x in $(INSTALLMAN)
   111  install_files:: $(x:C;^(.*)\.([0-9].*)$$;$(INSTMANDIR)/cat\2/\1.0;)
   112  install_dirs:: install_dirs_MAN
   113  $(x:C;^(.*)\.([0-9].*)$$;$(INSTMANDIR)/cat\2/\1.0;):\
   114  $(x:C;^.*\.([0-9].*)$$;$(INSTMANDIR)/cat\1;)\
   115  $(x:C;\.([0-9].*)$$;.cat\1;)
   116  cp $(x:C;\.([0-9].*)$$;.cat\1;) $(.TARGET)
   117  CLEANFILES_ += $(x:C;\.([0-9].*)$$;.cat\1;)
   118  .if !exists($(x:C;^.*\.([0-9].*)$$;$(INSTMANDIR)/cat\1;)) && \
   119  !target($(x:C;^.*\.([0-9].*)$$;$(INSTMANDIR)/cat\1;))
   120  install_dirs_MAN:: $(x:C;^.*\.([0-9].*)$$;$(INSTMANDIR)/cat\1;)
   121  $(x:C;^.*\.([0-9].*)$$;$(INSTMANDIR)/cat\1;):
   122  mkdir -p $(.TARGET)
   123  .endif
   124  .if !target($(x:C;\.([0-9].*)$$;.cat\1;))
   125  $(x:C;\.([0-9].*)$$;.cat\1;): $(x)
   126  nroff -mandoc $(.ALLSRC) > $(.TARGET) || (rm -f $(.TARGET); false)
   127  .endif
   128  .endfor

The same code works with 5.2's make (which, of course, says only
moderately little for its correctness...).
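For anyone who wants to see what the :C modifier on line 111 is meant to compute without involving make at all, here is a sketch using regex(3) directly, with the end anchor written as a plain $.  The helper name and the directory used in the usage note are made up; instmandir stands in for $(INSTMANDIR):

```c
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper mirroring the substitution
 *   ^(.*)\.([0-9].*)$  ->  $(INSTMANDIR)/cat\2/\1.0
 * Returns 0 on success, -1 if name does not match or out is too small.
 */
int catpage_path(const char *name, const char *instmandir,
    char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[3];
    int n, rc = -1;

    if (regcomp(&re, "^(.*)\\.([0-9].*)$", REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, name, 3, m, 0) == 0) {
        /* m[1] is the base name, m[2] the manual section. */
        n = snprintf(out, outlen, "%s/cat%.*s/%.*s.0", instmandir,
            (int)(m[2].rm_eo - m[2].rm_so), name + m[2].rm_so,
            (int)(m[1].rm_eo - m[1].rm_so), name + m[1].rm_so);
        if (n >= 0 && (size_t)n < outlen)
            rc = 0;
    }
    regfree(&re);
    return rc;
}
```

Called with "ccwrapper.1" and an instmandir of "/usr/local/man", this fills out with "/usr/local/man/cat1/ccwrapper.0".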

So, I'm wondering, is this a bug in 9.1's make, or am I doing something
wrong here?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: mail(1) editline defaults to vi mode?

2023-02-18 Thread Mouse
>> I think that input line editing, no matter how fancy, should be
>> invisible to simple applications, as invisible as erase and kill
>> processing is for applications that run in cooked mode today.  [...]

> Ultimately the problem is that teletypes are baked into the core
> architecture of Unix and all more sophisticated user interaction has
> been bolted on top instead of being redesigned in.

I agree and I disagree.

If by "teletypes" you mean "octet streams", well, it might be an
overstatement to say that "everything is an octet stream" is the
unifying concept that has made Unix Unix, but not by much.  I'm not
sure I'd call that "the problem", though.

The very notion of a command line owes a good deal to that unifying
principle, it seems to me.  I was never fortunate enough to use
interactive OSes designed without Unix available to draw on (while it
is just an assumption on my part that such exist, it seems likely).

If we accept, for the moment, the presence of an interactive command
line, one for which input line editing makes sense, it seems to me that
the problem is that some aspects of line editing are inherently
application-specific (history and expansion come to mind immediately)
and others aren't (cursor motion and inserting text come to mind).  The
reason this is a problem is that different philosophies for the
application-independent parts lead to different approaches to what kind
of UI is appropriate for the application-dependent parts.  We then need
an API design that allows applications to specify (and, in at least
some cases, participate in) the application-specific parts, somehow
doing this without imposing any particular user interface.

It's "mechanism, not policy" returning with a vengeance.

> By the time you fix it, it's not Unix any more.

Depending on what the "it" is.  If it's the octet-stream unifying
principle, then, yes, eliminating, or even substantially changing, that
will break what makes Unix Unix.

But, if we assume the existence of an interactive command line?  I'm
not so sure.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: mail(1) editline defaults to vi mode?

2023-02-18 Thread Mouse
>>>> [...mail(1)...editline...vi mode...]
>> ps Mouse: no, this should not be in the kernel ... asking the kernel
>> to search back and get text from something you entered 3 weeks (and
>> several reboots) ago is just unreasonable.

That's (part of) why I said "as far as ordinary applications are
concerned".  As I said, I think it really should be in a user-specific
userland program, one the user can switch out for a different one, much
like the shell in that respect.

However, designing the interfaces in question is...a bit of a
challenge.  I've been mulling it over for years and still have nothing
but vague half-formed ideas.  The hard part, of course, is that, while
simple programs will be happy with basic line editing, many others want
different things beyond that.  Many programs want to support various
additional commands beyond whatever the user considers basics.  And
then there's the question of what to do when a user's idea of "basics"
conflicts with what the program is trying to supply - witness, for
example, my own struggles with bash on Linux and ^P.

> I think Mouse refers to a thread on TUHS (where all the old hands of
> UNIX are, and some idiots, too, [...]).

Not specifically.  I am not familiar with TUHS in a practical sense.

I agree with the quote attributed to Rob Pike, but I was not aware of
it when I wrote my previous mail.

I think that input line editing, no matter how fancy, should be
invisible to simple applications, as invisible as erase and kill
processing is for applications that run in cooked mode today.
Applications that want frills, such as shells that want input history
that stretches back across multiple sessions, programs that want
completion of their own private command names, those will have to do
something to get them.  But then, they do now too; it's just a question
of designing the right interfaces.

But even that much is no easy task.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: mail(1) editline defaults to vi mode?

2023-02-17 Thread Mouse
> I had to use mail(1) recently (after ~30 years break :).  To my
> surp^Wconfusion its editline inits itself to vi mode.

The real problem, here, is that line editing is being done in the wrong
place.  It should be done in the kernel as far as ordinary applications
are concerned, though it really should be in a user-specific userland
program (which is thus switchable, same as the shell is).

Admittedly, this is Not Simple to do right.

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: split(1): add '-c' to continue creating files

2023-02-14 Thread Mouse
>> Besides, isn't your intended behaviour easily done with:

>>  $ cat file second-file | split

> That only works if I have both files available at the time I run the
> split command.

It also will (unless the first file is a multiple of the split size)
take the last part of file and the first part of second-file and put
them in the same split file.  It also imposes the same split size on
both input files.

In contrast, the proposed behaviour never puts pieces from different
input files in the same output file and permits different fragment
sizes for different input files.
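The difference is just arithmetic on piece counts.  A sketch, with made-up function names, where lengths and the split size are in whatever unit split is counting (lines or bytes):

```c
/* Number of n-unit pieces needed to hold len units (n > 0). */
unsigned long pieces(unsigned long len, unsigned long n)
{
    return (len + n - 1) / n;
}

/*
 * 1 if concatenating inputs of a_len and b_len units and splitting the
 * result every n units puts parts of both inputs into a single piece.
 */
int mixes_inputs(unsigned long a_len, unsigned long b_len, unsigned long n)
{
    return a_len % n != 0 && b_len > 0;
}
```

For inputs of 10 and 6 units with a split size of 4, the concatenation gives pieces(16, 4) = 4 files, the third holding the tail of the first input and the head of the second; splitting them separately gives 3 + 2 = 5 files with no mixing.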

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: split(1): add '-c' to continue creating files

2023-02-14 Thread Mouse
> $ split -n 4 -c file; ls
> xaa xab xac xad xae xaf xag
> --- --- --- ---

> I don't see a way around that: split(1) would need to look ahead at
> _any_ possible file to be able to determine if the current file name
> falls into a hole in the sequence.

That isn't that hard to do, assuming the containing directory is
readable to the user running split, though there is still a race if two
split instances are writing with the same prefix in the same directory.
But _that_ race is pretty much unavoidable.

> If you think it's worth calling out, we could try to do so in the
> manual page: [...]

Could be worth doing.  Perhaps split could watch for this and (possibly
optionally) warn to stderr if it happens?

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: jot: producing too much output?

2023-01-06 Thread Mouse
>> [...jot...]
> The original code (see rev 1.1) seems to be more clear on how the
> parameters should be evaluated.

Yes.  Arguably the right thing to do here is to do the analog of what
that version does and list all 16 possible combinations of START, REPS,
ENDER, and STEP, with explicit code for each.

But then, the jot NetBSD already has conforms precisely to its spec.
Perhaps this could be viewed as a bug in the spec.

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: jot: producing too much output?

2023-01-06 Thread Mouse
>>> I'd do it myself except I don't know where to find an authoritative
>>> spec for jot.

>> I'd tend to assume it is the code.

Well, then, there never can be any such thing as a bug in jot.

I find that...dubious.

> Remark from the peanut gallery, after (again) reading this in jot(1):

> | The name jot derives in part from iota, a function in APL.

> Maybe somebody(tm) should look that up.  Or ask John A. Kunze.

I know enough APL to make some sense of that, and it's not very
informative here.

Caveat: my APL knowledge is relatively old.  But I suspect the jot
name's derivation is comparably old.  So, as I understand APL:

There's an operator - pronounced "iota", because the usual glyph for it
is the lowercase Greek letter ι - which takes a scalar, let's call it
N, and returns a 1D array from 0 through N-1.  There is an APL
character which I'm not sure of the location of in non-APL-specific
fonts; it looks roughly like a degree sign moved down so its centre is
approximately where a centered-dot's centre would be.  It is pronounced
"jot".  I don't see how to get to jot(1)'s functionality from APL's
jot, but it is in some respects like APL's iota.

However, that doesn't help much here because APL's ι operator doesn't
have the issues under discussion here.  It takes exactly one argument;
in jot(1) terms, start is 0, step is 1, and the argument is reps, with
end being implicit in the other three.  If you want anything else, you
have to construct it from the vector ι returns.  For example, if you
want the vector 2.4 2.5 2.6 2.7 2.8 2.9, you would probably do
something like...well, I'm not sure I'm going to get the MIME right,
but I think 2.4+.1×ι6 is what you'd want.
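For readers without APL, a rough C equivalent of that expression, assuming index origin 0 as described above; the function name is invented:

```c
#include <stddef.h>

/* Fill out[0..n-1] with start + step*i: iota n, scaled and shifted. */
void iota_scaled(double *out, size_t n, double start, double step)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = start + step * (double)i;
}
```

With n = 6, start = 2.4 and step = 0.1 the array runs from 2.4 to 2.9, up to floating-point rounding.  The count is the only thing the caller states directly; the end value falls out of the other three.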

APL ι does have dyadic semantics as well, but they're even less
relevant.  (Most briefly, XιY returns an array of the same shape as Y,
indicating where in X the elements of Y are first found.  If X is not
1D, I'm not sure what happens; I suspect it actually operates on ,X
instead of X as given, though it seems to me the most philosophically
APLish thing would be to add another dimension ρρX to the result and
return entire subscript vectors.)

I don't see any help in APL for figuring out what jot(1) should do.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: jot: producing too much output?

2023-01-05 Thread Mouse
> Actually, replace that last patch with this one.   The previous one
> wouldn't work if begin and end were the same value (intending to get
> just one output value) - though why anyone would write the args that
> way instead of just saying 1 reps begin at ... I cannot imagine.

Perhaps because the values weren't hand-specified but came from
somewhere automated in a script?  It's a bit like saying "why would
anyone run `cat /dev/null'".

    Mouse


Re: jot: producing too much output?

2023-01-05 Thread Mouse
> As an extra bonus, the appended patch causes jot to generate what
> you're expecting.

True.  But I don't understand why the code isn't instead something more
like

if (! (have & REPS)) reps = 100;
if (! (have & BEGIN)) begin = 1;
if (! (have & ENDER)) ender = 100;
if (! (have & STEP)) step = 1;
...compute t and possibly reduce reps...

I am reminded of the aphorism that there are two ways to build a
system: you can make it so complex that there are no obvious defects,
or you can make it so simple there are obviously no defects.  The code
there now strikes me as closer to the former.

At the very least I think someone who knows jot's actual spec should
check that all 16 possible values of have produce their correct
behaviours.  I'd do it myself except I don't know where to find an
authoritative spec for jot.  What seems obvious to me is apparently not
a reliable guide in this case.
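
A minimal sketch of the simpler structure suggested above, for
illustration only.  The flag names follow the fragment; jot_count(),
its parameter list, and the rule used for the computed count are
assumptions, not jot's actual code or an authoritative reading of its
spec.

```c
/* Bits of "have", named as in the fragment above. */
enum { REPS = 1, BEGIN = 2, ENDER = 4, STEP = 8 };

/*
 * Hypothetical helper: return how many values would be printed.
 * Arguments whose bit is clear in "have" are ignored and defaulted.
 */
long
jot_count(unsigned have, long reps, double begin, double ender, double step)
{
	double span;
	long t;

	if (!(have & REPS)) reps = 100;
	if (!(have & BEGIN)) begin = 1;
	if (!(have & ENDER)) ender = 100;
	if (!(have & STEP)) step = 1;

	if (step == 0)
		return reps;	/* every value equals begin; nothing to cap */

	/* Assumed rule: count the steps that fit between begin and ender. */
	span = (ender - begin) / step;
	t = (span < 0) ? 0 : (long)span + 1;

	/* "The lower value is used". */
	return (t < reps) ? t : reps;
}
```

Under these assumptions, jot_count(BEGIN | ENDER, 0, 32, 126, 0) gives
95, and looping over have = 0 through 15 is a cheap way to exercise all
16 combinations.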

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: jot: producing too much output?

2023-01-05 Thread Mouse
>> jot -s "" - 32 126

>> Based on reading the manpage, I would expect that to produce a
>> single line 323334353637...124125126.  It does on my 1.4T.  On my
>> 5.2, ftp.n.o's 9.0_STABLE, and a work 9.1 machine, the line actually
>> ends ...125126127128129130131.

> The man page actually says:

>  If fewer than three [of the reps begin end & step args] are specified,
>  defaults are assigned left to right, except for s, which assumes its
>  default unless both begin and end are given.

Right.  So I would expect "- 32 126" to behave like "100 32 126 1".
And text which you didn't quote says "If four [args] are specified and
the given and computed values of reps conflict, the lower value is
used".  That is not directly applicable because only two values were
specified, but it does seem reasonable to me to expect the reps value
computed from the given begin and end and the defaulted step - meaning
95 in this case - to be used.  Forcing users to compute the number of
values they want when specifying both begin and end strikes me as
counterintuitive at best.

In any case, it's of mostly academic interest to me personally; I
normally use count(1), a program of my own that defaults in ways that
make more sense to me.  I just had occasion to want to generate
sequential data on a machine that didn't have my suite of utilities set
up, leading me to go looking.  When jot didn't give what I wanted, I wrote a
shell loop (i=BEGIN; while [ $i -lt END ]; do BODY; i=$(($i + 1)); done
or something like it).

But it did look to me as though jot wasn't behaving the way I thought
the manpage said it should, hence my mail.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


jot: producing too much output?

2023-01-05 Thread Mouse
jot -s "" - 32 126

Based on reading the manpage, I would expect that to produce a single
line 323334353637...124125126.  It does on my 1.4T.  On my 5.2,
ftp.n.o's 9.0_STABLE, and a work 9.1 machine, the line actually ends
...125126127128129130131.

Am I misunderstanding how the arguments work, or is this a bug?  Or
does it not do this for other people?

(The 1.4T jot has a different bug; jot -s "" -c - 32 126, or the same
thing with -c replaced with -w %c, produces a line

  
!"#$%&'()*+,-./01223456789:;<=>?@ABCDEEFGHIJKLMNOPQRSTUVWXXYZ[\]^_`abcdefghijkklmnopqrstuvwxyz{|}}

containing what I would expect but with the first and last characters
duplicated.  Looking into that is what led me to discover the above
(mis?)behaviour without -c/-w.  5.2 and 9.1 jot with -c produce the
same sequence as without -c, as single bytes instead of decimal.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: setreuid(2)?

2022-11-09 Thread Mouse
> I don't use seteuid(), but, it looks like you can just re-swap the
> (uid, euid) in a child binary and regain the parent's privileges.

Probably.  What does that have to do with what I was after?  In my
case, the process forks but does not exec; there is only one executable
involved.

I didn't explain _why_ I reached for setreuid.  I have been working on
a new kernel facility - basically, something a bit like sockets, but
using PIDs as addresses.  (I first started out planning on adding
AF_PID sockets, but it turns out that, at least in the version I've
been working under, that's not suitable - the AF-independent socket
machinery does too much.  In particular, it keeps a queue of pending
connections per socket, and has no way to differentiate between two
processes with file descriptors on the same socket.  So I'm using
DTYPE_MISC, my own f_ops vector, and a dedicated syscall.  But, except
as my motivation for wanting setreuid(), that's off-topic here.)

One thing I wanted to add was something a bit like LOCAL_PEEREID.  To
test it, I wanted two processes, each with a distinctive ruid and euid
(four different values across the two processes).  I couldn't see any
way to achieve this with less than setreuid() - at least not without
creating an executable setuid to one of the IDs and then running it
from the other, and then repeating all that for the other two UIDs.
This would have been seriously inconvenient, especially as compared to
setreuid().
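
A minimal sketch of the single call involved, assuming the caller still
has ruid=euid=0 at that point.  The helper name is hypothetical, and any
group changes would have to happen first, while still privileged.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: set the real and effective user IDs in one
 * call and confirm the result.  Returns 0 on success, -1 on failure.
 */
int
set_ruid_euid(uid_t ruid, uid_t euid)
{
	if (setreuid(ruid, euid) != 0)
		return -1;
	/* Check the outcome instead of trusting the return value alone. */
	if (getuid() != ruid || geteuid() != euid)
		return -1;
	return 0;
}
```

For the test described above, each forked child would call it once with
its own pair of unprivileged IDs, for example set_ruid_euid(1001, 1002)
in one and set_ruid_euid(1003, 1004) in the other (placeholder values).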

The only issue, turns out, was that the manpage overstated the degree
to which setreuid() is obsolete.  I tried to file a PR about that, but
the mail bounced, apparently looping inside NetBSD mail infrastructure.

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: setreuid(2)?

2022-11-08 Thread Mouse
I wrote

> OK, so the real problem is that the [setreuid] manpage overstates the
> case for its obsolescence.  I'll file a PR.

Apparently I won't.  My mail bounced "too many hops"; there seems to be
a loop somewhere in there.  If someone can tell me a useful place to
send it, I can pass the bounce along.

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: setreuid(2)?

2022-11-08 Thread Mouse
>> What am I missing?

> Nothing.

> Using the saved id's is only an alternative ([...]) if the sole aim
> of using setreuid() ([...]) is to allow a setuid process perform some
> operations as the real uid, and then revert to the effective uid once
> those are done.

OK, so the real problem is that the manpage overstates the case for its
obsolescence.  I'll file a PR.

> For what you need (which is somewhat unusual) only setreuid() will
> work - or until we add [sg]res[ug]id() [...]

Yes, it's unusual.  It's for testing.  I've just designed (and
implemented a basic version of) something a bit like sockets where
addresses are process IDs.  (Sockets won't actually work without
significant internal overhaul, at least in the OS versions I'm working
with, so I'm going with DTYPE_MISC and a custom ops vector.)  One of
the things I wanted was something a bit like AF_LOCAL's LOCAL_PEEREID,
so to test it I wanted each process to have distinctive UID and EUID.

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


setreuid(2)?

2022-11-07 Thread Mouse
I have a program, running with ruid=euid=0, that wants to set its real
and effective IDs to two other, different, IDs, neither one privileged.

What is the proper way to do this?  I first reached for setreuid(2),
but its manpage says that it is "made obsolete" by the saved-ID
functionality of setuid(2) and seteuid(2) and that it "should not be
used in new code".  But I must be missing something, because I can't
see any way to exploit the functionality described there, including the
saved IDs, to get the effect I want...short of creating an executable
setuid to one of the IDs, then switching to the other and execing that
executable.  I would hardly say this makes setreuid() obsolete, since
it requires writable filesystem space with set-ID functionality turned
on, a whole lot more syscalls, *and* MD code to construct a suitable
executable, none of which setreuid() needs to do the same job.

What am I missing?

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: strftime(3) oddities with %s, %z

2022-11-07 Thread Mouse
>> There is nothing in the C99 description of mktime (or POSIX either)
>> that specifies which fields [strftime] uses,

Unless it was removed after the very-late-draft I have, there is.  See
my earlier mail in this thread which quoted 7.23.3.5.  This says,
among other things,

The
   appropriate characters  are  determined  using  the  LC_TIME
   category  of the current locale and by the values of zero or
   more members of the broken-down time structure pointed to by
   timeptr,  as  specified  in brackets in the description.

and most[$] conversions have field names specified, such as

   %B  is replaced by the locale's full month name.  [tm_mon]

[$] A few have neither field names nor brackets, such as %n ("is
replaced by a new-line character").  Personally I would have preferred
to have an empty pair of brackets there, but oh well.

/~\ The ASCII     Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: strftime(3) oddities with %s, %z

2022-11-06 Thread Mouse
>> I don't think that's needed for what uwe@ and dholland@ were looking
>> for, which -- if I understand correctly -- is that:

>|  gmtime_r(&t, &gtm);
>|  localtime_r(&t, &ltm);
>|  strftime_XXX(gbuf, sizeof(gbuf), "... %s ...", &gtm);
>|  strftime_XXX(lbuf, sizeof(lbuf), "... %s ...", &ltm);

>| should format the same number at %s in the output

> That simply isn't possible, or not and retain [compatibility with
> existing and apparently-soon-to-be-codified practice].

Which is why I, and I think at least one or two of the others (though
in very different words), have been calling the existing API broken: it
cannot do the naïvely-obvious thing, a perfectly reasonable thing from
a conceptual point of view, a thing which I suspect is what would be
expected by someone who doesn't understand enough details to know just
how Byzantinely broken this is.  Before reading this thread, I would
have expected something like the above to work.  (Probably because I
haven't tried to use struct tm for timezone-aware code and thus haven't
run into this trainwreck before.)
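
A minimal sketch of why the two results differ, on the assumption (true
of the implementations I know of, but %s is an extension, so treat it
as an assumption) that %s is computed by handing the struct tm to
mktime().  The helper name and its interface are hypothetical.

```c
#include <limits.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper: with TZ set to "tz", break "t" down with
 * gmtime_r() or localtime_r() and feed the result back to mktime().
 * Returns mktime()'s answer minus "t" in seconds, LONG_MIN on error.
 */
long
mktime_skew(const char *tz, time_t t, int use_gmtime)
{
	struct tm tm;
	time_t back;

	if (setenv("TZ", tz, 1) != 0)
		return LONG_MIN;
	tzset();
	if ((use_gmtime ? gmtime_r(&t, &tm) : localtime_r(&t, &tm)) == NULL)
		return LONG_MIN;
	back = mktime(&tm);	/* always interprets tm as local time */
	return (long)difftime(back, t);
}
```

With TZ=EST5 the localtime_r() path round-trips exactly, while the
gmtime_r() path comes back five hours off, because nothing in the
portable part of struct tm records which zone the fields are in.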

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: strftime(3) oddities with %s, %z

2022-11-05 Thread Mouse
ed by the month, using the locale's alternative
   numeric symbols.
   %OM is   replaced   by   the  minutes,  using  the  locale's
   alternative numeric symbols.
   %OS is  replaced  by  the  seconds,   using   the   locale's
   alternative numeric symbols.
   %Ou is  replaced  by the ISO 8601 weekday as a number in the
   locale's alternative representation, where Monday is  1.
   %OU is  replaced  by  the  week  number,  using the locale's
   alternative numeric symbols.
   %OV is replaced by  the  ISO 8601  week  number,  using  the
   locale's alternative numeric symbols.
   %Ow is  replaced  by  the  weekday  as  a  number, using the
   locale's alternative numeric symbols.
   %OW is replaced by the week number of the  year,  using  the
   locale's alternative numeric symbols.
   %Oy is  replaced by the last 2 digits of the year, using the
   locale's alternative numeric symbols.

   [#5] %g, %G, and %V give values according  to  the  ISO 8601
   week-based  year.   In  this system, weeks begin on a Monday
   and week 1 of the year is the  week  that  includes  January
   4th, which is also the week that includes the first Thursday
   of the year, and is also the first  week  that  contains  at
   least four days in the year.  If the first Monday of January
   is the 2nd, 3rd, or 4th, the preceding days are part of  the
   last  week  of  the  preceding  year; thus, for Saturday 2nd
   January 1999, %G is replaced by 1998 and %V is  replaced  by
   53.  If December 29th, 30th, or 31st is a Monday, it and any
   following days are part of week 1  of  the  following  year.
   Thus, for Tuesday 30th December 1997, %G is replaced by 1998
   and %V is replaced by 1.

   [#6] If a conversion specifier is not one of the above,  the
   behavior is undefined.

   [#7]  In  the  "C" locale, the E and O modifiers are ignored
   and the replacement strings  for  the  following  specifiers
   are:

   %a  the first three characters of %A.
   %A  one of ``Sunday'', ``Monday'', ... , ``Saturday''.
   %b  the first three characters of %B.
   %B  one of ``January'', ``February'', ... , ``December''.
   %c  equivalent to ``%A %B %d %T %Y''.
   %p  one of ``am'' or ``pm''.
   %r  equivalent to ``%I:%M:%S %p''.
   %x  equivalent to ``%A %B %d %Y''.
   %X  equivalent to %T.
   %Z  implementation-defined.

   Returns

   [#8]  If  the total number of resulting characters including
   the terminating null character is not more than maxsize, the
   strftime  function  returns  the number of characters placed
   into the array pointed to by s not including the terminating
   null   character.   Otherwise,  zero  is  returned  and  the
   contents of the array are indeterminate.
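
A minimal sketch exercising the quoted text: it formats the week-based
year and week number from paragraph 5 and returns strftime()'s result
so the caller can apply paragraph 8.  The helper name and the output
layout are assumptions made for illustration.

```c
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: format the ISO 8601 week-based year and week
 * number as "GGGG-Wnn".  Returns strftime()'s result, so 0 means the
 * buffer was too small and its contents must not be used.
 */
size_t
iso_week_label(char *buf, size_t size, int year, int yday, int wday)
{
	struct tm tm = {0};

	tm.tm_year = year - 1900;
	tm.tm_yday = yday;	/* zero-based day of the year */
	tm.tm_wday = wday;	/* 0 = Sunday */
	return strftime(buf, size, "%G-W%V", &tm);
}
```

Saturday 2nd January 1999 is (1999, 1, 6) and Tuesday 30th December 1997
is (1997, 363, 2); both land in week-based year 1998, as the text says.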

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: strftime(3) oddities with %s, %z

2022-11-02 Thread Mouse
>> Unless POSIX was stupid enough to mandate that all-bits-0 is nil for
>> any pointer type and something well-defined for floating-point.
> The former is definitely true.  (Or will be.)

Sad.  Well, their mistake.

> And I think on the TZ list it just came up it is generally true for
> all "modern" machines.

Oh, it is, it is, especially since the current definition of "modern"
seems to be "Linux/x86_64". :-þ

Portability is not - or at least in my opinion should not be - about
"does it run on most machines now" but "will it run on tomorrow's weird
new machine".  This is why I don't generate/parse wire protocol by
overlaying structs onto octet streams.  This is why I don't bzero
structs with pointers and floats, even though every machine I either
use now or expect to use uses all-0-bits for nil pointers and zero
floats.  I want my code to be an instance of "What did you have to do
to port it to the new system?" "We typed `make'.".
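
A minimal sketch of the wire-protocol half of that, moving one octet at
a time instead of overlaying a struct.  The function names and the
choice of a 32-bit big-endian field are assumptions for illustration.

```c
/*
 * Hypothetical helpers: store and fetch a 32-bit unsigned value as
 * four octets, most significant first.  Nothing here depends on the
 * host's byte order, struct padding, or integer widths beyond what C
 * guarantees for unsigned long (at least 32 bits).
 */
void
put_be32(unsigned char *p, unsigned long v)
{
	p[0] = (unsigned char)((v >> 24) & 0xff);
	p[1] = (unsigned char)((v >> 16) & 0xff);
	p[2] = (unsigned char)((v >> 8) & 0xff);
	p[3] = (unsigned char)(v & 0xff);
}

unsigned long
get_be32(const unsigned char *p)
{
	return ((unsigned long)(p[0] & 0xff) << 24) |
	    ((unsigned long)(p[1] & 0xff) << 16) |
	    ((unsigned long)(p[2] & 0xff) << 8) |
	    (unsigned long)(p[3] & 0xff);
}
```

The masks are redundant where char is eight bits; they are there so the
code keeps producing octets on a machine where it is not.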

Of course, it's not quite that simple in full generality.  But that's
the aspect that I see as relevant to this thread.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: strftime(3) oddities with %s, %z

2022-11-02 Thread Mouse
> Suppose you create a struct tm _without_ gmtime(3) or localtime(3),
> using designated initializers or memset for zero-initialization, with
> only what is included in POSIX:

> struct tm tm = {
>   .tm_sec = 56,
>   .tm_min = 34,
>   .tm_hour = 12,
>   .tm_mday = 1,
>   .tm_mon = 12 - 1,   /* December */
>   .tm_year = 2021 - 1900,
>   .tm_wday = 3,   /* Wednesday */
>   .tm_yday = 334, /* zero-based day of year (%j - 1) */
>   .tm_isdst = 0,
> };

This is fine.  But using memset is not; if struct tm contains a pointer
or a floating-point value, setting it to all-0-bits may produce a trap
representation - or, possibly worse, a valid value that means something
different from what you intend.

Unless POSIX was stupid enough to mandate that all-bits-0 is nil for
any pointer type and something well-defined for floating-point.  (I'd
be surprised by that, but standards bodies have surprised me often
enough in the past.)  Certainly C doesn't, at least not as of C99 - I
don't have a copy of anything newer.
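
A minimal sketch of the portable alternative to memset, using a
hypothetical struct that has the kinds of members at issue.

```c
/* Hypothetical struct with an integer, a float, and a pointer. */
struct sample {
	int count;
	double ratio;
	const char *label;
};

/*
 * Return a zeroed struct sample.  An initializer gives each member the
 * value zero for its type (0, 0.0, null pointer), whatever bit patterns
 * those happen to have; memset() only promises all-bits-zero.
 */
struct sample
sample_zero(void)
{
	struct sample s = {0};

	return s;
}
```

The same "= {0}" works for a struct tm, including any members an
implementation adds beyond the standard ones.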

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

