multibyte mail(1), was: Epilogue

2023-10-12 Thread Ingo Schwarze
Hi Walter,

Walter Alejandro Iglesias wrote on Thu, Oct 12, 2023 at 09:42:33AM +0200:

> Given that I'm the OP of this thread

It seems you mean the thread "mail(1) MIME support".

> I feel entitled to officially closing it.

I have no idea what "closing" means.

> Those interested in my proposal will find my latest version of the
> patch in this new thread:
> 
>   https://marc.info/?t=16961663101=1=2
> 
> For my part, what was said in this thread has already been assimilated
> and overcome.  As my patches show, I heard and took into account every
> correction and suggestion (even though many, if not most, regardless of
> the intention with which they were made, functioned more as a boycott
> than as a contribution.)

You misunderstand.

The reason that at some point i temporarily paused feedback is that
such changes cannot be committed closely before release.  Meanwhile,
the tree is open again for 7.4-current development, but even if we
had finished patches ready for commit, i would likely delay committing
them for another week or two until snapshot building for 7.4-current
resumes such that the committed patches get real-world testing as soon
as possible.  On top of that, several developers, including myself,
are still involved in work related to making the release happen, so
the time available for post-release work is not unlimited - and as you
can see from the current, intense flurry of commits, in particular
in ports, that much of the time available for post-release work has
already been allocated to stuff that developers had lying around in
their personal pipelines, which also limits the time available for
completely new stuff like multibyte mail(1).

It is well-known that developers exist who frequently use mail(1)
so *if* whatever we end up doing disrupts their processes, we have
to find out as quickly as possible and take corrective action
without further delay.  Delaying the work on patches until the
right time is not a problem.  Committing something in a hurry
and not immediately getting it tested in real-world use may be
a problem depending on what and how important it is.

On top of that, we still haven't achieved consensus among
developers exactly what is the best first step.  We are getting
closer, and i'm quite sure we will ultimately add some MIME
support, but some decisions about desired functionality still
need to be made, for example how much automation is desirable
and acceptable and what the defaults should be.

Changing the core functionality of a program that is 52 years old and
has been standardized for probably almost 30 years, or maybe even
for up to 35 years, is not a quick and easy matter.  It needs proper
consideration, caution, time, and patience.

Yours,
  Ingo



Re: /bin/ls print format bugs

2023-10-05 Thread Ingo Schwarze
Hi Crystal,

Crystal Kolipe wrote on Tue, Oct 03, 2023 at 06:47:42PM -0300:

> Two display format bugs seems to have crept in to ls due to the
> addition of scaled human readable values and large minor numbers.

I think you are right that the current situation isn't good.
Thank you for bringing this up.

In general, ls(1) strives to dynamically determine the required
column width.  It already does that for the file size column.
Given that device numbers use the same column, i think the solution
that is cleanest, most robust, most aesthetically pleasing, least
wasteful with respect to space, and easiest to maintain is simply
doing the same for device numbers, see the patch below.

Two remarks:
 * The reason for keeping the local variable "bcfile" is that
   (0, 0) represents a valid device (console).  There is no
   good reason why it should be in the DISPLAY struct, though.
 * The line "d.s_major = d.s_minor = 3;" is likely redundant because
   with bcfile == 0, nothing is supposed to use these fields.
   While local variables should only be initialized when needed, i
   think defensive programming recommends keeping data that is used
   by (even internal) APIs initialized to reasonable values rather
   than having random memory content lying around.  That reduces
   the risk of future code changes in *other* modules accidentally
   accessing random data, if developers get confused regarding
   which invariants the internal API guarantees or requires.

OK?
  Ingo


   $ cd /dev
   $ ls -l MAKEDEV *vid*  
  -r-xr-xr-x  1 root  wheel  12254 Oct  3 07:07 MAKEDEV
  lrwxr-xr-x  1 root  wheel  6 Aug 29  2016 video -> video0
  crw---  1 root  wheel  44, 0 Oct  3 13:34 video0
  crw---  1 root  wheel  44, 1 Oct  3 13:34 video1
   $ ls -lh MAKEDEV *vid*
  -r-xr-xr-x  1 root  wheel  12.0K Oct  3 07:07 MAKEDEV
  lrwxr-xr-x  1 root  wheel 6B Aug 29  2016 video -> video0
  crw---  1 root  wheel  44, 0 Oct  3 13:34 video0
  crw---  1 root  wheel  44, 1 Oct  3 13:34 video1
   $ ls -l null
  crw-rw-rw-  1 root  wheel  2, 2 Oct  5 00:32 null
   $ ls -l cua00  
  crw-rw  1 root  dialer  8, 128 Oct  3 13:34 cua00


Index: ls.c
===
RCS file: /cvs/src/bin/ls/ls.c,v
retrieving revision 1.54
diff -u -p -r1.54 ls.c
--- ls.c7 Oct 2020 21:03:09 -   1.54
+++ ls.c5 Oct 2023 14:47:37 -
@@ -436,6 +436,7 @@ display(FTSENT *p, FTSENT *list)
unsigned long long btotal;
blkcnt_t maxblock;
ino_t maxinode;
+   unsigned int maxmajor, maxminor;
int bcfile, flen, glen, ulen, maxflags, maxgroup, maxuser, maxlen;
int entries, needstats;
int width;
@@ -449,6 +450,7 @@ display(FTSENT *p, FTSENT *list)
btotal = maxblock = maxinode = maxlen = maxnlink = 0;
bcfile = 0;
maxuser = maxgroup = maxflags = 0;
+   maxmajor = maxminor = 0;
maxsize = 0;
for (cur = list, entries = 0; cur != NULL; cur = cur->fts_link) {
if (cur->fts_info == FTS_ERR || cur->fts_info == FTS_NS) {
@@ -523,9 +525,13 @@ display(FTSENT *p, FTSENT *list)
(void)strlcpy(np->group, group, glen + 1);
 
if (S_ISCHR(sp->st_mode) ||
-   S_ISBLK(sp->st_mode))
+   S_ISBLK(sp->st_mode)) {
bcfile = 1;
-
+   if (maxmajor < major(sp->st_rdev))
+   maxmajor = major(sp->st_rdev);
+   if (maxminor < minor(sp->st_rdev))
+   maxminor = minor(sp->st_rdev);
+   }
if (f_flags) {
np->flags = >data[ulen + 1 + glen + 
1];
(void)strlcpy(np->flags, flags, flen + 
1);
@@ -551,7 +557,6 @@ display(FTSENT *p, FTSENT *list)
d.entries = entries;
d.maxlen = maxlen;
if (needstats) {
-   d.bcfile = bcfile;
d.btotal = btotal;
(void)snprintf(buf, sizeof(buf), "%llu",
(unsigned long long)maxblock);
@@ -570,6 +575,17 @@ display(FTSENT *p, FTSENT *list)
d.s_size = strlen(buf);
} else
d.s_size = FMT_SCALED_STRSIZE-2; /* no - or '\0' */
+   d.s_major = d.s_minor = 3;
+   if (bcfile) {
+   (void)snprintf(buf, sizeof(buf), "%u", maxmajor);
+   d.s_major = strlen(buf);
+   (void)snprintf(buf, sizeof(buf), "%u", maxminor);
+   d.s_minor = strlen(buf);
+   if (d.s_size <= d.s_major + 2 + d.s_minor)
+   

Re: List tracepoints directly in kdump.1

2023-09-28 Thread Ingo Schwarze
Hi Christian,

Christian Weisgerber wrote on Thu, Sep 28, 2023 at 07:01:29PM +0200:

> It's always the same:
> * foobar doesn't behave as expected
> * I ktrace foobar
> * I run kdump... too much information.
> * I check the kdump(1) man page, since I can't remember which letter
>   represents which tracepoint.
> * "See the -t option of ktrace(1) for the meaning of the letters."
> 
> Sigh.  Yes, duplication is evil, yada, yada, but would it really
> be so bad to also list the tracepoints directly in kdump.1 instead
> of pointing to ktrace.1?

occasionally, small amounts of duplication are tolerable if they really
make life easier for users.  This may be such a case, in particular
considering that the -t option is likely more commonly used with kdump(1)
than with ktrace(1), except in the rare case that the dump becomes so big
that you already have to limit output at ktrace(1) time.

The file /usr/src/usr.bin/ktrace/subr.c already contains a comment
instructing developers to keep ktrace(1), kdump(1), and ltrace(1)
up to date when changing option letters.

Maybe you want to include comments in ktrace.1, kdump.1, and ltrace.1
too, for example like this - your call:

  .Bl -tag -width flag -offset indent -compact
  .\" Keep this list in sync with ktrace(1) and ltrace(1).
  .It Cm c

It doesn't guarantee that we will stay in sync, but maybe it improves
our chances further, in particular when people make editorial changes
to the manuals without changing the code.

Either way, OK schwarze@ if no developer objects in a reasonable
timeframe.

Yours,
  Ingo


> ---
> commit 6537a30531732808760afdc5dcd7331aeb9d7618 (local)
> from: Christian Weisgerber 
> date: Thu Sep 28 16:46:15 2023 UTC
>  
>  list tracepoints directly in kdump.1 instead of pointing to ktrace.1
>  
> diff 1cd585e18a4dcd40dd8e9c339b74dc79fd0856e2 
> 6537a30531732808760afdc5dcd7331aeb9d7618
> commit - 1cd585e18a4dcd40dd8e9c339b74dc79fd0856e2
> commit + 6537a30531732808760afdc5dcd7331aeb9d7618
> blob - 936c630a1096cec78204ac075c6df83a2744e8e2
> blob + 5f073e48510388f26cf5c0c1b66d360dc3332364
> --- usr.bin/kdump/kdump.1
> +++ usr.bin/kdump/kdump.1
> @@ -100,13 +100,38 @@ Display absolute timestamps for each entry (seconds si
>  If both options are specified, display timestamps relative to trace start.
>  .It Fl t Ar trstr
>  Select which tracepoints to display.
> -The argument can contain one or more of the letters
> -.Cm cinpstuxX+ .
> -See the
> -.Fl t
> -option of
> -.Xr ktrace 1
> -for the meaning of the letters.
> +The argument can contain one or more of the following letters.
> +By default all trace points except for
> +.Cm X
> +are enabled.
> +.Pp
> +.Bl -tag -width flag -offset indent -compact
> +.It Cm c
> +trace system calls
> +.It Cm i
> +trace I/O
> +.It Cm n
> +trace namei translations
> +.It Cm p
> +trace violation of
> +.Xr pledge 2
> +restrictions
> +.It Cm s
> +trace signal processing
> +.It Cm t
> +trace various structures
> +.It Cm u
> +trace user data coming from
> +.Xr utrace 2
> +.It Cm x
> +trace argument vector in
> +.Xr execve 2
> +.It Cm X
> +trace environment in
> +.Xr execve 2
> +.It Cm +
> +trace the default points
> +.El
>  .It Fl u Ar label
>  Display
>  .Xr utrace 2



Re: Fix function names in imsg_init.3

2023-09-28 Thread Ingo Schwarze
Hi Lucas,

Lucas wrote on Thu, Sep 28, 2023 at 04:07:02PM +:

> There is no imsg_seek_set_n{32,64}, but imsg_set_n{32,64}.

Committed, thanks.
  Ingo

> diff refs/heads/master 34767f41b5371661bc7d3b47c3f780279d1bcd9c
> commit - c7bb30c9e72387bdcf13f2516a8d63c49f7eae54
> commit + 34767f41b5371661bc7d3b47c3f780279d1bcd9c
> blob - 11915f377f9b38df97bd67ca9b1768962a998637
> blob + db3021ed6c199c4fef8f544313894999fe1b4380
> --- lib/libutil/imsg_init.3
> +++ lib/libutil/imsg_init.3
> @@ -472,9 +472,9 @@ with the data of extent
>  .Pp
>  .Fn ibuf_set_n8 ,
>  .Fn ibuf_set_n16 ,
> -.Fn ibuf_seek_set_n32
> +.Fn ibuf_set_n32
>  and
> -.Fn ibuf_seek_set_n64
> +.Fn ibuf_set_n64
>  replace a 1-byte, 2-byte, 4-byte or 8-byte
>  .Fa value
>  at offset



Re: Buffer overflow in /usr/bin/deroff

2023-09-28 Thread Ingo Schwarze
Hi,

up front, thanks for finding and fixing this and sorry for not coming
round to testing the patch before commit.

Crystal Kolipe wrote on Wed, Sep 27, 2023 at 06:04:01PM -0300:
> On Wed, Sep 27, 2023 at 02:05:14PM -0600, Todd C. Miller wrote:

>> As written, deroff will not emit a line that does not end with a
>> newline.  That could be changed in a subsequent commit if it so
>> desired.

> That's always been the behaviour in every version that I've seen.
> It does have the possibly unexpected effect that /usr/bin/spell
> silently ignores the last line of input if it's not newline terminated.

Well, if a file does not end in a newline character, it is not a text
file according to POSIX.  Similarly, if a file contains a sequence
of LINE_MAX bytes and none of those bytes is a newline, it is not a
text file either.  The behaviour of troff(1), nroff(1), and deroff(1)
is undefined on non-text files.

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html

Definition "3.403 Text File":

  A file that contains characters organized into zero or more
  lines. The lines do not contain NUL characters and none can
  exceed {LINE_MAX} bytes in length, including the 
  character. Although POSIX.1-2017 does not distinguish between text
  files and binary files (see the ISO C standard), many utilities
  only produce predictable or meaningful output when operating on
  text files. The standard utilities that have such restrictions
  always specify "text files" in their STDIN or INPUT FILES sections.

All that said, i do agree that our text-handling tools should handle
non-text files gracefully whereever that is possible without excessive
effort.  But it isn't really surprising that an extremely old, almost
obsolete utility like deroff(1) relies on these rules.

> Whichever way we decide on for this, the future regression test

I certainly don't object that you commit a regression test you already
have, but systematically writing more regressions tests looks like a
waste of time to me.  That time would be better spent figuring out
how we can finally delete deroff(1) from the tree.  That has been
discussed in the past, but sadly never came to fruition.  The main
obstacle to overcome is that spell(1) relies on it - but nothing else
as far as i remember.

Note that while using deroff(1) in spell(1) kind of works, it is *not*
a good solution, and has never been, it's an awful hack and always was.
deroff(1) is an ad-hoc simplified parser for various complex languages,
so it will almost certainly misparse some valid roff(7) and high-level
macro constructions, both missing some relevant parts of the input
text and inserting bogus words that have nothing to do with the text
contained in the input into the output stream.  The error rate is low
enough that it still kind of works for spell(1), especially considering
that a tool like spell(1) also has its own, quite significant rate of
both false negative and false positive misclassifications.  But none
of that makes the basic approach a good idea.

On top of that, the source code of deroff(1) looks badly obfuscated.
Every language, including roff(7) and the macro sets on top of it,
evolves over time, and i fail to see any reasoable way to maintain
the mess inside deroff.c during such evolution.  I guess no one
even tried to maintain it for roughly 30 years now.  What is the
point of regression tests for code that no one ever changes?

The proper way to improve spell(1) would probably be to use
nroff -Tascii output instead of deroff(1).  Now, we don't have
nroff(1) in base.  For the case of -mdoc, -man, and -mandoc, using
mandoc(1) -Tascii would be a significant improvement over using
deroff(1).  For the cases of other ancient macrosets like -me, -mm, -ms
that we no longer support in base, asking people to install
the textproc/groff port to support them seems reasonable to me.
Why should base be able to run spell(1) on a file that base
cannot format in the first place?

For processing a modern macro set like -mom, you already have to
install groff(1) right now - deroff(1) is going to yield very
poor results with that.

Still, some work is needed in this direction before we may become
able to send deroff(1) to its well-earned retirement.

Oh, and before you ask, the demandoc(1) still contained in the
portable mandoc distribution, which we never had in base, should
also be deleted from portable mandoc.  It is also a fundamentally
flawed idea and totally unmaintained for over a decade.  Simply
using mandoc -Tascii is the way to go even in portable mandoc.

Yours,
  Ingo



Re: Send international text with mail(1) - proposal and patches

2023-09-21 Thread Ingo Schwarze
Hi,

i fear this is getting a bit out of hand...

Stefan Sperling wrote on Thu, Sep 21, 2023 at 02:12:50PM +0200:
> On Thu, Sep 21, 2023 at 01:25:01PM +0200, Walter Alejandro Iglesias wrote:

>> I corrected many of the things you pointed me, but not all.  The
>> function I use to check utf8 is mine, I use it in a pair of little
>> programs which I've *hardly* checked for memory leacks.  I know my
>> function looks BIG :-), but I know for sure that it does the job.

> We already have code in libc that does this, see the function
> _citrus_utf8_ctype_mbrtowc in lib/libc/citrus/citrus_utf8.c.
> Please use the libc interface if at all possible, it is best to
> have just one place to fix when a UTF-8 parser bug is found.

In general, the tool for checking the validity of UTF-8 strings
is a simple loop around mblen(3) if you want to report the precise
positions of errors found, or simply mbstowcs(3) with a NULL pwcs
argument if you are content with a one-bit "valid" or "invalid" answer.

But checking the validity of UTF-8 is probably beyond the scope of a
simple tool like mail(1), i think.  All i suggested was checking the
validity of US-ASCII when that encoding is selected - in a separate
patch to be considered *after* support for the MIME headers has gone in.

As Stefan says, adding a hand-written UTF-8 parser to mail(1) is
clearly not acceptable.

Yours,
  Ingo



Re: Send international text with mail(1) - proposal and patches

2023-09-20 Thread Ingo Schwarze
Hi,

i checked the following points:

 * Even though RFC 2049 section 2 bullet point 1 only *requires*
   MIME-conformant MUAs to always write the header "MIME-Version:
   1.0" - and mail(1) is most certainly not MIME-conformant - RFC 2049
   section 2 bullet point 8 explicitly *recommends* that even non-MIME
   MUAs always set appropriate MIME headers.  RFC 2046 section 4.1.2
   paragraph 8 also "strongly" recommends the explicit inclusion of a
   "charset" parameter even for us-ascii.

   Consequently, i believe that when sending a message in US-ASCII,
   mail(1) should include these headers:

   MIME-Version: 1.0
   Content-Transfer-Encoding: 7bit
   Content-Type: text/plain; charset=us-ascii

 * Adding a "Content-Transfer-Encoding: ..." header is indeed required
   for sending UTF-8 messages, see  RFC 2049 section 2 bullet point 2.
   "8bit" is one of the valid values that MUAs must support for
   receiving messages by default.
   Using it seems sane because it is most likely to work with receiving
   MUAs that are not MIME-conformant, like our mail(1) itself.
   I think nowadays, that's a bigger concern than MTAs that are not
   8-bit clean, in particular when maintaining a low-level program
   like our mail(1).
   Consequently, i think using 8bit is indeed better for our mail(1)
   than quoted-printable or base64.

 * Adding "Content-Type: text/plain; charset=utf-8" is required by
   RFC 2049 section 2 bullet point 4 (for the simplest kind of UTF-8
   encoded messages).

 * The Content-Disposition: header is defined in RFC 2183, clearly
   optional, and not useful in single-part messages.  Consequently,
   mail(1) should not write it.

So apart from writing the headers for us-ascii, i think you are
almost there.

Given that the charset cannot be inferred from the environment
and that setting it per-system or per-user in a configuration file
is also inadequate - it shouldn't be uncommon for users to sometimes
send US-ASCII and sometimes UTF-8 mail - i think that a new option
is indeed needed.

Regarding the naming of the option, compatibility with POSIX
  https://pubs.opengroup.org/onlinepubs/9699919799/utilities/mailx.html
is paramount, which kills the tentative idea to use -u for "UTF-8"
because -u already means "user".

Compatibility with other mailx(1) implementations is also a
consideration.  See, for example,
  https://linux.die.net/man/1/mail
and -m is indeed among the very few options still available over there.
I would document it focussing on a "multibyte character encoding"
mnemonic.  The "mime" mnemonic feels far too broad because MIME can
be used for lots of other purposes besides specifying a character
encoding.

The -m option is also free here:
  https://man.freebsd.org/cgi/man.cgi?query=mail(1)
  https://man.netbsd.org/mail.1
  https://docs.oracle.com/cd/E88353_01/html/E37839/mailx-1.html
  https://www.ibm.com/docs/en/aix/7.3?topic=m-mail-command-1
None of those appears to support command line selection of the
character set for sending mail, so i don't see any immediate
logioc clashes either.

The -m option does clash with this one:
  https://www.sdaoden.eu/code-nail.html
But i think dismissing Steffen Daode Nurpmeso as a lunatic is obviously
the way to go.  Try to listen to that person and you will never get
anything done.

The mailx(1) documented on die.net appears to be the Heirloom one.
It does not have an option to select sending US-ASCII or UTF-8.
Instead, it has a "sendcharsets" configuration variable.  That's
clearly overengineering, but even when hardcoding the equivalent of

  sendcharsets=utf-8

which is also the default, that's nasty because it silently switches to
UTF-8 as soon as a non-ASCII character appears in the input.  I think
at least in interactive mode, explicit confirmation from the user would
be required to send UTF-8, instead writing dead.letter if the user
rejects the request, such that they can clean up the file and try again.

That would certainly be more complicated than requiring an option
up front, not only from the implementation perspective, but arguably
also from the user perspective.  So unless other developers think this
should be fully automatic with confirmation rather than controlled
by an option, i suggest staying with Walter's idea of using an option.


> Index: extern.h
> ===
> RCS file: /cvs/src/usr.bin/mail/extern.h,v
> retrieving revision 1.29
> diff -u -p -r1.29 extern.h
> --- extern.h  16 Sep 2018 02:38:57 -  1.29
> +++ extern.h  20 Sep 2023 10:44:41 -
> @@ -261,3 +261,4 @@ intwriteback(FILE *);
>  extern char *__progname;
>  extern char *tmpdir;
>  extern const struct cmd *com; /* command we are running */
> +extern char mime; /* Add MIME headers */

Likely not best mnemonic naming.

> Index: mail.1
> ===
> RCS file: /cvs/src/usr.bin/mail/mail.1,v
> retrieving revision 1.83
> diff -u -p 

Re: Send international text with mail(1) - proposal and patches

2023-09-20 Thread Ingo Schwarze
Hi Kirill,

Kirill Miazine wrote on Wed, Sep 20, 2023 at 12:52:52PM +0200:

> you may not even need -m, and instead inspect LC_CTYPE environment 
> variable and add appropriate headers for UTF-8. according to locale(1), 
> LC_CTYPE may be set to indicate UTF-8:
> 
> If the value of LC_CTYPE ends in ‘.UTF-8’, programs in the OpenBSD base 
> system ignore the beginning of it, treating for example zh_CN.UTF-8 
> exactly like en_US.UTF-8.

This is definitely very bad advice.  Whether the user uses an UTF-8
locale for their shell and terminal has nothing to do with whether
they want to be send UTF-8 encoded mail with MIME headers.
For example, i'm using LC_CTYPE=en_US.UTF-8 for my shells and
terminals most of the time, but i do not want the low-level mail(1)
MUA to suddenly start sending UTF-8 mail without being specifically
asked to.

I just checked - even though i'm using the higer-level mutt(1) MUA
most of the time and even though the shell i'm starting mutt(1) from
has LC_CTYPE=C.UTF-8 set on that particular machine, the last sixteen
mails i sent all contained the explicit header

  Content-Type: text/plain; charset=us-ascii

and intentionally so.  Yes, i do occasionally send UTF-8 mail on
purpose, mostly in highly technical messages that need to display
particular Unicode characters in addition to mentioning their
codepoints in the U+[XX] form, and rarely, sending UTF-8 happens
inadvertently because mutt(1) contains some weird autodetection logic -
but what you set your terminal to and what you use for sending mail
are clearly completely unrelated topics.

Yours,
  Ingo



Re: Send international text with mail(1) - proposal and patches

2023-09-19 Thread Ingo Schwarze
Hi Walter,

i did not look closely at the patch yet, and i did not dig for standards
documents, which one should almost certainly do before committing such
a patch unless one knows all the relevant standards by heart (which i
do not), so i'm not saying this must be done differently, but instead
i am merely asking questions.

1. Are you really sure that a header like
 MIME-Version: 1.0
   is not needed when you add Content-*: headers?

2. Are you really sure that a header like
 Content-Disposition: inline
   is not needed?

3. What is the reason for not simply hardcoding
 Content-Transfer-Encoding: 8bit
   when sending UTF-8 mail?
   Are there really still MTAs that choke on that in 2023?
   quoted-printable is definitely a PITA no matter the context,
   so in my book, if it can be avoided, avoiding it would be a plus.

4. What's the motivation for the -y flag taking an argument
   and not simply hardcoding "text/plain;charset=utf-8"?
   OpenBSD does not support any other charset and does not plan to
   change that in the future.
   I hope your next patch isn't going to be support for text/html.  =:-S

5. What's the motivation for supporting -y without -e
   and for supporting -e without -y ?

In general, we want as few options as possible and as little
configurabity as possible.  If there is a sane use case for something -
in this case, sending UTF-8 mail - *one* option is possibly warranted.
But adding more than one option would need a very robust justification,
and so would adding an option that takes an argument.

Note that mail(1) is not mail/swaks.  Its purpose is reading and
sending mail in a *simple* way, not low-level testing or protocol
debugging.

I'll postpone code review and testing, maybe you can simplify this
first?

Yours,
  Ingo



Re: man 9 intro - improvements [and learning for me]?

2023-09-18 Thread Ingo Schwarze
Hi Christoff,

of course you are free to work on whatever interests you, but if you are
looking for advice, i'd respectfully recommend that you try to work on
specifics rather than on generalities first, in particular when you
feel as if your experience in contributing isn't above average.
That applies even in the domain of documentation.

Christoff Humphries wrote on Mon, Sep 18, 2023 at 12:21:48PM +:

> I went searching for documentation about the kernel internals and was
> used to the intro(9) man page from NetBSD
> https://man.netbsd.org/intro.9 that had a lot more details. Would it 
> be a worthwhile project to attempt to do the same for OpenBSD?

Doing something like that may *sound* simple at first, but actually,
i can think of few documentation changes that would be harder to get
right and harder to get committed.

Unless i'm totally off track, getting that right requires

 1) a solid understanding of all areas of the kernel and
 2) a good understanding of what our kernel developers
actually need to know for their everyday work.

If you have that knowledge, it's *still* much easier to improve
individual pages that are lacking in quality than improving the top-level
overview.  Note that item 2 above tells you which of the pages are
probably most worthy of attention.

Besides, the NetBSD intro(9) page does not strike me as particularly
convincing.  *If* the lines in that page agree with the .Nd one-line
descriptions in the indivual pages, then it provides almost nothing
of value - but causes a maintenance burden because it needs to be
edited whenever any .Nd line changes.  If the lines mismatch, then
the .Nd lines in the indivifual pages can likely be improved.
I did not check which is the case - possibly both are.

We did have a few pages like that in the past, but most of those
got deleted because they provided little value.  For example,
compare these two:

  https://man.openbsd.org/OpenBSD-5.6/string.3
  https://man.openbsd.org/OpenBSD-current/string.3

A very small number still exists, perhaps most notably

  https://man.openbsd.org/crypto.3 and
  https://man.openbsd.org/ssl.3

The benefit of those is *not* that they exhaustively list everything -
indeed, apropos(1) is better at that job than a manually maintained
table of contents - but that they provide some high-level information
how the libraries as a whole are intended to be used that is hard to
figure out from individual pages.  Also, these pages do not duplicate
.Nd lines.

> I understand the annoyance of folks talking about what they intend or
> are going to do with no actual fruit, but wanted to check that the
> intro(9) is lacking and the information is not just stored somewhere
> else (other than "apropos -s 9 .").

Sorry, i lack the experience in kernel development that would be
required to judge whether any information could usefully be added
to intro(9).

Yours,
  Ingo


> I want to learn the internals of the OpenBSD kernel, too, and will do
> as mulander (and friends) did by learning a bit of code daily
> https://blog.tintagel.pl/2017/09/10/openbsd-daily-recap.html.
> 
> Seeking the learn and contribute, especially in the uvm, vmd/vmm, and
> possibly filesystem subsystems.



Re: Add ENOTDIR as possible error to unveil(2)

2023-09-16 Thread Ingo Schwarze
Hi,

Janne Johansson wrote on Sat, Sep 16, 2023 at 11:49:10AM +0200:

> In case someone wants the change in a diff format, those 5 seconds
> of work are available here:
> http://c66.it.su.se:8080/obsd/unveil.2.diff

How come jj@ forgot we want diffs inline?  

> +.It Bq Er ENOTDIR
> +.Fa path
> +points to a file that is not a directory.

That diff is certainly not OK.
It is misleading because a "path" argument pointing to a file
that is not a directory is actually valid.

The problem only occurs when you add a trailing slash to the name
of a file that is not a directory, like this:

   $ touch tmp.txt
   $ ls -l tmp.txt  
  -rw-r--r--  1 schwarze  schwarze  0 Sep 16 18:00 tmp.txt
   $ ls -l tmp.txt/
  ls: tmp.txt/: Not a directory
   $ ls -l tmp.txt/foobar
  ls: tmp.txt/foobar: Not a directory

Looking at sys_unveil in /sys/kern/vfs_syscalls.c, i suspect that
errno actually comes from a lower layer, likely namei(9), rather than
from unveil(2) itself.

In library manual pages, we occasionally have sentences like this:

 execv() may fail and set errno for any of the errors specified
 for the library function execve(2).

 The fgetln() function may also fail and set errno for any of the
 errors specified for the routines fflush(3), malloc(3), read(2),
 stat(2), or realloc(3).

But it doesn't look like we do that in system call manuals, and
besides, namei(9) appears to lack the RETURN VALUES section.

So, what is our policy with syscall ERRORS sections?
For standardized syscalls, POSIX probably needs to be taken
into account in addition to our implementation.
But for our very own syscalls?  Hm...

access(2), acct(2), chdir(2), chflags(2), chmod(2), chown(2), chroot(2),
execve(2), getfh(2), ktrace(2), link(2), mkdir(2), mkfifo(2), mknod(2),
mount(2), open(2), pathconf(2), quotactl(2), readlink(2), rename(2),
revoke(2), rmdir(2), stat(2), statfs(2), swapctl(2), symlink(2),
truncate(2), unlink(2), utimes(2) all list ENOTDIR, and i don't see
any right now taking a path argument that don't.

Do we want the following diff?

> I did not sort the existing return values in alphabetical order,
> there were two out of order in there, but that is a separate
> commit I guess.

NetBSD has a policy of sorting ERRORS entries alphabetically, we don't.

Yours,
  Ingo


Index: unveil.2
===
RCS file: /cvs/src/lib/libc/sys/unveil.2,v
retrieving revision 1.22
diff -u -r1.22 unveil.2
--- unveil.26 Sep 2021 08:03:08 -   1.22
+++ unveil.216 Sep 2023 16:38:24 -
@@ -158,6 +158,10 @@
 A directory in
 .Fa path
 did not exist.
+.It Bq Er ENOTDIR
+A component of the
+.Fa path
+prefix is not a directory.
 .It Bq Er EINVAL
 An invalid value of
 .Fa permissions



Re: bsd.port.mk.5: Ev for NO_ARCH

2023-09-06 Thread Ingo Schwarze
Hi Caspar,

Caspar Schutijser wrote on Wed, Sep 06, 2023 at 01:46:45PM +0200:

> The patch below marks up "NO_ARCH" consistently with the other
> variables. I noticed because on man.openbsd.org, NO_ARCH looks off
> compared to the others:
> https://man.openbsd.org/bsd.port.mk.5#MULTI_PACKAGES

Indeed, keeping markup correct and consistent is worthwhile.

Incorrect markup not only prevents consistent formatting across all
output devices but also harms search functionality.

For example, none of these currently works as expected:

 - typing ":t NO_ARCH" while in "man bsd.port.mk"
 - man -O tag=NO_ARCH bsd.port.mk
 - man -k Ev=NO_ARCH
 - man -k any=NO_ARCH
 - https://man.openbsd.org/bsd.port.mk#NO_ARCH

> Comments or OKs?

OK schwarze@
  Ingo

> Index: bsd.port.mk.5
> ===
> RCS file: /cvs/src/share/man/man5/bsd.port.mk.5,v
> retrieving revision 1.604
> diff -u -p -r1.604 bsd.port.mk.5
> --- bsd.port.mk.5 5 Sep 2023 23:45:53 -   1.604
> +++ bsd.port.mk.5 6 Sep 2023 11:44:22 -
> @@ -2447,7 +2447,7 @@ below.
>  Especially read the part about
>  .Ev ONLY_FOR_ARCHS
>  when some of the packages only exist for some architectures.
> -.It NO_ARCH
> +.It Ev NO_ARCH
>  Location for arch-independent packages.
>  Defaults to
>  .Sq no-arch .
> 



Re: sysctl: Fixing "second level name in XX. is invalid" on trailing period

2023-08-29 Thread Ingo Schwarze
Hi Neel,

Neel Chauhan wrote on Tue, Aug 29, 2023 at 02:36:57PM -0700:
> On 2023-08-29 14:23, Theo Buehler wrote:
>> Neel Chauhan:

>>> +   if (string[strlen(string) - 1] == '.')
>>> +   buf[strlen(string) - 1] = '\0';

>> Careful with out-of-bounds accesses. What if string is "" ? Probably
>> easiest to do "len = strlen(string);" and if (len > 0 && ...).

> Good catch!
> Does this look better:

Not much better, and my main complaint isn't even that storing the
return value of strlen(3) in an int variable is bad style.

Your patch is still incorrect in so far as with a command like

  sysctl 'vm.malloc_conf=S>.'

your code would happily and silently remove the period from
the end of the value whereas in

  sysctl vm.malloc_conf.=S

it would still complain in exactly the same way as it does now.

I'm not sure any of the sysctl string nodes that currently exist
provide a use case for a trailing period in the value, but i wouldn't
exclude such a use case arising in the future.  For example, malloc_conf
already supports some trailing non-letter characters in the value,
and kern.hostname already supports periods in the middle of the value,
right now.

Besides, i believe that in security-critical tools, parsing should be
strict, and trying to guess what people mean by invalid syntax is not
generally a good idea.

What's next?  Should "sysctl kern.." also do something, and if so, what?  

I don't see what benefit is provided in making "sysctl vm.malloc_conf."
work, but i do see how flooding someone with multiple pages of
output might be unwelcome if they wanted to type "sysctl kern.tty"
but somehow forget the "tty" part and hit enter after "sysctl kern." -
the trailing dot does indicate that they probably only wanted a
sub-node, or doesn't it?

Yours,
  Ingo


> Index: sbin/sysctl/sysctl.c
> ===
> RCS file: /cvs/src/sbin/sysctl/sysctl.c,v
> retrieving revision 1.259
> diff -u -p -u -r1.259 sysctl.c
> --- sbin/sysctl/sysctl.c  17 May 2023 22:12:51 -  1.259
> +++ sbin/sysctl/sysctl.c  29 Aug 2023 21:36:19 -
> @@ -377,6 +377,9 @@ parse(char *string, int flags)
> 
>   (void)strlcpy(buf, string, sizeof(buf));
>   bufp = buf;
> + len = strlen(string);
> + if (len > 0 && string[len - 1] == '.')
> + buf[len - 1] = '\0';
>   if ((cp = strchr(string, '=')) != NULL) {
>   *strchr(buf, '=') = '\0';
>   *cp++ = '\0';



Re: Removing extra space in "sysctl: top level name in . is invalid"

2023-08-29 Thread Ingo Schwarze
Hi Neel,

Neel Chauhan wrote on Tue, Aug 29, 2023 at 02:48:54PM -0700:

> I have noticed a bug. When running "sysctl ." or "sysctl kern." without 
> my other patch, I get an extra space between "name" and "in":
 
>> sysctl: top level name  in . is invalid

Technically, that message is correct.  The name being complained about
is a zero length string, and of course there are space characters
marking a word boundary before and after it, resulting in two adjacent
space characters around a zero-length word.  But i see how some users
might find that confusing.

If people want that improved, see below for a patch that actually
improves the diagnostic message, resulting in:

   $ sysctl kern.=test
  sysctl: empty second level name in kern.
   $ sysctl kern.  
  sysctl: empty second level name in kern.
   $ sysctl .=test
  sysctl: empty top level name in .
   $ sysctl .  
  sysctl: empty top level name in .
   $ sysctl =test  
  sysctl: empty top level name in 
   $ sysctl ''
  sysctl: empty top level name in 

Note this only happens for pathological names that are completely
empty or end in a dot, so i'm not sure it is very important.

Besides, there are other ways to provoke highly confusing error
messages by providing insane arguments.  Consider, for example:

   $ sysctl ' '  
  sysctl: top level name   in   is invalid
   $ sysctl 'kern.ostype'   
  kern.ostype=OpenBSD
   $ sysctl 'kern.os^Atype'
  sysctl: second level name ostype in kern.ostype is invalid
   $ sysctl 'kern.os^Gtype' 
  sysctl: second level name ostype in kern.ostype is invalid
   $ sysctl '^H'
  sysctl: top level name in is invalid

I'm not convinced accurately diagnosing every possible kind
of insane input is a priority.

Regarding the patch, note that the condition *bufp == NULL that
is being tested cannot actually happen because all callers test
that very condition right before calling findname(), and they
have to do that because they all want to call listall() in that
case.

Strangely, the property that *bufp cannot be NULL was already present
in McKusick's very first version of the program dated "5.1 93/03/30
23:48:06", about 3 months before the initial 4.4BSD release.  I have
no idea why he considered testing that useful.  Maybe he fell pray
to the confusing design of the strsep(3) API and actually wanted to
test for *name == '\0' but inadvertently wrote name == NULL and no
one ever noticed?

Maybe one argument in favor of my patch is that it makes the
source code of findname() easier to read in addition to making
the error message less confusing in your corner case.

Lightly tested so far.

Yours,
  Ingo


Index: sysctl.c
===
RCS file: /cvs/src/sbin/sysctl/sysctl.c,v
retrieving revision 1.259
diff -u -p -r1.259 sysctl.c
--- sysctl.c17 May 2023 22:12:51 -  1.259
+++ sysctl.c29 Aug 2023 23:31:09 -
@@ -2948,8 +2948,13 @@ findname(char *string, char *level, char
char *name;
int i;
 
-   if (namelist->list == 0 || (name = strsep(bufp, ".")) == NULL) {
+   if (namelist->list == 0) {
warnx("%s: incomplete specification", string);
+   return (-1);
+   }
+   name = strsep(bufp, ".");
+   if (*name == '\0') {
+   warnx("empty %s level name in %s", level, string);
return (-1);
}
for (i = 0; i < namelist->size; i++)



Re: Pull Request: Addition of String Data Type in /sys/sys/types.h

2023-06-18 Thread Ingo Schwarze
Hi,

Abderrahmane Ghellab wrote on Sun, Jun 18, 2023 at 08:16:21AM +0100:

> I am writing to discuss a recent pull request I submitted

Never submit pull requests, for no reason whatsoever, full stop.
There is no excuse for doing that (except in a few related projects
like OpenSSH-portable).

> regarding the /sys/sys/types.h file.

As a general rule, don't add non-standard stuff to standard
headers:  is included by dozens of POSIX headers,
so you are inviting clashes and incompatibility issues in lots of
existing standard-compliant code.

Additions to standard headers require very robust justification.

> typedef char * string;
> /*String Data Type: Better than writing char* every time*/

This is completely ridiculous.  It is pure obfuscation.  Everybody
understands what char * means in C, nobody would know what a
user-defined data type "string" might be.  Besides, you are hiding
the fact that it is a pointer type.  In general, never typedef
"*" or "struct" away, it is valueable see them whenever the type
is used.  You are also making it harder for people to read the C
language standard and the POSIX standard because those will contonie
talking about char *, including in the official declarations of
function prototypes.  Arguing for brevity in something that is five
printable characters long and proposing a replacement that is also
five printable characters long is hilarious.  You are not even saving a
space character, "char *s" and "string s" are the exactly same length,
no matter whether you count whitespace or not.

There is no chance whatsoever of this, or anything similar, being
accepted.

Your main mistake is that you try to solve a problem where no
problem exists at all.

You severely harm clarity, readability, standards conformance,
and application code compatibility for no reason whatsoever.

> steps required from my end.

It you want to contribute, start by getting a better understanding
of the C programming language, of secure and standard compliant
programming practices, and of OpenBSD development goals and priorities.

Yours,
  Ingo



Re: route.8: markup route flags to get tags

2023-04-05 Thread Ingo Schwarze
Hi Klemens and Jason,

Jason McIntyre wrote on Wed, Apr 05, 2023 at 07:12:05PM +0100:
> On Wed, Apr 05, 2023 at 01:07:18PM +, Klemens Nanni wrote:

>> To find out what a particular letter means, you must arrive at this list
>> why generic means because the first column has no markup.
>> 
>> As mdoc(7) 'constant symbol' the lookup is much quicker, e.g.
>>  man -Otag=h route
>> makes this the first line in the pager:
>>MRTF_MODIFIED Modified dynamically (by redirect).

> tag=h? is that a typo?
> 
> anyway, i personally am not convinced that anyone would do this.

I use "-O tag" occasionally, in particular with very long and complex
manual pages (and route(8) is complex and somewhat long), and i know
Klemens uses it often.  Since implementing -O tag, i occasionally also
got feedback from others who like it, though there are certainly many
people who do not use it.

On top of that, the feature is very useful when providing user support
for sending web links like

  https://man.openbsd.org/route.8#show

which points to the same place as

  $ man -O tag=show route

> i mean, it is easier just to type "man route" then page down.

That works, too, but can become tedious in long and complex pages.

> is the added complexity really worth it?

The complexity feels minimal to me, it does nothing more than formally
announcing that these flags serve a syntactic purpose, exactly like we
already do it for the RTF_* flags on the same lines.

While the .Dv macro is not a perfect fit, i agree with Klemens that
it is the best one available.

> we must have a ton of pages with lists explaining flags.

I see no need to hunt down similar cases in the tree, but when people
get annoyed with specific pages, i think it's fair to add tags
if it's logically sound and easily possible.

There is one very minor gotcha with short tags like these.
With Klemens' patch, try

  $ man -l -O tag=n route.8

which brings you to the -n command line option; then hit the "t" key
once: it brings you to the RTF_CONNECTED ("n") flag...

So short tags tend to clash with unrelated stuff marked up with
different macros, in this case .Fl and .Dv.

But that's not a game-breaker.  In a long and complex page, ":t nt" 
may still be simpler than searching visually.

I refrained from making the tagging feature more complex and
supporting stuff like "-O tag=Dv=n" vs. "-O tag=Fl=n", and i don't
think that was a mistake.  Such clashes are not *that* common, so KISS.

> like
>   man -Otag=D ps
> that doesn;t work either.

Probably wouldn't hurt either to make that work, but seems neither
urgent nor required to make Klemens' patch useful.

>> Route flags that now produce an already existing tag don't conflict, i.e.
>> ":tT" first shows
>>  -T rtable
>>  Select an alternate routing table to modify or query.  The
>> and then on second "t"
>>TRTF_MPLS MPLS route.

Oops, right, you found that too.

>> OK?

FWIW, i'm OK with this patch.

>> Index: route.8
>> ===
>> RCS file: /cvs/src/sbin/route/route.8,v
>> retrieving revision 1.117
>> diff -u -p -r1.117 route.8
>> --- route.8  18 Mar 2023 11:44:53 -  1.117
>> +++ route.8  5 Apr 2023 12:56:12 -
>> @@ -401,27 +401,27 @@ Within the output of
>>  the "Flags" column indicates what flags are set on the route.
>>  The mapping between letters and flags is:
>>  .Bl -column "1" "RTF_BLACKHOLE" "Protocol specific routing flag #1."
>> -.It 1 Ta Dv RTF_PROTO1 Ta "Protocol specific routing flag #1."
>> -.It 2 Ta Dv RTF_PROTO2 Ta "Protocol specific routing flag #2."
>> -.It 3 Ta Dv RTF_PROTO3 Ta "Protocol specific routing flag #3."
>> -.It B Ta Dv RTF_BLACKHOLE Ta "Just discard packets."
>> -.It b Ta Dv RTF_BROADCAST Ta "Correspond to a local broadcast address."
>> -.It C Ta Dv RTF_CLONING Ta "Generate new routes on use."
>> -.It c Ta Dv RTF_CLONED Ta "Cloned routes (generated from RTF_CLONING)."
>> -.It D Ta Dv RTF_DYNAMIC Ta "Created dynamically (by redirect)."
>> -.It G Ta Dv RTF_GATEWAY Ta "Dest requires forwarding by intermediary."
>> -.It H Ta Dv RTF_HOST Ta "Host entry (net otherwise)."
>> -.It h Ta Dv RTF_CACHED Ta "Referenced by gateway route."
>> -.It L Ta Dv RTF_LLINFO Ta "Valid protocol to link address translation."
>> -.It l Ta Dv RTF_LOCAL Ta "Correspond to a local address."
>> -.It M Ta Dv RTF_MODIFIED Ta "Modified dynamically (by redirect)."
>> -.It m Ta Dv RTF_MULTICAST Ta "Correspond to a multicast address."
>> -.It n Ta Dv RTF_CONNECTED Ta "Interface route."
>> -.It P Ta Dv RTF_MPATH Ta "Multipath route."
>> -.It R Ta Dv RTF_REJECT Ta "Host or net unreachable."
>> -.It S Ta Dv RTF_STATIC Ta "Manually added."
>> -.It T Ta Dv RTF_MPLS Ta "MPLS route."
>> -.It U Ta Dv RTF_UP Ta "Route usable."
>> +.It Dv 1 Ta Dv RTF_PROTO1 Ta "Protocol specific routing flag #1."
>> +.It Dv 2 Ta Dv RTF_PROTO2 Ta "Protocol specific routing flag #2."
>> +.It Dv 3 Ta Dv RTF_PROTO3 Ta "Protocol 

Re: mem.4: be more accurate about securelevel

2023-01-20 Thread Ingo Schwarze
Hi Stuart,

Stuart Henderson wrote on Fri, Jan 20, 2023 at 08:50:48AM +:
> On 2023/01/18 12:46, Theo de Raadt wrote:

>> But you should not start a sentence with also.
>> Also you should not start a sentence with but.
>> 
>> Not the best english.  jmc can weight in perhaps.

>> Jan Klemkow  wrote:

>>>  .Pp
>>>  Even with sufficient file system permissions,
>>>  these devices can only be opened when the
>>> -.Xr securelevel 7
>>> -is insecure or when the
>>>  .Va kern.allowkmem
>>>  .Xr sysctl 2
>>>  variable is set.
>>> +Also the
>>> +.Xr securelevel 7
>>> +insecure is needed, to open the device writable.

> This is all that's needed isn't it?
> 
>  Even with sufficient file system permissions,
>  these devices can only be opened when the
>  .Xr securelevel 7
> -is insecure or when the
> -is insecure and the
>  .Va kern.allowkmem
>  .Xr sysctl 2
>  variable is set.

I believe that is not what we want to say:
deraadt@ argues that 
 - nobody should run with insecure securelevel,
   not even for offline debugging
 - and it is not needed for read access to /dev/mem
(The discussion has in part drifted off list.)

If we want a complete description (including the strongly
discouraged way to get write access), the following floating
diff is the best i'm aware of:

 Even with sufficient file system permissions,
 these devices can only be opened when the
-.Xr securelevel 7
-is insecure or when the
 .Va kern.allowkmem
 .Xr sysctl 2
 variable is set.
+Write access additionally requires an insecure
+.Xr securelevel 7 .

If we want to discourage this even more, we could say something
like this:

 .Pp
 Even with sufficient file system permissions,
-these devices can only be opened when the
+these devices can only be opened for reading and only when the
-.Xr securelevel 7
-is insecure or when the
 .Va kern.allowkmem
 .Xr sysctl 2
 variable is set.
 .Sh FILES

That would make write behaviour undefined, such that it could be
removed with no further documentation fuss once write access is
indeed removed in the future.

I would be fine with either direction.

Yours,
  Ingo



Re: Inconsistent isdigit(3) man page

2023-01-20 Thread Ingo Schwarze
Hi Todd, hi Bob,

Todd C. Miller wrote on Fri, Jan 20, 2023 at 09:59:20AM -0700:
> On Fri, 20 Jan 2023 09:32:38 -0700, Bob Beck wrote:

>> So isdigit(3) says in the first paragraph that
>> 'The complete list of decimal digits is 0 and 1-9, in any locale.'

The intended meaning of this sentence was "on OpenBSD".
We often have the DESCRIPTION state OpenBSD behaviour,
then relegate portability information to other sections.

I see how this practice could cause confusion in the case at hand.

>> Later on it says:
>>
>> 'On systems supporting non-ASCII single-byte character encodings,
>> different c arguments may correspond to the digits, and the results of
>> isdigit() may depend on the LC_CTYPE locale(1).'
>>
>> Argubly worring about non ASCII single byte character encodings
>> doesn't make sense for an OpenBSD man page, but setting that aside for
>> a minute it's the second part that is of concern.. "It may depend on
>> the LC_CTYPE locale()" - Ok, well does it or doesn't it? 

> On OpenBSD isdigit() never uses the locale, but you already knew
> that.
 
>> Various spec docs seem all over the place on this, so I am also
>> paging Dr. Posix in this email... Hi Philip! :)  Is isdigit()
>> safe from being screwed up by locale or not?

> POSIX says that isdigit() may be influenced by the locale.  Since
> LC_CTYPE defines the different character classes, of which "digit"
> is one, I think that part of the manual is correct as-is.
> 
> However, perhaps we should make isdigit(3) match isalpha(3) like
> so:

I agree with Todd.

OK schwarze@
  Ingo


> Index: lib/libc/gen/isdigit.3
> ===
> RCS file: /cvs/src/lib/libc/gen/isdigit.3,v
> retrieving revision 1.13
> diff -u -p -u -r1.13 isdigit.3
> --- lib/libc/gen/isdigit.311 Sep 2022 06:38:10 -  1.13
> +++ lib/libc/gen/isdigit.320 Jan 2023 16:58:20 -
> @@ -51,7 +51,12 @@ The
>  and
>  .Fn isdigit_l
>  functions test for any decimal-digit character.
> -The complete list of decimal digits is 0 and 1\(en9, in any locale.
> +In the C locale, the complete list of decimal digits is 0 and 1\(en9.
> +.Ox
> +always uses the C locale for these functions,
> +ignoring the global locale, the thread-specific locale, and the
> +.Fa locale
> +argument.
>  .Sh RETURN VALUES
>  These functions return zero if the character tests false or
>  non-zero if the character tests true.



Re: libevent manuals

2023-01-18 Thread Ingo Schwarze
Hi Ted,

Ted Bullock wrote on Mon, Jan 16, 2023 at 12:56:06PM -0700:

> The impetus is that the event(3) manual page isn't all that good for
> documenting how to use the library and it is missing functions that are
> included in the API and available in the shared library.

That seems true, and those are good reasons to improve the manual pages.

> Upstream of course has moved on to version 2.x, as far as I can tell
> there is no more maintenance being done on the 1.4 version

According to
  https://github.com/libevent/libevent/tree/patches-1.4
that seems accurate to me: it appears the 1.4 branch was last touched
in the git repo eight years ago, and it is marked "stale",
along with the 2.0 and 2.1 branches.

> that OpenBSD ships except the work that is done internally in the OS.

Given that our event.3 says
  .\" Copyright (c) 2000 Artur Grabowski 
and that the libevent git repo contains a Jurassic version
of our event.3 in the 1.4 branch and does no longer appear
to contain *any* kind of documentation in the master branch -
in particular, there is no event.3 in the master branch any longer -
i think maintaining the libevent manual pages ourselves is perfectly
fine.  I see no danger that any documentation might need to be merged
from the libevent git: it appears the libevent project regards
documentation as a useless distraction at best...  :-o

> Anyway, here is what I came up with for a few functions after reading
> the source tree. I also went back through historical releases to see
> how the API evolved over time.

I think this is likely a worthwhile project, but still needs a lot
of work before it can be considered for commit.

> I'm planning to finish off documenting the rest of the library, if
> folks have feedback, I'm interested.

Given how much feedback i have on your first file, i'm not yet reading
your second and third file right now.

We do development in steps, and the tree needs to build and be better
after each commit than before that commit.  So to be able to be committed,
your patch for the first file should also delete the now redundant
information from event.3.  Regard it as splitting one well-defined
part out of the excessively large event.3 and adding lots of missing
information while splitting it out.

> event_init.3
> 

At this point, one line is missing:

  .\" $OpenBSD$

The command `mandoc -T lint` tells you:

  mandoc: event_init.3: STYLE: RCS id missing: (OpenBSD)

> .\" Copyright (c) 2023 Ted Bullock 
> .\"
> .\" Permission to use, copy, modify, and distribute this software for any
> .\" purpose with or without fee is hereby granted, provided that the above
> .\" copyright notice and this permission notice appear in all copies.
> .\"
> .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> .\"
> .Dd $Mdocdate: January 9 2023 $
> .Dt EVENT_INIT 3
> .Os
> .Sh NAME
> .Nm event_base_new ,
> .Nm event_init

This is not nice.
Unless there is a good reason to name the page after a function that
is not the first function in the NAME section, put the function
corresponding to the file name first in NAME and in all other sections.

> .Nd initializes an
> .Vt event_base
> instance

While nesting markup inside .Nd is technically possible,
it is fragile, looks ugly, and we certainly don't do it in OpenBSD.

> .Sh SYNOPSIS
> .In event.h
> .Ft "struct event_base *"
> .Fn event_base_new void
> .Ft "struct event_base *"
> .Fn event_init void
> .Sh DESCRIPTION
> The functions
> .Fn event_base_new
> and
> .Fn event_init
> initialize the
> .Lb libevent

We don't normally use the .Lb macro in OpenBSD.
Maybe just talk about "the event library" with no markup?
The existing event.3 talks about

The
.Nm event
API ...

which is not wrong, but i'm not convinced the .Nm markup provides much
value here.

> and need to be called prior to scheduling an event or
> starting the event-loop. The two functions are similar, although global
> variables are set when calling
> .Fn event_init
> to help simplify the API for programs requiring only one event-loop.

This is not good.  Don't start comparing functions without precisely
saying first what each function actually does, in particular not in the
first sentence of a manual page.

Instead, the first sentence should only state the general purpose of
the functions documented in the page.  In this page, it might not be
needed at all, you could probably start with event_base_new(3) right
away.

After the introductory sentence(s), if any, start with one function and
document it thoroughly.  

Re: [RFC resend v1 2/2] Use arc4random_range() instead of arc4random_uniform() when appropriate

2022-12-31 Thread Ingo Schwarze
Hi Alejandro,

Alejandro Colomar wrote on Sun, Jan 01, 2023 at 01:08:17AM +0100:
> On 12/31/22 20:08, Alejandro Colomar wrote:

>> This makes the code much more readable and self-documented.  While doing
>> this, I noticed a few bugs, and other cases which may be bugs or not.
>> Switching to this specialized API makes it easier to spot such bugs, but
>> since I'm not familiar with the code, I kept some bugs unfixed.  The
>> most obvious ones (although I may be wrong) I fixed them.  And in some
>> cases where it was very unclear, I didn't touch the old *_uniform() code.
>> 
>> Below are the cases where I changed the behavior (I considered it a bug):
>> 
>> *  usr.bin/ssh/auth.c:
>> 
>> -  *cp = hashchars[arc4random_uniform(sizeof(hashchars) - 1)];
>> +  *cp = hashchars[arc4random_range(0, sizeof(hashchars) - 1)];

> Reconsidering, this one is probably better just as 
> arc4random_uniform(sizeof(hashchars)).

That seems to introduce exactly the same bug as your first try.
I already explained last year that the code is correct as-is.
We don't want NUL bytes in the password hash, hence the - 1.

Also, please avoid using MIME when you post to OpenBSD lists, with
the only exception of posting tarballs of new ports to ports@, in
which case attachments are OK.

Yours,
  Ingo



Re: [v3] timeout.9: document missing interfaces, miscellaneous rewrites

2022-12-31 Thread Ingo Schwarze
Hi Scott,

in the NAME section, please put timeout_add_nsec after timeout_add_usec
to agree with the order in the SYNOPSIS.

In any case, please go ahead.  It appears jmc@ is developing a sore
elbow from more than a year of medicine ball ping pong.  ;-)

The following are merely suggestions / observations / questions,
not conditions.

Scott Cheloha wrote on Sat, Dec 31, 2022 at 11:22:22AM -0500:

> mvs@ is nudging me to realign the timeout.9 page with the state of the
> kernel.
> 
> Here is my rewrite (again).
> 
> There are some bits that I want to rework.  The opening paragraph is
> especially clickety-clackety.

The opening paragraph seems fine to me.  The second paragraph might
be considered a bit awkward.  If you rename the struct timeout
argument from *to to *timeout in all prototypes in the SYNOPSIS
and everywhere in the text, you might be able to join the second and
third paragraphs, for example as follows:

All state is encapsulated in a
.Vt struct timeout
allocated by the caller.
The
.Fn timeout_set
function initializes the
.Fa timeout
before it can be passed to any of the other functions.
When the timeout ...

That way, you get rid of the word "caller-allocated" and the
parenthetic remark, and some of the later text may also simplify
all by itself in a natural way.

> Still, I think this is an improvement over what's in-tree.  And the
> technical content should be complete and accurate, which is the most
> important thing.
> 
> ok?
> 
> Index: timeout.9
> ===
> RCS file: /cvs/src/share/man/man9/timeout.9,v
> retrieving revision 1.55
> diff -u -p -r1.55 timeout.9
> --- timeout.9 22 Jun 2022 14:10:49 -  1.55
> +++ timeout.9 31 Dec 2022 16:19:27 -
[...]
> @@ -44,281 +46,370 @@
[...]
> -Once initialized, the
> -.Fa to
> -structure can be used repeatedly in
> -.Fn timeout_add
> -and
> -.Fn timeout_del
> -and does not need to be reinitialized unless
> -the function called and/or its argument must change.

Are you deleting this information on purpose?

[...]
> -timeout in at least

Are you deleting the words "at least" on purpose, or should they be
reinserted into this sentence:

  KCLOCK_NONE timeouts may be scheduled with the function timeout_add(),
  which schedules the given timeout to execute after nticks hardclock(9)
  ticks have elapsed.  

[...]
> +The
>  .Fn timeout_del_barrier
> -is like
> -.Fn timeout_del
> -but it will wait until any current execution of the timeout has completed.
> +function is similar to
> +.Fn timeout_del ,
> +except that it may block until any current execution of the given timeout
> +has completed.

This appears to change the meaning.

The old text provides a guarantee that timeout_del_barrier(9)
does not return before the timeout completes, if it is currently
running.  The new wording no longer provides that guarantee.
It that intentional?

Otherwise, s/may block/blocks/ ?  Or, if you think it should be
more explicit, maybe something like:

  except that, if the timeout is currently executing,
  .Fn timeout_del
  blocks until execution of the timeout has completed.

[...]
> -Otherwise, the system will deadlock.
> +Otherwise,
> +the system will deadlock.

No need to change that.  The rule "break the line after a comma"
is specific to the Linux man pages project, and i consider it excessive.
We only follow "new sentence, new line".

Yours,
  Ingo



Re: [RFC resend v1 2/2] Use arc4random_range() instead of arc4random_uniform() when appropriate

2022-12-31 Thread Ingo Schwarze
Hi Alejandro,

Alejandro Colomar wrote on Sat, Dec 31, 2022 at 08:08:14PM +0100:

> This makes the code much more readable and self-documented.

I object.  I is a needless detour that invites confusion and bugs.
In C code, it is customary to deal with half-open intervals,
and closed intervals are rare.

For example, array indices run from 0 to sizeof(array)-1,
sizeof(array) points to the storage location beyond the last element,
and C programmers are used to that.  So the was arc4random_uniform(3)
works feels familiar to C programmers, whereas your proposal
of arc4random_range(3) is idiosyncratic and provokes bugs.

> While doing this, I noticed a few bugs,

I doubt it.  I think you fell into your own trap.

> *  usr.bin/ssh/auth.c:
> 
>-  *cp = hashchars[arc4random_uniform(sizeof(hashchars) - 1)];
>+  *cp = hashchars[arc4random_range(0, sizeof(hashchars) - 1)];

There is no bug here.  The code goes:

const char hashchars[] = "./ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz0123456789"; /* from bcrypt.c */
char *cp;
/* ... */
for (cp = fake.pw_passwd + 7; *cp != '\0'; cp++)
*cp = hashchars[arc4random_uniform(sizeof(hashchars) - 1)];

sizeof(hashchars) is the size of the char array, i.e. the string length
plus one (plus one for the terminating NUL byte).

So sizeof(hashchars)-1 is the number of useful characters in the string,
which is the same as the strlen(3).

So arc4random_uniform() returns a number in the half-open
interval [0, strlen).  The minimum index is 0 -> '.',
the maximum index is strlen-1 -> '9'.

This is all perfectly fine.

Your patch introduces a bug.  The terminaing NUL character may now
be copied into fake.pw_passwd.

This is exactly one of the reasons why i objected to your
arc4random_range() proposal: it will cause confusion and bugs.

I think that the first hunk of your patch introduces rather than
fixes a bug makes your patch unworthy of review.  It should be
summarily rejected.

Yours,
  Ingo



Re: [RFC v1 1/2] Add arc4random_range(min, max)

2022-12-31 Thread Ingo Schwarze
Hi Alexandro,

i fail to see the point.  We do not usually add extra functions
if the same effect can be be attained with one-liners.  There is
significant value in keeping the API as small as possible, it
makes the API easier to learn and it makes programming mistakes
less likely.  On top of that, code using the one-liner is usually
easier to read than code involving yet another wrapper.

On top of that, consider that arc4random_uniform(upper_bound)
produces a number in [0, upper_bound), *not* in [0, upper_bound].
Consistency of an API is similarly important as smallness.
So introducing a variant that produces [min, max] rather than
[min, max) looks like a non-starter to me.  You are setting a
giant trap provoking off-by-one errors!

Some additional comments follow in-line, even though i doubt they
matter much: i think that nothing should be added, and you did not
provide any rationale why this might be needed.


Alejandro Colomar wrote on Sat, Dec 31, 2022 at 07:18:55PM +0100:

> diff --git a/include/stdlib.h b/include/stdlib.h
> index ab8a2ae90c3..16b7dc43afc 100644
> --- a/include/stdlib.h
> +++ b/include/stdlib.h
> @@ -313,6 +313,7 @@ u_quad_t strtouq(const char *__restrict, char 
> **__restrict, int);
>  
>  uint32_t arc4random(void);
>  uint32_t arc4random_uniform(uint32_t);
> +uint32_t arc4random_uniform(uint32_t, uint32_t);

Uh oh.  That looks doubleplusungood.

> --- a/lib/libc/crypt/arc4random.3
> +++ b/lib/libc/crypt/arc4random.3
> @@ -46,6 +46,8 @@
[...]
> @@ -95,16 +97,47 @@
>  In the worst case, this function may consume multiple iterations
>  to ensure uniformity; see the source code to understand the problem
>  and solution.
> +.Pp
> +.Fn arc4random_range
> +is similar to
> +.Fn arc4random_uniform ,

I very strongly object to this description.  I is an outright lie!
arc4random_uniform -> upper_bound)  *exclusive*
arc4random_range   -> upper_bound]  *inclusive*

> +and will return a single 32-bit value,

The word "will" is unwelcome in manual pages, unless you have
some exceptional justification why it is needed in a particular case.

> +uniformly distributed,
> +within the inclusive range
> +.Pf [ Fa min ,
> +.Fa max ].
> +If the arguments are reversed,
> +that is,
> +if
> +.Fa max
> +<
> +.Fa min ,
> +it will return a single 32-bit value,
> +uniformly distributed,
> +outside of the exclusive range
> +.Pf ( Fa max ,
> +.Fa min ).

Is this ever needed in practice?
It sounds complicated and confusing.
We should not add API specifications just because we can,
espially not in libc.

>  .Sh RETURN VALUES
>  These functions are always successful, and no return value is
>  reserved to indicate an error.
> +.Sh CAVEATS
> +.Fn arc4random_range
> +doesn't produce correct output when
> +.Fa max
> +==
> +.Fa min
> +- 1.

Looks like a landmark symptom of bad API design: setting a trap for
the unwary with no good reason.  We should not provide bad API
functions and then document their shortcomings.

>  .Sh SEE ALSO
>  .Xr rand 3 ,
>  .Xr rand48 3 ,
>  .Xr random 3
>  .Sh HISTORY
>  These functions first appeared in
> -.Ox 2.1 .
> +.Ox 2.1 ,
> +except
> +.Fn arc4random_range ,
> +which appeared in
> +.Ox XXX .
>  .Pp
>  The original version of this random number generator used the
>  RC4 (also known as ARC4) algorithm.
> diff --git a/lib/libc/crypt/arc4random_uniform.c 
> b/lib/libc/crypt/arc4random_uniform.c
> index a18b5b12381..40957910b96 100644
> --- a/lib/libc/crypt/arc4random_uniform.c
> +++ b/lib/libc/crypt/arc4random_uniform.c
> @@ -2,6 +2,7 @@
>  
>  /*
>   * Copyright (c) 2008, Damien Miller 
> + * Copyright (c) 2022, Alejandro Colomar 

Adding a Copyright notice is inappropriate in most parts of OpenBSD
except for the main authors of the file.  Merely adding something
that is Copyright-worthy is not enough.  Besides, i suspect your
patch reaches the Copyright threshold only due to the comment -
the function itself is likely trivial even according to the low
threshold of Copyright law.

Yours,
  Ingo

>   *
>   * Permission to use, copy, modify, and distribute this software for any
>   * purpose with or without fee is hereby granted, provided that the above
> @@ -55,3 +56,14 @@ arc4random_uniform(uint32_t upper_bound)
>   return r % upper_bound;
>  }
>  DEF_WEAK(arc4random_uniform);
> +
> +/*
> + * Calculate a uniformly-distributed random number in the range [min, max],
> + * avoiding bias.
> + */
> +uint32_t
> +arc4random_range(uint32_t min, uint32_t max)
> +{
> + return arc4random_uniform(max - min + 1) + min;
> +}
> +DEF_WEAK(arc4random_range);
[...]



Re: getdelim.3: Add a missing Vt

2022-12-20 Thread Ingo Schwarze
Hi Josiah,

committed, thanks!
  Ingo

Josiah Frentsos wrote on Tue, Dec 20, 2022 at 12:49:06PM -0500:

> Index: getdelim.3
> ===
> RCS file: /cvs/src/lib/libc/stdio/getdelim.3,v
> retrieving revision 1.6
> diff -u -p -r1.6 getdelim.3
> --- getdelim.317 Oct 2017 22:47:58 -  1.6
> +++ getdelim.320 Dec 2022 17:45:54 -
> @@ -57,7 +57,8 @@ The buffer is
>  and includes the delimiter.
>  The
>  .Fa delimiter
> -character must be representable as an unsigned char.
> +character must be representable as an
> +.Vt unsigned char .
>  .Pp
>  If
>  .Fa *n



Re: args vs arg ... in manuals

2022-12-20 Thread Ingo Schwarze
Hi Klemens,

Jason McIntyre wrote on Tue, Dec 20, 2022 at 10:35:19AM +:
> On Tue, Dec 20, 2022 at 09:36:39AM +, Klemens Nanni wrote:

>> Both styles are used, but I argue that the former fails to distinguish
>> between
>>  $ program 'args in one shell word'
>> and
>>  $ program one arg per shell word
>> 
>> It's a minor thing, imho, but perhaps we can decide for one and stick to
>> it throughout the tree?
>> 
>> Triple dots also make it immediately clear that many arguments may
>> follow, no matter what the "arg"ument is named.
>> 
>> Here's one examplatory diff for timeout(1).
>> 
>> Feedback?

> i prefer "arg ..." too.

Seconded.  Or rather, what, fourthed?

> by comparison, it would probably look weird to
> write "files" instead of "file ...". indeed the Ar macro without args
> outputs exactly that.
> 
> still it would be good to get an idea of how many pages would be
> affected by this.

  $ man -k Ar=args

gives me a relatively short list, and the only clear offenders appear
to be: doas(1), openssl(1), timeout(1), route(8), and possibly boot(8)
on a handful of platforms.

Strangely, in build-debug-info(1), cdio(1), csplit(1), tput(1),
update-plist(1), ddb(4) we have "..." after plural "args" and i'm
unsure what that is even intended to mean in some of the cases.

I guess i ought to fix mdoc(7) myself.

> if we have only a few pages using "args" it would seem to make sense.

That seems to be the case.  I think kn@ is welcome to go ahead and fix
them - not in a completely mechanical way, of course, the text should
still be clear and read well afterwards.

> if it's 50/50 we might want to look for more complex

Certainly not 50/50:

  $ man -k Ar=... | wc -l

gives me well over three hundred pages.

> examples and figure out whether all the change will be worth it.
> 
> also some things will use argument vs arg. for example ssh(1). do you
> propose to change those?

  $ man -k Ar=argument

gives me fifty results.

I think that is a completely separate question and need not be handled
together with the "args vs. arg ..." issue.  FWIW, i personally like
.Ar argument ... , at least in many cases.  Avoiding unnessecary
abbreviations often makes text easier and more pleasant to read.

There are also a handful of "Ar=arguments" that might deserve
fixing, in particular su(1), tmux(1), bgpd.conf(5), bgpctl(8),
and rcctl(8).

And time(1) and btrace(8) might need

  .Ar arguments -> .Ar argument Ns s

in the vein mentioned by millert@.

Yours,
  Ingo


>> Index: timeout.1
>> ===
>> RCS file: /cvs/src/usr.bin/timeout/timeout.1,v
>> retrieving revision 1.3
>> diff -u -p -r1.3 timeout.1
>> --- timeout.14 Sep 2021 11:58:31 -   1.3
>> +++ timeout.129 Oct 2022 20:53:05 -
>> @@ -41,14 +41,14 @@
>>  .Op Fl -preserve-status
>>  .Ar duration
>>  .Ar command
>> -.Op Ar args
>> +.Op Ar arg ...
>>  .Sh DESCRIPTION
>>  The
>>  .Nm
>>  utility executes
>>  .Ar command ,
>>  with any
>> -.Ar args ,
>> +.Ar arg ... ,
>>  and kills it if it is still running after the
>>  specified
>>  .Ar duration .
>> Index: timeout.c
>> ===
>> RCS file: /cvs/src/usr.bin/timeout/timeout.c,v
>> retrieving revision 1.21
>> diff -u -p -r1.21 timeout.c
>> --- timeout.c2 Jul 2022 19:00:35 -   1.21
>> +++ timeout.c29 Oct 2022 21:08:40 -
>> @@ -57,7 +57,7 @@ usage(void)
>>  fprintf(stderr,
>>  "usage: timeout [-k time] [-s sig] [--foreground]"
>>  " [--preserve-status] duration\n"
>> -"   command [args]\n");
>> +"   command [arg ...]\n");
>>  
>>  exit(1);
>>  }



Re: pause.3: misc cleanup

2022-12-10 Thread Ingo Schwarze
Hi Scott,

Scott Cheloha wrote on Sat, Dec 10, 2022 at 09:25:48AM -0600:

> Sure, I will keep the ERRORS section.
> I think the current phrasing in ERRORS is odd, though.
> "may set the global variable..." is what we normally say here, but it
> isn't a "may" in this case, it's an "always".

I wouldn't go as far as saying "'may set the global variable...' is
what we normally say.  It occurs in a very small number of section
3 manuals.  It might occasionally make sense in section 3 because
some library functions might fail both due to syscall failures and for
other reasons not related to syscalls, in which case they may sometimes
set errno and sometimes not - even though some library functions also
set errno themselves, not relying on failing syscalls to do that.

Even in section 3, i see less than ten instances:
 - __thrsleep(2)
 - clock_settime(2)
 - setlogin(2)
 - setrlimit(2)
 - read(2)
 - recv(2)
 - send(2)
 - truncate(2)
 - write(2)
And all these only occur after the usual wording "will fail if".
So it doesn't mean that any syscall can fail without setting errno;
this wording merely indicates that for specific functions, additional
errno values may occur instead of the ones listed right above.

That said, if something is guaranteed to set errno, i clearly prefer
saying so.  Let's try to avoid needless ambiguity.

> Check if my tweak in the attached patch feels right.

I dislike the wording "return an error" for setting errno, even in
the existing pages, in particular for a function that has an "int"
return value.  I see a slight risk of confusing people.

Also, for this function, setting errno(2) does not really indicate
any error, or does it?  The function is a bit unusual in so far as
it always returns the same value, never fails, always sets errno,
and always to the same value, so the usual wordings don't fit too well.

I would prefer something like this:

.Fn pause
blocks the calling thread until it receives an unmasked signal
and then returns.
.Sh RETURN VALUES
.Fn pause
always returns \-1.
.Sh ERRORS
.Fn pause
always sets
.Xr errno 2
to the following value:
.Bl -tag -width Er
.It Bq Er EINTR
The call was interrupted by a signal.
.El

Feel free to use part or all of my ideas,
or commit your patch commit as-is.

OK schwarze@ either way.

Yours,
  Ingo


> Index: pause.3
> ===
> RCS file: /cvs/src/lib/libc/gen/pause.3,v
> retrieving revision 1.16
> diff -u -p -r1.16 pause.3
> --- pause.3   9 Nov 2022 06:48:29 -   1.16
> +++ pause.3   10 Dec 2022 15:23:47 -
> @@ -32,7 +32,7 @@
>  .Os
>  .Sh NAME
>  .Nm pause
> -.Nd stop until signal
> +.Nd wait for a signal
>  .Sh SYNOPSIS
>  .In unistd.h
>  .Ft int
> @@ -40,40 +40,31 @@
>  .Sh DESCRIPTION
>  .Bf -symbolic
>  .Fn pause
> -is made obsolete by
> +is obsoleted by
>  .Xr sigsuspend 2 .
>  .Ef
>  .Pp
> -The
>  .Fn pause
> -function forces a process to pause until a signal is received from either the
> -.Xr kill 2
> -function or an interval timer
> -(see
> -.Xr setitimer 2 ) .
> -.Pp
> -Upon termination of a signal handler started during a
> -.Fn pause ,
> -the
> -.Fn pause
> -call will return.
> +blocks the calling thread until it receives an unmasked signal.
>  .Sh RETURN VALUES
> -Always returns \-1.
> -.Sh ERRORS
> -The
> +On receipt of a signal,
>  .Fn pause
> -function may set the global variable
> +returns \-1 and the global variable
>  .Va errno
> -to the following error:
> +is set to indicate the error.
> +.Sh ERRORS
> +.Fn pause
> +always returns the following error:
>  .Bl -tag -width Er
>  .It Bq Er EINTR
> -The call was interrupted.
> +The call was interrupted by a signal.
>  .El
>  .Sh SEE ALSO
>  .Xr kill 2 ,
> -.Xr select 2 ,
>  .Xr setitimer 2 ,
> -.Xr sigsuspend 2
> +.Xr sigprocmask 2 ,
> +.Xr sigsuspend 2 ,
> +.Xr signal 3
>  .Sh HISTORY
>  A
>  .Fn pause



Re: pause.3: misc cleanup

2022-11-10 Thread Ingo Schwarze
Hi Scott,

thanks for repeatedly working on time-related library documentation.  :-)

Unless noted otherwise, i agree with Todd's comments.

Todd C. Miller wrote on Wed, Nov 09, 2022 at 10:31:15AM -0700:
> On Wed, 09 Nov 2022 16:47:22 +, Scott Cheloha wrote:

>> RETURN VALUES
>>
>> - Pull ERRORS into this section.  No need to put the one error in a
>>   .Bl/.El, we can just mention it inline.

> OK

I somewhat strongly object to this change.

If there is content for a standard section, that section ought to
be present, even if it is very short.  The point is that a manual
page should not only contain the expected information, but all that
information should also be in the expected places, such that by merely
looking at the section headers, you can already tell whether a given
standard section has content, without reading any of the text.

For example, the RETURN VALUES section ought to be absent if and only
if all functions documented in the page are void.

For example, the ERRORS section ought to be absent if and only if
none of the functions ever touch the errno.  Admittedly, our pages
may not be perfect in that respect, some libc pages may be missing
an ERRORS section even though the functions in questions sometimes
set errno, but that's a (mild) defect.  We certainly should not
delete an existing ERRORS section for a function that definitely does
set the errno.

Also, in sections with a strongly conventional syntax (like ERRORS),
it is preferable to use the standard syntax even in corner cases.
In this case, yes, i do recommend a list with a single element.
Similarly, a utility program that can take exactly one option still
gets the standard DESCRIPTION sentence "The options are as follows:"
and then a one-element list.  The advantage for readers is having the
familiar layout, and for developers the update becomes easier when
the second option is added; not that anything will ever be added to
pause(3), but consistency in style is still a good thing.  Similarly,
in code, using getopt(3) is recommended even in a program that only
takes a single option.

>> SEE ALSO
>>
>> - Nix select(2) and setitimer(2), they aren't directly relevant.

> OK.  The reason select(2) is there is probably because you can
> emulate pause() using select().

I don't feel strongly about this.

Removing select(2) is probably OK because the reason explained
by Todd is mostly historical.  In Perl, you still use select()
to sleep for less than one second, but that's irrelevant to C
programming.

I'm less convinced about removing setitimer(2).  To me, setitimer(2)
feels just as relevant as kill(2).  Both cause signals to be delivered
(asap or later), which is quite relevant because that causes pause(3)
to return.

>> HISTORY
>>
>> - We still have sigpause(3) and sigblock(3) in userspace.  Should
>>   we .Xr them?  They aren't systems calls anymore, but they were
>>   at that time.  Unsure what to do here.

Polishing the HISTORY section should in no case hold up your other
work.

> I think it is better to use .Fn for sigpause(3) and sigblock(3)
> rather than .Xr here.

In this very special case, where sigpause(3) and sigblock(3) are
strongly deprecated, i agree.  Pointing out their continued existence -
even below HISTORY - risks accidentally encouraging their use.

> Note that in 4.3-Reno pause(3) was still implemented in terms of
> sigpause(3) and sigblock(3), it is just that those functions were
> themselves wrappers instead of system calls.  Personally, I would
> drop the bit about 4.3-Reno since it is not really correct in my
> opinion.

It could be argued either way: IMHO, a wrapper around a wrapper around
foo is still a wrapper around foo, so at least my version that's
currently in the tree can be argued to be correct.  Admittedly, Scott's
version less so; pause(3) certainly wasn't reimplemented for Reno.
Then again, replacing the last phase (starting at "and around")
with the following would avoid the semantic disagreement:

  It has been using sigsuspend(2) and sigprocmask(2) since 4.3-BSD Reno.

Yours,
  Ingo



Re: strtonum.3: Use the proper macro for "long long"

2022-09-10 Thread Ingo Schwarze
Hi,

yes, this is completely correct, with one tiny exception that should
be fixed while committing, see in-line below.

Jason, since you already started working on this, could you please
commit this patch with OK schwarze@?

I'm surprised there was still so much .Li in our tree where .Vt
should have been.  These are not even edge cases but completely
unambiguous .Vt.

Note that the mdoc(7) manual deprecates .Li (it is a presentational
macro with an invisible effect - we usually want semantic rather
than presentational markup).  Rare cases exist where it may not be
completely obvious what to use instead, but here it is.

Thanks,
  Ingo


Josiah Frentsos wrote on Sat, Sep 10, 2022 at 12:29:28PM -0400:

> Index: lib/libc/gen/frexp.3
> Index: lib/libc/gen/getgrent.3
> Index: lib/libc/gen/getpwent.3
> Index: lib/libc/gen/getpwnam.3
> Index: lib/libc/gen/glob.3
> Index: lib/libc/gen/isalnum.3
> Index: lib/libc/gen/isalpha.3
> Index: lib/libc/gen/isblank.3
> Index: lib/libc/gen/iscntrl.3
> Index: lib/libc/gen/isdigit.3
> Index: lib/libc/gen/isgraph.3
> Index: lib/libc/gen/islower.3
> Index: lib/libc/gen/isprint.3
> Index: lib/libc/gen/ispunct.3
> Index: lib/libc/gen/isspace.3
> Index: lib/libc/gen/isupper.3
> Index: lib/libc/gen/isxdigit.3
> Index: lib/libc/gen/lockf.3
> Index: lib/libc/gen/login_cap.3
> Index: lib/libc/gen/modf.3
> Index: lib/libc/gen/opendir.3
> Index: lib/libc/gen/setjmp.3
> Index: lib/libc/gen/times.3
> Index: lib/libc/gen/tolower.3
> Index: lib/libc/gen/toupper.3
> Index: lib/libc/gen/uname.3
> Index: lib/libc/gen/utime.3
> Index: lib/libc/locale/localeconv.3
> Index: lib/libc/net/ether_aton.3
> Index: lib/libc/net/getaddrinfo.3
> Index: lib/libc/net/getnameinfo.3
> Index: lib/libc/net/getpeereid.3
> Index: lib/libc/net/getrrsetbyname.3
> Index: lib/libc/net/htonl.3
> ===
> RCS file: /cvs/src/lib/libc/net/htonl.3,v
> retrieving revision 1.5
> diff -u -p -r1.5 htonl.3
> --- lib/libc/net/htonl.3  13 Feb 2019 07:02:09 -  1.5
> +++ lib/libc/net/htonl.3  10 Sep 2022 16:10:01 -
> @@ -66,14 +66,14 @@ or
>  .Sq l )
>  is a mnemonic
>  for the traditional names for such quantities,
> -.Li short
> +.Vt short
>  and
> -.Li long ,
> +.Vt long ,
>  respectively.

This is misleading, as explained in the very next sentence.
I suggest just dropping the .Li markup in these two instances
without any replacement, or .Dq if you insist on some markup.

>  Today, the C concept of
> -.Li short
> +.Vt short
>  and
> -.Li long
> +.Vt long
>  integers need not coincide with this traditional misunderstanding.
>  On machines which have a byte order which is the same as the network
>  order, routines are defined as null macros.

This part is correct.

> Index: lib/libc/net/inet_addr.3
> Index: lib/libc/net/inet_net_ntop.3
> Index: lib/libc/net/inet_ntop.3
> Index: lib/libc/regex/regex.3
> Index: lib/libc/rpc/xdr.3
> Index: lib/libc/stdio/fseek.3
> Index: lib/libc/stdio/getc.3
> Index: lib/libc/stdio/putc.3
> Index: lib/libc/stdio/ungetc.3
> Index: lib/libc/stdlib/atof.3
> Index: lib/libc/stdlib/atoi.3
> Index: lib/libc/stdlib/atol.3
> Index: lib/libc/stdlib/atoll.3
> Index: lib/libc/stdlib/div.3
> Index: lib/libc/stdlib/getopt_long.3
> Index: lib/libc/stdlib/imaxdiv.3
> Index: lib/libc/stdlib/ldiv.3
> Index: lib/libc/stdlib/lldiv.3
> Index: lib/libc/stdlib/strtod.3
> Index: lib/libc/stdlib/strtonum.3
> Index: lib/libc/string/memccpy.3
> Index: lib/libc/string/memchr.3
> Index: lib/libc/string/memcmp.3
> Index: lib/libc/string/memset.3
> Index: lib/libc/sys/accept.2
> Index: lib/libc/sys/fcntl.2
> Index: lib/libc/sys/getpeername.2
> Index: lib/libc/sys/getrlimit.2
> Index: lib/libc/sys/getsockname.2
> Index: lib/libc/sys/getsockopt.2
> Index: lib/libc/sys/ioctl.2
> Index: lib/libc/sys/ptrace.2
> Index: lib/libc/sys/quotactl.2
> Index: lib/libc/termios/tcsetattr.3
> Index: lib/libc/time/ctime.3
> Index: lib/libradius/radius_new_request_packet.3
> Index: share/man/man3/bit_alloc.3
> Index: share/man/man3/dl_iterate_phdr.3
> Index: share/man/man4/bpf.4
> Index: share/man/man4/ddb.4
> Index: share/man/man4/openprom.4
> Index: share/man/man4/options.4
> Index: share/man/man4/speaker.4
> Index: share/man/man5/ranlib.5
> Index: share/man/man8/crash.8
> Index: share/man/man9/printf.9
> Index: share/man/man9/socreate.9
> Index: share/man/man9/style.9
> Index: usr.bin/ssh/sshd.8
> Index: usr.sbin/zdump/zdump.8



Re: [patch] rename tbl man pages

2022-07-07 Thread Ingo Schwarze
Hi Daniel,

Daniel Dickman wrote on Thu, Jul 07, 2022 at 12:03:18AM -0400:

> Over a decade ago there was some work in bsd.man.mk to build tbl pages 
> with mandoc. For example this commit from 2010:
> 
> 
> revision 1.32
> date: 2010/10/17 22:47:08;  author: schwarze;  state: Exp;  lines: +8 -18;
> Build tbl(1) pages with mandoc(1), not groff.
> Xenocara build checked myself, base build also by jmc@, thanks!
> "don't wait for me" deraadt@
> 
> Pages in base using tbl mostly look good already except for the
> rare .T{ macros; there may still be a few formatting issues
> in xenocara, please speak up when you run into them.
> Eventually, mandoc will catch up.

Today, the tbl(7) formatting of all the pages you are touching
is byte-by-byte identical between groff(1) and mandoc(1) -
with one minor exception in infocmp(1) where groff(1) renders
a few horizontal lines incorrectly due to a bug in groff(1).

> 
> 
> Back then there was some special infrastructure to build these man pages 
> if they were named something like *.6tbl.
> 
> That convention is a local OpenBSD convention, it does not exist outside 
> of our tree.
> 
> At this point the special naming for these man pages is no longer needed:

That is completely correct.

> * games/phantasia/phantasia.6tbl
> * gnu/usr.sbin/mkhybrid/src/mkhybrid.8tbl
> * share/man/man4/man4.hppa/cpu.4tbl
> * share/man/man4/wi.4tbl
> * share/man/man4/man4.hppa/cpu.4tbl
> * usr.bin/infocmp/infocmp.1tbl
> * usr.bin/tic/captoinfo.1tbl
> 
> The benefits are:
> - The affected Makefiles are shortened by 3+ lines
> - diffing external software like mkhybrid against the original sources 
>   results in less noise
> - remove an unnecessary cp step doing the build

The price to pay is relatively minor: making it slightly harder to
inspect commit history of these manual pages.  But these are not
edited often anyway, so that's not a big deal.

> ok on the diff below? (I've left out the man page renames from the diff 
> for brevity)

I don't have a strong opinion either way.  If you think it's worth it,
i do not object.

>From visual inspection, the diff looks correct to me.  I also tested
building and installing in the affected directories and noticed no
issues.  I did not run a complete make build / make release cycle
though, but i trust you to not break that.

Yours,
  Ingo


> Index: games/phantasia/Makefile
> ===
> RCS file: /cvs/src/games/phantasia/Makefile,v
> retrieving revision 1.18
> diff -u -p -u -r1.18 Makefile
> --- games/phantasia/Makefile  24 Nov 2015 03:10:10 -  1.18
> +++ games/phantasia/Makefile  7 Jul 2022 01:24:22 -
> @@ -6,7 +6,7 @@ CFLAGS+=-DTERMIOS
>  DPADD=   ${LIBM} ${LIBCURSES}
>  LDADD=   -lm -lcurses
>  MAN= phantasia.6
> -CLEANFILES+=map setup setup.o phantglobs.o.bld phantasia.6
> +CLEANFILES+=map setup setup.o phantglobs.o.bld
>  
>  all: setup phantasia
>  
> @@ -19,9 +19,6 @@ phantglobs.o.bld: phantglobs.c
>  setup: phantglobs.o.bld setup.o monsters.asc ${DPADD} 
>   ${HOSTCC} ${CFLAGS} ${LDFLAGS} ${LDSTATIC} -o ${.TARGET} \
> phantglobs.o.bld setup.o ${LDADD}
> -
> -phantasia.6: phantasia.6tbl
> - cp ${.ALLSRC} ${.TARGET}
>  
>  beforeinstall: 
>   ./setup -m ${.CURDIR}/monsters.asc
> Index: gnu/usr.sbin/mkhybrid/mkhybrid/Makefile
> ===
> RCS file: /cvs/src/gnu/usr.sbin/mkhybrid/mkhybrid/Makefile,v
> retrieving revision 1.7
> diff -u -p -u -r1.7 Makefile
> --- gnu/usr.sbin/mkhybrid/mkhybrid/Makefile   9 Sep 2015 20:02:31 -   
> 1.7
> +++ gnu/usr.sbin/mkhybrid/mkhybrid/Makefile   7 Jul 2022 01:24:22 -
> @@ -8,7 +8,6 @@
>  PROG=mkhybrid
>  MAN= mkhybrid.8
>  BINDIR=  /usr/sbin
> -CLEANFILES+= mkhybrid.8
>  .PATH:   ${.CURDIR}/../src ${.CURDIR}/../src/libhfs_iso 
> ${.CURDIR}/../src/libfile
>  
>  SRCS=data.c block.c low.c lfile.c btree.c node.c record.c lvolume.c \
> @@ -20,8 +19,5 @@ CFLAGS+=-DSYSTEM_ID_DEFAULT=\"OpenBSD\" 
>   -I${.CURDIR}/../src -I${.CURDIR}/../src/include \
>   -I${.CURDIR}/../src/libhfs_iso \
>   -I${.CURDIR}/../src/libfile
> -
> -mkhybrid.8: mkhybrid.8tbl
> - cp ${.ALLSRC} ${.TARGET}
>  
>  .include 
> Index: share/man/man4/man4.hppa/Makefile
> ===
> RCS file: /cvs/src/share/man/man4/man4.hppa/Makefile,v
> retrieving revision 1.29
> diff -u -p -u -r1.29 Makefile
> --- share/man/man4/man4.hppa/Makefile 30 Mar 2016 06:38:44 -  1.29
> +++ share/man/man4/man4.hppa/Makefile 7 Jul 2022 01:24:22 -
> @@ -6,9 +6,5 @@ MAN+= harmony.4 ie.4 intro.4 io.4 lasi.4
>  MAN+=phantomas.4 power.4 runway.4 ssio.4 uturn.4 wax.4
>  # tir.4 xbar.4 mcx.4
>  MANSUBDIR=hppa
> -CLEANFILES+= cpu.4
> -
> -cpu.4: cpu.4tbl
> - cp ${.ALLSRC} ${.TARGET}
>  
>  .include 
> Index: share/man/man4/Makefile

Re: [PATCH] Fix typo in ed(1) manpage

2022-06-21 Thread Ingo Schwarze
Hi Jason,

Jason McIntyre wrote on Sun, Jun 19, 2022 at 06:51:26AM +0100:
> On Sat, Jun 18, 2022 at 11:17:43PM +, S M wrote:

>> Sending a separate patch as this is unrelated to my previous e-mail.

> fixed, thanks.

Thanks.

> i note that the same text is present in regress/usr.bin/diff/t11.1 and
> t11.2. i know that will not matter, but noting here in case someone
> thinks that should be updated too.

I think it's better to leave those two files as they are because
/usr/src/regress/usr.bin/diff/Makefile says:

  # t11: rev 1.3 and 1.36 of usr.bin/ed/ed.1.

So fixing typos in there is more likely to cause confusion than to help
anything.  These are just random files to test diff(1) and patch(1),
so their content hardly matters, but since the Makefile documents
where the content comes from, that comment should better remain
accurate.

Yours,
  Ingo

>> diff --git bin/ed/ed.1 bin/ed/ed.1
>> index f5fc5be2e76..3ed865e0a05 100644
>> --- bin/ed/ed.1
>> +++ bin/ed/ed.1
>> @@ -537,7 +537,7 @@ is a positive number, causes only the
>>  match to be replaced.
>>  It is an error if no substitutions are performed on any of the addressed
>>  lines.
>> -The current address is set the last line affected.
>> +The current address is set to the last line affected.
>>  .Pp
>>  .Ar re
>>  and



Re: [PATCH] man - apmd(8) reference

2022-06-19 Thread Ingo Schwarze
Hi,

wdaver wrote on Sun, Jun 19, 2022 at 02:09:06PM -0700:

> Ah - mangl returns

Whoa, i wasn't even aware that "mangl" exists, but it appears it
does exist and we even have a port for it:  sysutils/mangl.

It isn't very happy on my machine, though:

  schwarze@isnote $ mangl mangl
  Segmentation fault (core dumped) 
  schwarze@isnote $ mangl man
  Segmentation fault (core dumped) 
  schwarze@isnote $ mangl apm 
  Segmentation fault (core dumped) 
  schwarze@isnote $ mangl 4 apm
  Segmentation fault (core dumped) 
  schwarze@isnote $ mangl
  schwarze@isnote $ 

It appears mangl is using an embedded copy of mandoc code that
is between two and three years old.  Consequently, using it is
probably a bad idea.

That said, adding to what Theo explained,

  man 4 apm

and

  man -s 4 apm

also return the "Alliance ProMotion video driver" manual
on all architectures except amd64, arm64, i386, loongson, and macppc,
which you can verify with these commands even on amd64:

  schwarze@isnote $ man -k ^apm\$ 
  apm(4/amd64) - power management interface
  apm(4/arm64) - power management interface
  apm(4/i386) - advanced power management device interface
  apm(4/loongson) - advanced power management device interface
  apm(4/macppc) - advanced power management device interface
  apm(4) - Alliance ProMotion video driver
  apm, zzz, ZZZ(8) - Advanced Power Management control program

  schwarze@isnote $ man -S foo -s 4 apm | head -n 4
  APM(4) Device Drivers ManualAPM(4)
  NAME
   apm - Alliance ProMotion video driver

Admittedly, it is unfortunate that the Xenocara apm(4) page clashes
with the base system apm(4) pages.  But i don't think we can do
anything about it.  The system works as designed: the man(1) command
prefers base system pages over Xenocara pages by default, so on the
platforms having "advanced power management", you get that, and on the
others, you get "Alliance ProMotion".  It's not a bug, it's a
feature; even if an unfortunate feature in this particular case.

> Should I send this as a bug to ports or bugs mailing list?

No, i think this thread can be closed as "invalid report".

By the way, next time you report a bug, please state, up front,

 1. the architecture you are running on
 2. the version of OpenBSD you are using
 3. and the exact, complete command you are typing.

Yours,
  Ingo



Re: apmd(8): reconnecting AC, not battery

2022-05-28 Thread Ingo Schwarze
Hi Jason,

Jason McIntyre wrote on Sat, May 28, 2022 at 05:10:08PM +0100:
> On Sat, May 28, 2022 at 10:34:35AM +0100, Stuart Henderson wrote:
>> On 2022/05/28 06:52, Jason McIntyre wrote:
>>> On Fri, May 27, 2022 at 07:19:37PM +0200, Jan Stary wrote:

 apmd says:
 
   When the power status changes (battery is connected or disconnected),
   apmd fetches the current status and reports it via syslog(3)
   with logging facility LOG_DAEMON.
 
 Is "battery" really meant here?  Should that be the AC power?
 Batteries are typically not being reconnected while running ...

>> They certainly can be, either while on external power, or in the case of
>> machines with multiple batteries (several generations of X series ThinkPads
>> have both internal and external batteries, so you can swap without powering
>> down even while on battery power). But yes the common case would be
>> external power.

 (The manpage calls it "external power", "line current"
 and "AC" interchangably.)

>>> i think you're right that the text is off. the author possibly meant to
>>> say battery was charging or discharging.
>>> 
>>> anyway, if i don;t hear any reason not to soon, i'll commit your
>>> suggested diff (which i think is simpler and makes sense).

>> Why be inaccurate when we don't need to be - it's very uncommon to have
>> a machine with a battery with an AC input. "External power" would make
>> more sense.

> i have to say, i don;t understand the distinction. i thought a
> laptop was "a machine with a battery with an AC input" (or, at
> least, *and* an AC input). however i committed a diff with your
> text ("external power") knowing that you are undoubtedly correct ;)

I guess an example for what sthen@ wanted to say is that most laptops
have an external power adapter - for example, the one of the machine
i'm currently typing on has

  INPUT:  100-240V~ 1.3A  50-60Hz
  OUTPUT: 20V=  2.25A

printed on it (transforming AC to DC), so the machine itself
does *not* have an AC input.  Instead, the "external power" the
machine receives is already DC.

Yours,
  Ingo



Re: bwfm(4): show modulation type for the various chipsets

2022-04-23 Thread Ingo Schwarze
Hi Stuart,

Stuart Henderson wrote on Sat, Apr 23, 2022 at 06:52:46PM +0100:

> saves time if you want to ignore 11n-only devices. ok?

Adding useful information is good in general, but i can't really
comment on the content.

If you add a column to a -column list, usually you also want to update
the .Bl -column line.  Unless there are specific reasons to do it
otherwise, the .Bl -column line should usually repeat the longest
string in each column, for example:

  -.Bl -column "Chipset" "Spectrum" "MIMO" "Bus" -offset 6n
  +.Bl -column BCM43236 2GHz/5GHz Type MIMO SDMMC/USB -offset 6n

Both groff and mandoc use the number of columns specified in
the .Bl -column line to pick a visually pleasing spacing between
columns (in this case, 4n without updating the .Bl -column line
or 3n after updating it; in the case at hand, the effect is partially
compensated by the .Bl -column containing shorter strings rather
than the longest, which seems a bit confusing :).

While there, i suggest dropping the useless quoting.

Yours,
  Ingo


> Index: share/man/man4/bwfm.4
> ===
> RCS file: /cvs/src/share/man/man4/bwfm.4,v
> retrieving revision 1.16
> diff -u -p -r1.16 bwfm.4
> --- share/man/man4/bwfm.4 5 Jan 2022 17:39:24 -   1.16
> +++ share/man/man4/bwfm.4 23 Apr 2022 17:51:24 -
> @@ -35,29 +35,29 @@ as well as the bus attachments recognize
>  .Nm
>  driver:
>  .Bl -column "Chipset" "Spectrum" "MIMO" "Bus" -offset 6n
> -.It Em Chipset Ta Em Spectrum Ta Em MIMO Ta Em Bus
> -.It BCM43143 Ta 2GHz Ta 1x1 Ta SDMMC/USB
> -.It BCM43236 Ta 2GHz/5GHz Ta 2x2 Ta USB
> -.It BCM4324 Ta  2GHz/5GHz Ta 2x2 Ta SDMMC
> -.It BCM43242 Ta 2GHz/5GHz Ta 2x2 Ta USB
> -.It BCM4329 Ta  2GHz/5GHz Ta 2x2 Ta SDMMC
> -.It BCM4330 Ta  2GHz/5GHz Ta 2x2 Ta SDMMC
> -.It BCM4334 Ta  2GHz/5GHz Ta 2x2 Ta SDMMC
> -.It BCM43340 Ta 2GHz/5GHz Ta 1x1 Ta SDMMC
> -.It BCM43341 Ta 2GHz/5GHz Ta 1x1 Ta SDMMC
> -.It BCM4335 Ta  2GHz/5GHz Ta 1x1 Ta SDMMC
> -.It BCM43362 Ta 2GHz Ta 1x1 Ta SDMMC
> -.It BCM43364 Ta 2GHz Ta 1x1 Ta SDMMC
> -.It BCM4339 Ta  2GHz/5GHz Ta 1x1 Ta SDMMC
> -.It BCM43430 Ta 2GHz Ta 1x1 Ta SDMMC
> -.It BCM43455 Ta  2GHz/5GHz Ta 1x1 Ta SDMMC
> -.It BCM43456 Ta  2GHz/5GHz Ta 2x2 Ta SDMMC
> -.It BCM4350 Ta 2GHz/5GHz Ta 2x2 Ta PCI
> -.It BCM4354 Ta  2GHz/5GHz Ta 2x2 Ta SDMMC
> -.It BCM4356 Ta 2GHz/5GHz Ta 2x2 Ta PCI/SDMMC
> -.It BCM43569 Ta 2GHz/5GHz Ta 2x2 Ta USB
> -.It BCM43602 Ta 2GHz/5GHz Ta 3x3 Ta PCI
> -.It BCM4371 Ta 2GHz/5GHz Ta 2x2 Ta PCI
> +.It Em Chipset Ta Em Spectrum Ta Em Type Ta Em MIMO Ta Em Bus
> +.It BCM43143 Ta 2GHz Ta 11n Ta 1x1 Ta SDMMC/USB
> +.It BCM43236 Ta 2GHz/5GHz Ta 11n Ta 2x2 Ta USB
> +.It BCM4324 Ta  2GHz/5GHz Ta 11n Ta 2x2 Ta SDMMC
> +.It BCM43242 Ta 2GHz/5GHz Ta 11n Ta 2x2 Ta USB
> +.It BCM4329 Ta  2GHz/5GHz Ta 11n Ta 2x2 Ta SDMMC
> +.It BCM4330 Ta  2GHz/5GHz Ta 11n Ta 2x2 Ta SDMMC
> +.It BCM4334 Ta  2GHz/5GHz Ta 11n Ta 2x2 Ta SDMMC
> +.It BCM43340 Ta 2GHz/5GHz Ta 11n Ta 1x1 Ta SDMMC
> +.It BCM43341 Ta 2GHz/5GHz Ta 11n Ta 1x1 Ta SDMMC
> +.It BCM4335 Ta  2GHz/5GHz Ta 11ac Ta 1x1 Ta SDMMC
> +.It BCM43362 Ta 2GHz Ta 11n Ta 1x1 Ta SDMMC
> +.It BCM43364 Ta 2GHz Ta 11n Ta 1x1 Ta SDMMC
> +.It BCM4339 Ta  2GHz/5GHz Ta 11ac Ta 1x1 Ta SDMMC
> +.It BCM43430 Ta 2GHz Ta 11n Ta 1x1 Ta SDMMC
> +.It BCM43455 Ta  2GHz/5GHz Ta 11ac Ta 1x1 Ta SDMMC
> +.It BCM43456 Ta  2GHz/5GHz Ta 11ac Ta 2x2 Ta SDMMC
> +.It BCM4350 Ta 2GHz/5GHz Ta 11ac Ta 2x2 Ta PCI
> +.It BCM4354 Ta  2GHz/5GHz Ta 11ac Ta 2x2 Ta SDMMC
> +.It BCM4356 Ta 2GHz/5GHz Ta 11ac Ta 2x2 Ta PCI/SDMMC
> +.It BCM43569 Ta 2GHz/5GHz Ta 11ac Ta 2x2 Ta USB
> +.It BCM43602 Ta 2GHz/5GHz Ta 11ac Ta 3x3 Ta PCI
> +.It BCM4371 Ta 2GHz/5GHz Ta 11ac Ta 2x2 Ta PCI
>  .El
>  .Pp
>  These are the modes the



Re: Security support status of xnf(4) and xbf(4)

2022-03-26 Thread Ingo Schwarze
Hi Demi Marie,

Demi Marie Obenour wrote on Fri, Mar 25, 2022 at 12:13:59PM -0400:

> Linux’s netfront and blkfront drivers recently had a security
> vulnerability (XSA-396) that allowed a malicious backend to potentially
> compromise them.  In follow-up audits, I found that OpenBSD’s xnf(4)
> currently trusts the backend domain.  I reported this privately to Theo
> de Raadt, who indicated that OpenBSD does not consider this to be a
> security concern.
> 
> This is obviously a valid position for the OpenBSD project to take, but
> it is surprising to some (such as myself) from the broader Xen
> ecosystem.  Standard practice in the Xen world is that bugs in frontends
> that allow a malicious backend to cause mischief *are* considered
> security bugs unless there is explicit documentation to the contrary.

I do not have the slightest idea what you are talking about here,
so i am not commenting on any of the above.

> As such, I believe this deserves to be noted in xnf(4) and xbf(4)’s man
> pages.

I can't comment whether this needs mentioning in the manual either,
but others will likely answer that question when you send a patch.

> If the OpenBSD project agrees, I am willing to write a patch,
> but I have no experience with mandoc

You do not need experience with mandoc(1) to write a manual page patch.
To get the formatting right, looking at the mdoc(7) manual page ought
to be sufficient.

But don't worry too much if you have no experience with mdoc(7).
The main job of people submitting manual page patches is to get
the content, wording, and placement of the text right.  The
formatting can easily be tweaked if needed.

> so it might take a few tries.

That's not a problem; the saying here goes "bad patches trigger good
ones".

Yours,
  Ingo



Re: __read_mostly

2022-03-20 Thread Ingo Schwarze
Hi,

>> A downside of this is that it becomes easier to guess the addresses
>> of the tagged variables.

> No kidding.  It partly undoes the effort of KARL.  

I don't feel qualified to comment on the patch,
but i can't resist mentioning that i still love
tedu@'s classical dictum "attack mitigation countermeasures"
which he coined during the aftermath of the heartbleed debacle.

Yours,
  Ingo



Re: mandoc.1: update example to reflect current options

2022-02-08 Thread Ingo Schwarze
Hello Anders,

Anders Damsgaard wrote on Tue, Feb 08, 2022 at 08:36:11AM +0100:

> The mandoc(1) option alias -l for -a was removed from the documentation
> in revision 1.107:
> 
> https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/mandoc/mandoc.1.diff?r1=1.106=1.107
> 
> The -l option still works, but this diff makes the example consistent
> with the current option description.

Thanks, committed.

Both are correct and reliable, but i agree -a is easier to understand
whereas understanding why -l works is a bit convoluted.

Yours,
  Ingo



Log message:
In the first example, use "mandoc -a" directly rather "mandoc -l".

It feels more natural to me to use -a directly when asking mandoc(1)
to use a pager.  The reason that "mandoc -l" does exactly the same
as "mandoc -a" is that "mandoc" is essentially "man -lc", so the -a
implied by -l negates the -c and the -l has no effect because it is
already the default for mandoc(1).

The more usual command for doing the same is "man -l foo.1 bar.1 ..."
but that's off-topic for the mandoc(1) manual page.

Patch on tech@ from Anders Damsgaard .


> Index: usr.bin/mandoc/mandoc.1
> ===
> RCS file: /cvs/src/usr.bin/mandoc/mandoc.1,v
> retrieving revision 1.180
> diff -u -p -u -p -r1.180 mandoc.1
> --- usr.bin/mandoc/mandoc.1   6 Feb 2022 00:29:03 -   1.180
> +++ usr.bin/mandoc/mandoc.1   8 Feb 2022 07:33:42 -
> @@ -735,7 +735,7 @@ output mode implies
>   .Sh EXAMPLES
>   To page manuals to the terminal:
>   .Pp
> -.Dl $ mandoc -l mandoc.1 man.1 apropos.1 makewhatis.8
> +.Dl $ mandoc -a mandoc.1 man.1 apropos.1 makewhatis.8
>   .Pp
>   To produce HTML manuals with
>   .Pa /usr/share/misc/mandoc.css



Re: rev(1): pull MB_CUR_MAX out of the hot loop

2022-01-12 Thread Ingo Schwarze
Hi Martijn,

Martijn van Duren wrote on Sun, Jan 09, 2022 at 08:09:20AM +0100:

> I fully agree with your reasoning and also prefer this one over the
> previous two diff.
> 
> OK martijn@

Thanks for checking; committed.
Also thanks to cheloha@ for his work on this issue
and to milllert@ for his feedback.
  Ingo


> On Sun, 2022-01-09 at 00:45 +0100, Ingo Schwarze wrote:

>> Index: rev.c
>> ===
>> RCS file: /cvs/src/usr.bin/rev/rev.c,v
>> retrieving revision 1.13
>> diff -u -p -r1.13 rev.c
>> --- rev.c10 Apr 2016 17:06:52 -  1.13
>> +++ rev.c8 Jan 2022 23:19:46 -
>> @@ -46,13 +46,14 @@ void usage(void);
>>  int
>>  main(int argc, char *argv[])
>>  {
>> -char *filename, *p = NULL, *t, *u;
>> +char *filename, *p = NULL, *t, *te, *u;
>>  FILE *fp;
>>  ssize_t len;
>>  size_t ps = 0;
>> -int ch, rval;
>> +int ch, multibyte, rval;
>>  
>>  setlocale(LC_CTYPE, "");
>> +multibyte = MB_CUR_MAX > 1;
>>  
>>  if (pledge("stdio rpath", NULL) == -1)
>>  err(1, "pledge");
>> @@ -83,14 +84,16 @@ main(int argc, char *argv[])
>>  if (p[len - 1] == '\n')
>>  --len;
>>  for (t = p + len - 1; t >= p; --t) {
>> -if (isu8cont(*t))
>> -continue;
>> -u = t;
>> -do {
>> -putchar(*u);
>> -} while (isu8cont(*(++u)));
>> +te = t;
>> +if (multibyte)
>> +while (t > p && isu8cont(*t))
>> +--t;
>> +for (u = t; u <= te; ++u)
>> +if (putchar(*u) == EOF)
>> +err(1, "stdout");
>>  }
>> -putchar('\n');
>> +if (putchar('\n') == EOF)
>> +err(1, "stdout");
>>  }
>>  if (ferror(fp)) {
>>  warn("%s", filename);
>> @@ -104,7 +107,7 @@ main(int argc, char *argv[])
>>  int
>>  isu8cont(unsigned char c)
>>  {
>> -return MB_CUR_MAX > 1 && (c & (0x80 | 0x40)) == 0x80;
>> +return (c & (0x80 | 0x40)) == 0x80;
>>  }
>>  
>>  void



Re: rev(1): pull MB_CUR_MAX out of the hot loop

2022-01-08 Thread Ingo Schwarze
Oopsie...

Ingo Schwarze wrote on Sun, Jan 09, 2022 at 12:45:27AM +0100:

> # Demonstate broken error handling in the old code.
> 
>$ ./obj/wrap /obin/rev   
>$ echo $?  
>   42

Stupid me, this is what i meant, of course:

   $ ./obj/wrap /oldbin/rev
   $ echo $?
  0

The point is that the -current rev(1) silently ignores write errors
*and returns indicating success* even though it produced no output or
incomplete output.  But i can't show that if i mistype the path to
the copy of the old binary that i saved earlier...

Yours,
  Ingo



Re: rev(1): pull MB_CUR_MAX out of the hot loop

2022-01-08 Thread Ingo Schwarze
Hi,

Martijn van Duren wrote on Sat, Jan 08, 2022 at 08:30:20AM +0100:

> Why not go for the following diff?
> It has a comparable speed increase, but without the added complexity
> of a second inner loop.

Actually, i like the idea of not duplicating the loop, in cases where
that is possible without a noticeable performance cost.

I admit i OK'ed the original UTF-8 diff almost six years ago,
but looking at the code again, i now dislike how it is testing
every byte twice for no apparent reason.  Even worse, i regard
the lack of I/O error checking on write operations as a bug.
Note that it does make sense to proceed to the next file on *read*
errors, whereas a *write* error should better be fatal.

For those reason, and because i prefer versions of isu8cont()
that do not inspect the locale, i propose the following patch
instead.

As an aside, i tried using fwrite(3) instead of the manual
putchar(*u) loop, but that is almost exactly as slow as repeatedly
calling MB_CUR_MAX, probably due to either locking or orientation
setting or some aspect of iov handling or buffering or more than
one of those; i refrained from profiling it.

 - 8< - schnipp - >8 - 8< - schnapp - >8 -

Index: rev.c
===
RCS file: /cvs/src/usr.bin/rev/rev.c,v
retrieving revision 1.13
diff -u -p -r1.13 rev.c
--- rev.c   10 Apr 2016 17:06:52 -  1.13
+++ rev.c   8 Jan 2022 23:19:46 -
@@ -46,13 +46,14 @@ void usage(void);
 int
 main(int argc, char *argv[])
 {
-   char *filename, *p = NULL, *t, *u;
+   char *filename, *p = NULL, *t, *te, *u;
FILE *fp;
ssize_t len;
size_t ps = 0;
-   int ch, rval;
+   int ch, multibyte, rval;
 
setlocale(LC_CTYPE, "");
+   multibyte = MB_CUR_MAX > 1;
 
if (pledge("stdio rpath", NULL) == -1)
err(1, "pledge");
@@ -83,14 +84,16 @@ main(int argc, char *argv[])
if (p[len - 1] == '\n')
--len;
for (t = p + len - 1; t >= p; --t) {
-   if (isu8cont(*t))
-   continue;
-   u = t;
-   do {
-   putchar(*u);
-   } while (isu8cont(*(++u)));
+   te = t;
+   if (multibyte)
+   while (t > p && isu8cont(*t))
+   --t;
+   for (u = t; u <= te; ++u)
+   if (putchar(*u) == EOF)
+   err(1, "stdout");
}
-   putchar('\n');
+   if (putchar('\n') == EOF)
+   err(1, "stdout");
}
if (ferror(fp)) {
warn("%s", filename);
@@ -104,7 +107,7 @@ main(int argc, char *argv[])
 int
 isu8cont(unsigned char c)
 {
-   return MB_CUR_MAX > 1 && (c & (0x80 | 0x40)) == 0x80;
+   return (c & (0x80 | 0x40)) == 0x80;
 }
 
 void

 - 8< - schnipp - >8 - 8< - schnapp - >8 -

The rest of this mail contains test reports.
These test reports use two test programs:

 - 8< - schnipp - >8 - 8< - schnapp - >8 -

/* makeutf8.c */

#include 
#include 
#include 
#include 
#include 

int
main(void)
{
FILE*in, *out;
char*line = NULL;
size_t   linesize = 0;
unsigned int wc, wcl = 0;

if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL)
err(1, "setlocale");

if ((in = fopen("/usr/src/gnu/usr.bin/perl/lib/unicore/"
"UnicodeData.txt", "r")) == NULL)
err(1, "in");
if ((out = fopen("utf8.txt", "w")) == NULL)
err(1, "out");

while (getline(, , in) != -1) {
if (sscanf(line, "%x;", ) < 1)
err(1, "scanf: %s", line);
if (wc > wcl + 1)
if (fputwc(L'\n', out) == WEOF)
err(1, "write EOL");
if (fputwc(wc, out) == WEOF) {
if (errno == EILSEQ)
warn("write U+%04x", wc);
else
err(1, "write U+%04x", wc);
}
wcl = wc;
}
if (ferror(in))
errx(1, "read error");
if (fputwc(L'\n', out) == WEOF)
err(1, "write final EOL");

fclose(out);
fclose(in);
return 0;
}

 - 8< - schnipp - >8 - 8< - schnapp - >8 -

/* wrap.c */

#include 

int
main(int argc, char *argv[])
{

Re: rev(1): pull MB_CUR_MAX out of the hot loop

2022-01-07 Thread Ingo Schwarze
Hi Scott,

thank you for spending quite some work on our small utility programs,
and sorry for failing to follow up as often as i should.

Scott Cheloha wrote on Fri, Jan 07, 2022 at 09:45:41AM -0600:

> In rev(1), we call MB_CUR_MAX for every byte in the input stream.
> This is extremely expensive.

That's actually quite ridiculous.

The reason why MB_CUR_MAX is so expensive is -- multi-thread support.
That is of course completely pointless for small utility programs,
on any operating system.  And on OpenBSD in particular, it is
mostly useless even for large application programs that actually
use multiple threads because we basically only support one single
locale in the first place.

So i had a brief look at the implementation of MB_CUR_MAX in our libc
to see whether it could be made less expensive, but i don't see a
simple way without potentially breaking some application programs
(for example, consider a third-party webserver that wants to run
with a POSIX locale in one thread but with a UTF-8 locale in
another thread).

Apart from that, even if it could be made less expensive, avoiding
to call it in tight loops might still be valuable in the interest
of portability.  On other systems, it must necessarily be expensive:
that web server might want to run one thread with UTF-8, one with
UTF-16, one with UTF-32, and one with Shift JIS...

So i agree with jour general idea to improve rev(1) and other
small utility programs that might have similar issues.

> It is much cheaper to call it once per line and use a simpler loop
> (see the inlined patch below) if the current locale doesn't handle
> multibyte characters:

[useful measurements snipped]

> CC schwarze@ to double-check I'm not misunderstanding MB_CUR_MAX.
> I'm under the impression the return value cannot change unless
> we call setlocale(3).

Mostly correct, except that uselocale(3) may also change it.
But of course, we don't call uselocale(3) in small utility programs.

> ok?

I think your patch can still be improved a bit before committing.


millert@ wrote:

> Why not just store the result of MB_CUR_MAX in a variable and use
> that?  It's value is not going to change during execution of the
> program.  This made a big difference in sort(1) once upon a time.

I agree with that.

Even with your patch, rev(1) would still be slow for large files
consisting of very short lines.


> Index: rev.c
> ===
> RCS file: /cvs/src/usr.bin/rev/rev.c,v
> retrieving revision 1.13
> diff -u -p -r1.13 rev.c
> --- rev.c 10 Apr 2016 17:06:52 -  1.13
> +++ rev.c 7 Jan 2022 15:38:39 -
> @@ -82,13 +82,18 @@ main(int argc, char *argv[])
>   while ((len = getline(, , fp)) != -1) {
>   if (p[len - 1] == '\n')
>   --len;
> - for (t = p + len - 1; t >= p; --t) {
> - if (isu8cont(*t))
> - continue;
> - u = t;
> - do {
> - putchar(*u);
> - } while (isu8cont(*(++u)));
> + if (MB_CUR_MAX == 1) {
> + for (t = p + len - 1; t >= p; t--)
> + putchar(*t);
> + } else {
> + for (t = p + len - 1; t >= p; --t) {

I dislike how you use t-- in one loop but --t in the other.
It made me wonder for a moment whether that needs to be different
depending on the character set.  Wouldn't it be less confusing
to pick one of the forms and stick to it?

> + if (isu8cont(*t))
> + continue;
> + u = t;
> + do {
> + putchar(*u);
> + } while (isu8cont(*(++u)));
> + }
>   }
>   putchar('\n');
>   }
> @@ -104,7 +109,7 @@ main(int argc, char *argv[])
>  int
>  isu8cont(unsigned char c)
>  {
> - return MB_CUR_MAX > 1 && (c & (0x80 | 0x40)) == 0x80;
> + return (c & (0x80 | 0x40)) == 0x80;
>  }

The isu8cont() function is a semi-standard function used in many
OpenBSD utility programs, originally designed by tedu@, i think,
and it has proven highly useful to get UTF-8 support going.

But back in the day, i think we failed to realize that MB_CUR_MAX
is expensive.  In that sense, it is probably somewhat ill-designed:
it is typically called in tight loops, so i guess there aren't
many use cases where the original version is really that good.

Given that ksh(1) and ypldap(8) already use versions of this
function identical to your version, i'm OK with your suggestion
of keeping the name while changing the semantics of the function,

Re: rpki-client: check ipAddrBlock and autonomousSysNum for criticality

2021-12-25 Thread Ingo Schwarze
Hi Claudio,

Claudio Jeker wrote on Sat, Dec 25, 2021 at 05:48:53PM +0100:
> On Sat, Dec 25, 2021 at 11:36:50AM +0100, Theo Buehler wrote:

>> These extensions MUST be marked critical by the sections of the spec
>> mentioned in the cryptowarnx(). That's determined by the ASN1_BOOLEAN
>> that is extracted and ignored after the FIXME a few lines below each of
>> the two hunks. Rather than getting the info from there, it's easier to
>> use an API call that checks what was already parsed by d2i_X509().

> I like this a lot. OK claudio@

Merely to avoid a misunderstanding:
I have no oponion whatsoever on this rpki-client patch.

> I would love to get rid of X509_V_FLAG_IGNORE_CRITICAL and use a callback
> to ensure the right extensions are critical but I never managed to
> understand how the X509_verify_cert() callback actually works.
> Documentation seems to be non-existent.

Not exactly non-existent, but admittedly very poor indeed.

There is X509_STORE_CTX_set_verify_cb(3), mostly written by
Dr. Stephen Henson in 2009.  But it does not really explain how the
callback is used and instead only provides four examples.

The reasons for this being so badly documented are as follows:

 1. X.509 PKI Path Validation as specified in RFC 5280 is ridiculously
complicated no matter the implementation.

 2. Our API design and implementation was inherited from OpenSSL,
and everybody knows what that means.

 3. Our documentation was inherited from OpenSSL, and everybody
knows what that means.

 4. Beck@ has embarked on an epic quest to mitigate parts of the
consequences of item 2.  Consequently, the surrounding airs
have been saturated with large amounts of dust for quite
some time now.

 5. While parts of that dust have arguably started to settle,
the air still feels somewhat dusty to me, and i have been
fearing that spending the several days of work required
to throughly improve this particular part of the documentation
might end up as working straight for the waste paper basket
once beck@ achieves his next major steps forward.

So given that there are literally hundreds of other public API functions
in libcrypto that are still completely undocumented (as opposed to the
verification callback being documented very badly), i so far refrained
from even trying to attack that particular problem...

Yours,
  Ingo



Re: document patch(1) not supporting binary diffs

2021-12-21 Thread Ingo Schwarze
Hi Theo,

Theo de Raadt wrote on Mon, Dec 20, 2021 at 10:37:00AM -0700:
> Ingo Schwarze  wrote:

>> The patch(1) manual talks about "lines" throughout,
>> and for binary files, a concept of "lines" does not even exist.

> That is a bit strong.  Some utilities designed for "text files" have no
> problem being both 8-bit clean and very-long-line capable.  For example,
> you can use emacs to edit a /bsd.  Add a space at the start of the file,
> save it.  re-edit the file, delete the space, and you get the same
> result.  I changed the internals of mg decades ago to also be 8-bit
> clean + very-long-line capable.

Yes, i agree such extensions are often useful.

> I think in POSIX this concept of "text
> file" is very weakly defined -- there are a variety of line-length
> issues, weird issues relating to \n and \r translation, obvious
> inability to handle embeded NUL etc, etc etc, and even end-of-file
> newline termination.
> 
> It is a bit of a copout to say "text file", when the truth really is is
> "imperfect content handling".  OTOH, adding perfect handling in many unix
> programs would be basically impossible, the complexity would be pretty high.
> 
> It may even be that the patch inputfile format cannot handle arbitrary data.
> It certainly has newline followed by '+', '-', or ' ' and ending with '\n'
> as context tracking, so it might be impossible.
 
>> The problem arises from the manual failing to mention that patch(1)
>> operates on text files.

> Yes, I am OK with it saying so.

>> Many standard utilities operate on text files
>> only, the concept of a text-file is both well-known and defined by
>> POSIX, so there is no need to re-explain what that means in individual
>> pages.

> Nope, I think POSIX fails to create a good definition.

Admittedly, the POSIX definition is a bit weak in some respects,
but all the same, it is usable as a minmium requirement that many
standard utilities have in common.

> Look at the
> recent commit to uniq.c, you even commented on it.  patch can handle files
> without trailing \n, right?   If the POSIX definition was real, patch would
> probably need to fail upon those files, creating other problems.

If POSIX says "The foobar utility operates on text files", that does not
mean that it requires foobar to fail if the input is not a text file.
It merely means that the behaviour is unspecified in that case.

It may occasionally be useful to add statements similar to the following
to the STANDARDS section:

  As an extension to that specification,

  foobar can handle files containing NUL bytes
or
  foobar can handle files containing bytes that do not form characters
or
  foobar can handle lines longer than LINE_MAX bytes
or
  foobar can handle files lacking the terminating newline character
  on the last line

Only in cases where such an extension is useful for some practical
purpose and we are willing to maintain it, of course.

Yours,
  Ingo



Re: document patch(1) not supporting binary diffs

2021-12-21 Thread Ingo Schwarze
Jason McIntyre wrote on Mon, Dec 20, 2021 at 04:13:19PM +:

> i'm ok with your diff

Thanks for checking.

> but it is slightly misleading in context of
> reading from stdin. i suppose that is no biggie.

I think i see why you say so.

Speaking *formally*, what we usually do is describing what the utility
does by default in the first sentence, then explain how command line
options modify that behaviour below "The options are as follows".
We could do that here, somewhat like this:

  patch reads a text file called the "patch file" from standard input,
  containing any of the four forms ... yadayada ...
  and applies those differences to an original text file ...

Besides, usually, we would *not* show a SYNOPSIS line

  patch  it would be simpler to just s/patch file/text file/ maybe?

True, that would be simpler and not outright wrong.  But in normal
operation, patch(1) deals with many text files, so i worry that
people might get confused as to which text file we are talking about.
It seems valuable for the first sentence to make it clear that one
among all these files is special and to introduce the term "patch file"
or ".Ar patchfile".

So to first get the issue reported by chrisz@ out of the way, i
committed the patch as-is.

We could say

  .Nm
  takes a text file called the
  .Sq patch file
  containing any of the four forms ...

Do you think that is better, like in the patch below?

Yours,
  Ingo


Index: patch.1
===
RCS file: /cvs/src/usr.bin/patch/patch.1,v
retrieving revision 1.34
diff -u -r1.34 patch.1
--- patch.1 21 Dec 2021 08:07:20 -  1.34
+++ patch.1 21 Dec 2021 08:37:54 -
@@ -47,8 +47,8 @@
 .Pf \*(Lt Ar patchfile
 .Sh DESCRIPTION
 .Nm
-takes the text file
-.Ar patchfile
+takes a text file called the
+.Dq patch file
 containing any of the four forms of difference
 listing produced by the
 .Xr diff 1
@@ -56,7 +56,7 @@
 producing a patched version.
 If
 .Ar patchfile
-is omitted, or is a hyphen, the patch will be read from the standard input.
+is omitted or is a hyphen, the patch file is read from the standard input.
 .Pp
 .Nm
 will attempt to determine the type of the diff listing, unless overruled by a



Re: document patch(1) not supporting binary diffs

2021-12-20 Thread Ingo Schwarze
Hi Christopher,

Christopher Zimmermann wrote on Mon, Dec 20, 2021 at 04:01:49PM +0100:

> base patch cannot work with diffs of binary files. It might help to say 
> so in the manpage since other implementations do support this (ab)use of 
> patch. OK?

I agree you are pointing out a slight problem with the manual page,
but i dislike your patch for two reasons:

This is not a bug, but a fundamental design decision.
The patch(1) manual talks about "lines" throughout,
and for binary files, a concept of "lines" does not even exist.

And the basic idea of a tool ought to be described in the first
sentence of the description, not as an afterthought at the very end.

The problem arises from the manual failing to mention that patch(1)
operates on text files.  Many standard utilities operate on text files
only, the concept of a text-file is both well-known and defined by
POSIX, so there is no need to re-explain what that means in individual
pages.

Consequently, i think something like the following would be better.

Yours,
  Ingo


Index: patch.1
===
RCS file: /cvs/src/usr.bin/patch/patch.1,v
retrieving revision 1.33
diff -u -r1.33 patch.1
--- patch.1 9 Nov 2021 16:13:40 -   1.33
+++ patch.1 20 Dec 2021 15:42:10 -
@@ -47,10 +47,12 @@
 .Pf \*(Lt Ar patchfile
 .Sh DESCRIPTION
 .Nm
-will take a patch file containing any of the four forms of difference
+takes the text file
+.Ar patchfile
+containing any of the four forms of difference
 listing produced by the
 .Xr diff 1
-program and apply those differences to an original file,
+program and applies those differences to an original text file,
 producing a patched version.
 If
 .Ar patchfile



Re: Fix typo in '}' command in less.1

2021-12-10 Thread Ingo Schwarze
Hi Richard,

Richard Ulmer wrote on Fri, Dec 10, 2021 at 04:04:11PM +0100:

> this is just a minor copy-and-paste error fix for the less(1) man page.

Committed, thank you.

> I also contributed this upstream: https://github.com/gwsw/less/pull/228
> 
> I hope this is the right mailing list for this.

Yes, it is.

Posting patches to tech@ is not required if, for a specific patch,
you know a better destination.  But tech@ is appropriate for all
kinds of patches, even documentation, in particular when you are
unsure where to send them.

Yours,
  Ingo


> Index: less.1
> ===
> RCS file: /cvs/src/usr.bin/less/less.1,v
> retrieving revision 1.58
> diff -u -p -u -r1.58 less.1
> --- less.1  23 Sep 2021 18:46:25 -  1.58
> +++ less.1  10 Dec 2021 14:58:46 -
> @@ -852,7 +852,7 @@ If a right curly bracket appears in the
>  the } command will go to the matching left curly bracket.
>  The matching left curly bracket is positioned on the top
>  line of the screen.
> -If there is more than one right curly bracket on the top line,
> +If there is more than one right curly bracket on the bottom line,
>  a number N may be used to specify the N-th bracket on the line.
>  .It Ic \&(
>  Like {, but applies to parentheses rather than curly brackets.



Re: Explicitly tag commands in vi(1)

2021-11-21 Thread Ingo Schwarze
Hi Simon,

Simon Branch wrote on Sat, Nov 20, 2021 at 03:10:22PM -0800:

> Here's a diff that adds explicit .Tg macros to vi(1),

We don't want that:

   $ man -O tag=Tg mdoc
  [...]
   In most cases, adding a Tg macro would be redundant because mandoc(1)
   is able to automatically tag most definitions.  This macro is intended
   for cases where automatic tagging of a term is unsatisfactory, for
   example if a definition is not tagged automatically (false negative)
   or if places are tagged that do not define the term (false positives).
  [...]

This is a typical example of a .Tg macro that should not be added
because it is redundant:

> +.Tg c
>  .It Fl c Ar cmd

The command

   $ man -O tag=c vi

already works without your patch.

I do not think .Tg should be used to enforce ambiguous tags (like "c"
pointing to both the command line option and the vi(1) "change" command).
On the one hand, adding multiple explicit .Tg macros for the same name
is very noisy.  On the other hand, it is much less useful than having
an unambiguous tag because both "-O tag=c" for terminal output and the
fragment identifier "#c" in HTML output only target the first copy.
Consequently, the cost-to-benefit ratio is bad for ambiguous manual
tagging.

In general, if the same term is defined at multiple places, or if -
like in this case - multiple different terms represented by the same
word are defined in the same file, tagging does not work very well yet.
The basic concept isn't fully worked out for such complicated cases,
and i'm not even convinced designing a solution for that task that
is simple enough to be worth implementing is possible.  As long as
the basic design is an unsolved problem, trying to handle individual
cases by using the .Tg macro is a bad idea.

> so that you can jump to vi or ex commands using -Otag or :t.
> This patch *should* include every command, but I couldn't figure out
> how to tag the '!' and ':!' commands; none of these worked:
> 
>.Tg
>.Tg !
>.Tg !\&
>.Tg "!"

man(1) already writes the "!" tag even without your patch:

   $ grep ^! /tmp/man.*
  /tmp/man.JF8ZO3DibX:! /tmp/man.sZVbqIr51P 729

which alreday targets this paragraph:

  ! argument(s)
  [range] ! argument(s)
 Execute a shell command, or filter lines through a shell command.

But less(1) does not support that, see /usr/src/usr.bin/less/tags.c
line 217:

   /*
* Search the tags file for the desired tag.
*/
   while (fgets(tline, sizeof (tline), f) != NULL) {
if (tline[0] == '!')
/* Skip header of extended format. */
continue;

The ctags(1) format is not perfect.  Some strings exist that it cannot
tag, and strings starting with an exclamation mark are an example of
such strings.  Strings containing space characters are another.

> This patch doesn't tag special keys either (word erase, literal next,
> etc.), because tag names can't contain whitespace and I'd prefer
> consistency with the manpage.  One solution would be to say WERASE and
> LNEXT instead, which is what ksh(1) does, and this also makes it easier
> to lookup what 'word erase' and 'literal next' really are.
> 
>$ man -k Dv=WERASE Dv=LNEXT
>stty(1) - set the options for a terminal device interface
>termios(4) - general terminal line discipline
> 
>$ man 4 termios -Otag=WERASE
>WERASESpecial character on input and is recognized if the ICANON
>  flag is set.  Erases the last word ...

Maybe; but one thing at a time.

We really should not discuss text changes in a thread about tagging.

> If a line is explicitly tagged with the .Tg macro, mandoc removes any
> other automatically-created tags with the same name.  So this patch also
> explicitly tags the 'print', 'number', and 'list' options and the
> [-ceFRrSstvw] flags.
> 
> Comments?  OK?

Not OK.  This is excessive.  The .Tg macro is intended to be used
sparingly, not to be smeared all over the place.

I do not object to marking unambiguos targets that cannot be recognized
automatically.  But you cannot reasonably aim for completeness in this
respect.

Yours,
  Ingo



Re: diff: improve legibility of structs in several manpages

2021-11-20 Thread Ingo Schwarze
Hi Jan,

sorry that i failed to look at this earlier.

Even with your diff, the style is still not completely consistent.

I think that inside .Bd -literal, the best style is exactly the same
as in source code:

structname{
typename;
type*name;
};

where

 *  can be more than one tab
   if some of the types in the struct are long
 * and  can be two spaces if the struct contains a double pointer
 * and  can be reduced to four spaces if space is tight,
   or a different number of spaces in very unusual cases

Even with your diff, some of the structs still indent the type
using eight spaces instead of a tab and some structs still use
multiple spaces instead of .

That said, i think your diff is an improvement and you should commit it
(OK schwarze@), except that i disagree with your change to ktrace(2).
That one ought to use two tabs for  instead of one tab plus eight
spaces and it should use one tab after "struct timespec" instead of
one space.

On Tue, Oct 26, 2021 at 06:56:10PM +0200, Jan Klemkow wrote:

> This diff harmonises the indentation of struct members and comments in
> several manpages.  Also fixes line wraps of comments on 80 column
> terminals.  General uses tabs for general indentation and 4 spaces on
> tight spots.  Also uses extra space to align pointers and non-pointers
> as we do this on certain places in our source.

Yes, i agree with all of that.

Yours,
  Ingo



Re: locale in expr(1)

2021-11-15 Thread Ingo Schwarze
Hi Marc,

Marc Espie wrote on Mon, Nov 15, 2021 at 05:06:23PM +0100:
> On Mon, Nov 15, 2021 at 03:43:47PM +0100, Jan Stary wrote:
>> On Nov 10 18:46:08, h...@stare.cz wrote:
>>> On Nov 10 18:15:44, h...@stare.cz wrote:

 expr(1) says
 
 expr1 {=, >, >=, <, <=, !=} expr2
 
 Returns the results of integer comparison if both arguments
 are decimal integers; otherwise, returns the results of
 string comparison using the locale-specific collation

Yikes.  Yes, that is true in general.  Then again, the OpenBSD
C library intentionally supports only one single collation sequence:
POSIX.

 sequence.  The result of each comparison is 1 if the specified
 relation is true, or 0 if the relation is false.
 
 Looking at expr.c, it boils down to strcoll(), which ignores the locale.
 So the statement is technically true, but there isn't really any
 "locale-specific collation sequence".
 
 Would it be simpler to leave the mention of locale completely out?

Yes, probably, and then add a CAVEATS section warning that on other
operating systems, the collation order may vary depending on the
LC_COLLATE environment variable.

 Or state something similar to what sort(1) or strcoll(3) and other
 string-comparing routines say?

>>> For example,
>>> 
>>>  $ expr č '<' d  
>>>  0
>>> 
>>> Which locale-specific collation sequence determined that?
>>> Byte by byte, it's
>>> 
>>>  c48d  U+00010d  č   LATIN SMALL LETTER C HACEK
>>>  64U+64  d   LATIN SMALL LETTER D
>>> 
>>> and I don't think there is anything more to it.

Yes.

>>> (Although in the Czech alphabet, č comes just before d.)

Fair enough, but adding knowledge about that to the C library
would be quite insane.

> I don't know what Ingo et al plans wrt collation sequence are, but
> it is a relatively innocuous part of locales.

I think i said so several times: i talked to bapt@ when he implemented
LC_COLLATE support in the FreeBSD libc.  To put it mildly, he disagreed
with your opinion.  Or, to be clear, he was swearing because he hated
the resulting complexity, saying it was much worse than he had expected.
And i don't think bapt@ is easily scared.

> More specifically, the ISO committee introduced functions specifically
> to deal with collation sequence, and left the original raw C functions
> (strcmp and the like) completely pristine... so what's missing is mostly
> an engine to compile location tables into something fairly reasonable.
> 
> There is way less of a security risk than, say, LC_CTYPE... ;)

I don't know.  A fairly reliable way to create security risks is
complexity.  Apart from the erratic run time behaviour that is likely to
trip up sysadmins - LC_COLLATE can change the collation sequence even
among ASCII characters - collation support looks like a very effective
way of putting excessive complexity right into the C library.

So, let me say it again, i would argue for rejecting it even if somebody
offered a complete patch implementing it.  The C library is just the
wrong place for handling such matters.

Yours,
  Ingo



Re: Typo in ASN1_STRING_length.3

2021-11-14 Thread Ingo Schwarze
Hi Matthias,

Matthias Schmidt wrote on Sun, Nov 14, 2021 at 10:41:40AM +0100:

> in the recent commit to ASN1_STRING_length.3 a typo sneaked in:

Committed.

Thank you for reading such patches and for reporting the error!

Yours,
  Ingo


> diff 96921f4986cebab89014f4b188cc00426e2e11d8 /usr/src
> blob - 9d8cbf83ce75abecbc8faf0bf1fce674ef2a8abf
> file + lib/libcrypto/man/ASN1_STRING_length.3
> --- lib/libcrypto/man/ASN1_STRING_length.3
> +++ lib/libcrypto/man/ASN1_STRING_length.3
> @@ -346,7 +346,7 @@ returns a number of bytes.
>  and
>  .Fn ASN1_STRING_copy
>  return 1 on success or 0 on failure.
> -They fail is memory allocation fails.
> +They fail if memory allocation fails.
>  .Fn ASN1_STRING_set
>  and
>  .Fn ASN1_OCTET_STRING_set



Re: locale in who(1)

2021-11-10 Thread Ingo Schwarze
Hi,

Martijn van Duren wrote on Wed, Nov 10, 2021 at 02:03:51PM +0100:

> I see no reason to keep it.
> OK martijn@ if anyone wants to commit this.

Done.

> On Wed, 2021-11-10 at 13:37 +0100, Jan Stary wrote:

>> Why does who(1) need to setlocale()?

On some systems, setlocale(LC_TIME) influences strftime(3).
But who(1) uses ctime(3) instead, and

  https://pubs.opengroup.org/onlinepubs/9699919799/functions/ctime.html

explicitly mentions that ctime(3) does not support localized date and
time formats.

Anyway, we clearly do not want localization in such a place,
neither now nor in the future.

Yours,
  Ingo


>> Index: who.c
>> ===
>> RCS file: /cvs/src/usr.bin/who/who.c,v
>> retrieving revision 1.30
>> diff -u -p -r1.30 who.c
>> --- who.c12 Jul 2021 15:09:20 -  1.30
>> +++ who.c10 Nov 2021 12:37:05 -
>> @@ -44,7 +44,6 @@
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>>  
>>  void  output(struct utmp *);
>>  void  output_labels(void);
>> @@ -71,8 +70,6 @@ main(int argc, char *argv[])
>>  FILE *ufp;
>>  char *t;
>>  int c;
>> -
>> -setlocale(LC_ALL, "");
>>  
>>  if (pledge("stdio unveil rpath getpw", NULL) == -1)
>>  err(1, "pledge");



Re: [patch] httpd static gzip compression

2021-11-05 Thread Ingo Schwarze
Hi Theo,

Theo de Raadt wrote on Thu, Nov 04, 2021 at 08:27:47AM -0600:
> prx  wrote:
>> On 2021/11/04 14:21, prx wrote:

>>> The attached patch add support for static gzip compression.
>>> 
>>> In other words, if a client support gzip compression, when "file" is
>>> requested, httpd will check if "file.gz" is avaiable to serve.

>> This diff doesn't compress "on the fly".
>> It's up to the webmaster to compress files **before** serving them.

> Does any other program work this way?

Yes.  The man(1) program does.  At least on the vast majority of
Linux systems (at least those using the man-db implementation
of man(1)), on FreeBSD, and on DragonFly BSD.

Those systems store most manual pages as gzipped man(7) and mdoc(7)
files, and man(1) decompresses them every time a user wants to look
at one of them.  You say "man ls", and what you get is actually
/usr/share/man/man1/ls.1.gz or something like that.

For man(1), that is next to useless because du -sh /usr/share/man =
42.6M uncompressed.  But it has repeatedly caused bugs in the past.
I would love to remove the feature from mandoc, but even though it is
rarely used in OpenBSD (some ports installed gzipped manuals in the
past, but i think the ports tree has been clean now for quite some
time; you might still need the feature when installing software
or unpacking foreign manual page packages without using ports)
it would be a bad idea to remove it because it is too widely used
elsewhere.  Note that even the old BSD man(1) supported it.

> Where you request one filename, and it gives you another?

You did not ask what web servers do, but we are discussing a patch to
a webserver.  For this reason, let me note in passing that even some
otherwise extremely useful sites get it very wrong the other way round:

 $ ftp https://manpages.debian.org/bullseye/coreutils/ls.1.en.gz
Trying 130.89.148.77...
Requesting https://manpages.debian.org/bullseye/coreutils/ls.1.en.gz
100% |**|  8050   00:00
8050 bytes received in 0.00 seconds (11.74 MB/s)
 $ file ls.1.en.gz
ls.1.en.gz: troff or preprocessor input text
 $ grep -A 1 '^.SH NAME' ls.1.en.gz  
.SH NAME
ls \- list directory contents
 $ gunzip ls.1.en.gz
gunzip: ls.1.en.gz: unrecognized file format

> I have a difficult time understanding why gzip has to sneak it's way
> into everything.
> 
> I always prefer software that does precisely what I expect it to do.

Certainly.

I have no strong opinion whether a webserver qualifies as "everything",
though, nor did i look at the diff.  While all manpages are small in the
real world, some web servers may have to store huge amounts of data that
clients might request, so disk space might occasionally be an issue on
a web server even in 2021.  Also, some websites deliver huge amounts of
data to the client even when the user merely asked for some text (not sure
such sites would consider running OpenBSD httpd(8), but whatever :) - when
browsing the web, bandwidth is still occasionally an issue even in 2021,
even though that is a rather absurd fact.

Yours,
  Ingo



Re: uniq: add HISTORY section

2021-11-02 Thread Ingo Schwarze
Hi Todd,

Todd C. Miller wrote on Tue, Nov 02, 2021 at 08:20:47AM -0600:

> uniq appeared in research Unix version 3.

OK schwarze@.

If - like in this case - the indicated release is the first one in any
system to ever contain the feature, and not just the first one among
the direct ancestors of OpenBSD, i prefer the wording "utility first
appeared in", though.  No big deal, your call.

Yours,
  Ingo


> Index: usr.bin/uniq/uniq.1
> ===
> RCS file: /cvs/src/usr.bin/uniq/uniq.1,v
> retrieving revision 1.21
> diff -u -p -u -r1.21 uniq.1
> --- usr.bin/uniq/uniq.1   23 Dec 2017 00:52:33 -  1.21
> +++ usr.bin/uniq/uniq.1   2 Nov 2021 14:14:19 -
> @@ -159,3 +159,8 @@ The historic
>  and
>  .Fl Ns Ar number
>  options have been deprecated but are still supported in this implementation.
> +.Sh HISTORY
> +A
> +.Nm
> +utility appeared in
> +.At v3 .



Re: uniq(1): ignore newline when comparing lines?

2021-11-02 Thread Ingo Schwarze
Hi,

Todd C. Miller wrote on Tue, Nov 02, 2021 at 08:12:00AM -0600:
> On Mon, 01 Nov 2021 21:04:54 -0500, Scott Cheloha wrote:

>> Yes it would be simpler.  However I didn't want to start changing the
>> input -- which we currently don't do -- without discussing it.
>>
>> The standard says we should "write one copy of each input line on the
>> output." So, if we are being strict, we don't add a newline that isn't
>> there, because that isn't what we read.  Any other interpretation
>> requires handwaving about what an "input line" even is.

> The System V version of uniq actually ignores the last line if it
> doesn't end in a newline.  For example:
> 
> $ printf "bar\nfoo\nfool" | uniq
> bar
> foo
> 
> AIX, Solaris, and HP-UX still exhibit this behavior.  What happens
> is that the gline() function (which reads the line) returns non-zero
> when it hits EOF, discarding any input in that line.  Interestingly,
> it does realloc the line buffer as needed to handle long lines.
> 
> So really, this is a corner case where you can't count on consistent
> behavior among implementations and we just need to do what we think
> is best.  If you prefer we retain the existing behavior wrt a final
> line without a newline that is OK with me.

Just a quick comment without looking at the code:

POSIX says on
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/uniq.html :

  INPUT FILES
The input file shall be a text file.

By definition, a file that does not end in a newline character
is *NOT* a text file.  Consequently, what uniq(1) does in this case
is unspecified.

My personal opinion is that in general, if utilities are specified to
process text files and the terminating newline is missing, it is best to
behave as if the mandatory terminating newline were present.  The reason
why i think so is that forgetting the terminating newline is a popular
user error, and silently assuming a terminating newline causes less
surprise with users than silently discarding the incomplete last line.

Most users are not even aware that a file without a terminating newline
is not a text file, so it definitely is less surprising for the vast
majority.  But even those who are aware of that quirk in the definition
of what a text file is (like myself) are regularly surprised by the
consequences...

Yours,
  Ingo



Re: demystify vport(4) in vport(4) and ifconfig(8)

2021-10-29 Thread Ingo Schwarze
Hi Stuart,

Stuart Henderson wrote on Fri, Oct 29, 2021 at 01:59:38PM +0100:
> On 2021/10/29 14:08, Ingo Schwarze wrote:
>> Stuart Henderson wrote on Fri, Oct 29, 2021 at 10:53:41AM +0100:
>>> On 2021/10/28 23:19, Klemens Nanni wrote:
>>>> On Fri, Oct 29, 2021 at 12:57:54AM +0200, Ingo Schwarze wrote:

>>>>>   MANPAGER=firefox man -T html $(ifconfig -C)

>>>> This doesn't work if firefox is already running as the MANPAGER firefox
>>>> process exits immediately after sending the file/link to the running
>>>> process, which causes mandoc to exit after removing the temporary file,
>>>> by which time firefox fails to open the no longer exiting file.

>>> I use mutt_bgrun for things like this

>> I don't see how that can possibly help.

> Oh, of course you're right - I was confused between my helper
> scripts. The one I actually use for html in mutt looks like this,
> so it's not any user for MANPAGER..
> 
> tmp=`mktemp -d /tmp/mutthtml.X` || exit 1
> mhonarc -rcfile ~/.m2h_rcfile -single > $tmp/mail.html
> (chrome $tmp/mail.html 2>/dev/null; sleep 30; rm -r $tmp) &

Indeed, if i added sleep(30) to the place in question
in usr.bin/mandoc/main.c, i would expect quite an outrage...
A protest rally on my front lawn or something like that...

Then again, the man(1) process that is about to exit could maybe fork
a child before exiting.  The parent exiting right away makes sure that
the user gets back their shell prompt instantly after exiting the pager.
The child process, now being in the background, could sleep a few seconds
before deleting the temporary output files, then exit, too.

This does appear to successfully work around the firefox bug.
With the patch below, "MANPAGER=firefox man -T html ..."
now works reliably for me.

It does seem somewhat ugly, though, so i'm not sure whether i should
commit it.  In particular, fork(2)ing merely to go to the background
seems both cumbersome and expensive - admittedly, your script cited
above does the same, but that doesn't imply i have to like that, right?

Is there a cleaner way to let man(1) go into the background and
let the shell offer a fresh prompt while man(1) still sleeps,
without needing to fork(2)?

Yours,
  Ingo


Index: main.c
===
RCS file: /cvs/src/usr.bin/mandoc/main.c,v
retrieving revision 1.262
diff -u -p -r1.262 main.c
--- main.c  4 Oct 2021 21:28:50 -   1.262
+++ main.c  29 Oct 2021 14:12:08 -
@@ -1203,6 +1203,7 @@ woptions(char *arg, enum mandoc_os *os_e
 static void
 run_pager(struct outstate *outst, char *tag_target)
 {
+   const struct timespec delay = { 5, 0 };
int  signum, status;
pid_tman_pgid, tc_pgid;
pid_tpager_pid, wait_pid;
@@ -1245,13 +1246,16 @@ run_pager(struct outstate *outst, char *
if (wait_pid == -1) {
mandoc_msg(MANDOCERR_WAIT, 0, 0,
"%s", strerror(errno));
-   break;
+   return;
}
if (!WIFSTOPPED(status))
break;
 
signum = WSTOPSIG(status);
}
+   if (fork() > 0)
+   _exit(mandoc_msg_getrc());
+   nanosleep(, NULL);
 }
 
 static pid_t



Re: demystify vport(4) in vport(4) and ifconfig(8)

2021-10-29 Thread Ingo Schwarze
Hi Stuart,

Stuart Henderson wrote on Fri, Oct 29, 2021 at 10:53:41AM +0100:
> On 2021/10/28 23:19, Klemens Nanni wrote:
>> On Fri, Oct 29, 2021 at 12:57:54AM +0200, Ingo Schwarze wrote:

>>>   MANPAGER=firefox man -T html $(ifconfig -C)

>> This doesn't work if firefox is already running as the MANPAGER firefox
>> process exits immediately after sending the file/link to the running
>> process, which causes mandoc to exit after removing the temporary file,
>> by which time firefox fails to open the no longer exiting file.
>> 

> I use mutt_bgrun for things like this

I don't see how that can possibly help.  Accoring to
  https://gist.github.com/tabletick/4129818 ,
the mutt_bgrun program essentially does this:

cp "$file" "$tmpfile"
(
"$viewer" $options "$tmpfile"
rm -f "$tmpfile"
rmdir "$tmpdir"
) &

Assume that $file is the temporary HTML file written by man(1),
$tmpfile is the temporary filename (very poorly) generated
by mutt_bgrun, $viewer is firefox and $options is empty.

Then mutt_bgrun is defeated by the firefox bug that i described
in exactly the same way as that bug defeats man(1) itself.
The following sequence of events will sometimes occur:

 1. man(1) writes $file
 2. man(1) calls mutt_bgrun
 3. mutt_bgrun copies $file to $tmpfile
 4. mutt_bgrun spawns a subshell in the background
 5. mutt_bgrun exits
 6. man(1) deletes $file, which is harmless at this point
because $tmpfile still exists
 7. man(1) exits

 8. the mutt_bgrun subshell calls firefox
 9. That temporary firefox process hands the $tmpfile name
to the main running firefox application via IPC.
 10. the mutt_bgrun subshell deletes "$tmpfile"
 11. the mutt_bgrun subshell exits

Then, much later, because firefox is a monster and a slug:

 12. firefox attempts to open "$tmpfile",
 which was already deleted long before

If i understand correctly, mutt_bgrun is designed to solve
exactly the opposite problem we are having here:  When a viewer
(like, for example, xpdf(1)) blocks the formatter (like, for
example, mutt(1) or man(1)) until the user exits the viewer,
then mutt_bgrun does the job of putting the viewer into the
background such that the formatter can proceed before the user
exits the viewer.

But the problem here is that firefox is already doing too much
in the background (specifically, opening the $file, which it
really ought to do in the foreground before returning), not that
it does too little in the background.

Yours,
  Ingo



Re: demystify vport(4) in vport(4) and ifconfig(8)

2021-10-28 Thread Ingo Schwarze
Hi Klemens,

Klemens Nanni wrote on Thu, Oct 28, 2021 at 11:19:30PM +:
> On Fri, Oct 29, 2021 at 12:57:54AM +0200, Ingo Schwarze wrote:

>>   MANPAGER=firefox man -T html $(ifconfig -C)

> This doesn't work if firefox is already running

It is true that it sometimes works and sometimes fails due to a race
condition caused by firefox, but that is kind of orthogonal to benno@'s
question.

> as the MANPAGER firefox process exits immediately after sending the
> file/link to the running process,

I consider that a bug in firefox.  If a program is instructed to open a
file, and in particular a file in /tmp/, it must not exit before actually
opening the file.

> which causes mandoc to exit after removing the temporary file,
> by which time firefox fails to open the no longer exiting file.

Is there a way how man(1) could work around that braindead behaviour
of firefox?  I don't want to commit the following patch - which delays
program exit of man(1) after the pager exits - because the delay is
annoying when using a reasonable pager like less(1).

If someone can come up with an acceptable idea for a workaround,
or if someone can fix firefox, that will be welcome...

Yours,
  Ingo



Index: main.c
===
RCS file: /cvs/src/usr.bin/mandoc/main.c,v
retrieving revision 1.262
diff -u -p -r1.262 main.c
--- main.c  4 Oct 2021 21:28:50 -   1.262
+++ main.c  29 Oct 2021 00:06:24 -
@@ -1203,6 +1203,7 @@ woptions(char *arg, enum mandoc_os *os_e
 static void
 run_pager(struct outstate *outst, char *tag_target)
 {
+   const struct timespec delay = { 1, 0 };
int  signum, status;
pid_tman_pgid, tc_pgid;
pid_tpager_pid, wait_pid;
@@ -1252,6 +1253,7 @@ run_pager(struct outstate *outst, char *
 
signum = WSTOPSIG(status);
}
+   nanosleep(, NULL);
 }
 
 static pid_t



Re: demystify vport(4) in vport(4) and ifconfig(8)

2021-10-28 Thread Ingo Schwarze
Hi,

Sebastian Benoit wrote on Thu, Oct 28, 2021 at 11:05:54PM +0200:

> We tend to forget that there are other output formats for manpages
> as well.  right now, i can got to https://man.openbsd.org/ifconfig and
> jump from there to all of these manpages. With them removed, i can no
> longer do that.
> 
> Not sure if thats a valid consideration though.

I don't think that's a good reason to keep a long, static list in a
manual page *if* that list keeps creating maintenance headaches and
keeps getting outdated.

If information is naturally available from a command line program but
painful to maintain in manual pages, then i don't think it is reasonable
to expect it to be available from https://man.openbsd.org/, and i think
it is good enough if the manual page explains how to get the information
at the command line.

Besides, choose from:

  for name in $(ifconfig -C); do firefox https://man.openbsd.org/$name.4; done

and

  MANPAGER=firefox man -T html $(ifconfig -C)

All that said, don't forget the web equivalent of semantic apropos(1):

  https://man.openbsd.org/?query=Cd~pseudo-device=1

I admit that the URI syntax of web apropos is exceedingly ugly for
historical reasons.  I really ought to implement something like

  https://man.openbsd.org/apropos/Cd~pseudo-device
  https://man.openbsd.org/a/Cd~pseudo-device
  https://apropos.openbsd.org/Cd~pseudo-device

Another day, another task...
Too many pending tasks already...

Yours,
  Ingo



Re: Missing include in sys/device.h

2021-09-28 Thread Ingo Schwarze
Hi Rafael,

Rafael Sadowski wrote on Tue, Sep 28, 2021 at 08:22:06AM +0200:
> On Mon Sep 27, 2021 at 11:20:55PM -0600, Theo de Raadt wrote:

>> Oh, like how about you try compiling a kernel after your proposed diff?
>> 
>> in userland -
>>size_t comes from sys/types.h
>>or a header file which pulls in sys/types.h, and there should be
>>no further new iteration added
>> 
>> in kernel -
>> size_t comes from either sys/param.h or sys/types.h, depending
>> on which one a file includes
>> 
>> What you are suggesting is completely crazy.  It's akin to "all
>> header files should include all other dependent header files, and
>> hey maybe one day we can have a #include_all directive, eh?
>> 
>> Absolutely no way.

> Totally understandable! I have to get used with the userland/kernel
> src-environment. Thanks for the explanation and sorry for the noise.

By the way: some userland programs (but not all userland programs, and
of course not the C library) use the same rule of thumb that "headers
usually do not include headers".

Some will likely regard the following level of documentation excessive,
and i'm certainly not proposing to do this in more places.  But it does
help me personally to not lose track of what provides and uses what
in a program with a large number of deeply nested data structures and
a considerable number of modules that are in part independent of each
other and in part call the internal APIs of the next-lower layer:

  https://mandoc.bsd.lv/man/mandoc_headers.3.html

Making sure that not everything includes everything also helps to
recognize layering violation design errors early on, usually near
the beginning of the implementation phase of a new feature, sometimes
before any code starts being written at all.

Yours,
  Ingo



Re: Unwind + NSD usage question

2021-09-28 Thread Ingo Schwarze
Hello Paul,

Paul de Weerd wrote on Tue, Sep 28, 2021 at 12:44:07PM +0200:

> 'local-data-ptr:' in unbound.conf(5):
> http://man.openbsd.org/unbound.conf#local~2
> http://man.openbsd.org/unbound.conf#local~3

heh, thank you for *both* of these bug reports,
i'm adding them to the mandoc TODO file for now,
but neither is expected to be hard to fix:

- tag.c, tag_put() should not put ASCII_HYPH into the tag file,
  which happens when the tag contains "-" on the input side
  weerd@ 28 Sep 2021 12:44:07 +0200
  loc *  exist *  algo *  size *  imp ***

- tag.c, tag_put() and callers like man_validate.c, check_tag()
  should not mistake "\-" as a word-ending escape sequence but
  instead translate it to plain "-" in the tag name
  weerd@ 28 Sep 2021 12:44:07 +0200
  loc **  exist *  algo *  size *  imp ***

Yours,
  Ingo



Re: OpenBSD `ftp` port for MSVC (Windows)

2021-09-05 Thread Ingo Schwarze
Hi Samuel,

first, note that this is off-topic on tech@.  If you see a need to
continue the thread, please move it to misc@.

Samuel Marks wrote on Sun, Sep 05, 2021 at 02:10:14PM -0400:

> WiP, basically just started this:
> https://github.com/offscale/libacquire/tree/423b751/libacquire/openbsd_ftp

I feel unsure what you are trying to achieve.  The README.md in that
directory is so terse that i don't really understand what the point is.

> What:
> - My project allows for switching between the HTTPS interface provided
> by your OS, and libcurl's HTTPS interface.

I don't really understand what that is supposed to mean either.

> Why OpenBSD `ftp` port:
> - I don't have a "pure" socket solution, and OpenBSD does not have a C
> interface version of their `ftp` (like, `fetch` from FreeBSD does, for
> example)
> - SLOC is short
> - It's from OpenBSD which gives certain trust for protocol conformity,
> safety, security, lower rates of change, and all-in-all quality

None of that applies to our current implementation of the ftp(1) client.
Yes, like everything in OpenBSD, it did receive some auditing, some
maintenance, and some feature additions helping security in the past.

But all the same, our -current ftp(1) client consists of ancient,
terribly ill-designed, technically totally outdated and obsolete
code.  So much so that developers felt it was time to throw it all
away and write a completely new ftp(1) program from scratch.  One
developer started an effort to do so, and that new program can
already do most of what ftp(1) should, but before it was completely
ready for commit, the effort unfortunately somehow petered out, so
we still have the old shit in the tree.

> - Once the MSVC port is done, the port to other OSs should be
> relatively trivial

Porting OpenBSD software to other systems is often a good idea.
Worthy examples include OpenSSH, OpenBGPD, OpenNTPD, OpenSMTPD,
OpenIKED, LibreSSL, and there are likely many more.

But OpenBSD ftp(1) is most definitely not a worthy target for
porting.

Just look at this utter mess:

 $ cd /usr/src/usr.bin/ftp/
 $ grep jmp *.c
cmds.c: (void)setjmp(jabort);
cmds.c: (void)setjmp(jabort);
cmds.c: (void)setjmp(jabort);
cmds.c:jmp_buf abortprox;
cmds.c: longjmp(abortprox, 1);
cmds.c: if (setjmp(abortprox)) {
fetch.c:static jmp_buf  httpabort;
fetch.c:if (setjmp(httpabort)) {
fetch.c:if (setjmp(httpabort)) {
fetch.c:longjmp(httpabort, 1);
fetch.c:if (setjmp(toplevel)) {
ftp.c:jmp_buf   ptabort;
ftp.c:  longjmp(ptabort, 1);
ftp.c:jmp_buf   sendabort;
ftp.c:  longjmp(sendabort, 1);
ftp.c:  if (setjmp(sendabort)) {
ftp.c:  if (setjmp(sendabort))
ftp.c:jmp_buf   recvabort;
ftp.c:  longjmp(recvabort, 1);
ftp.c:  if (setjmp(recvabort)) {
ftp.c:  if (setjmp(recvabort))
ftp.c:  longjmp(ptabort, 1);
ftp.c:  if (setjmp(ptabort))
ftp.c:jmp_buf forceabort;
ftp.c:  longjmp(forceabort, 1);
ftp.c:  if (cout == NULL || setjmp(forceabort)) {
main.c:jmp_buf  toplevel;
main.c: if (setjmp(toplevel))
main.c: top = setjmp(toplevel) == 0;
main.c: longjmp(toplevel, 1);
small.c:jmp_buf jabort;
small.c:longjmp(jabort, 1);
small.c:longjmp(jabort, 1);
small.c:(void)setjmp(jabort);

Or look at this in small.c:

/* XXX - Signal race. */
/* ARGSUSED */
void
mabort(int signo)
{
int save_errno = errno;

alarmtimer(0);
(void) write(fileno(ttyout), "\n\r", 2);
#ifndef SMALL
if (mflag && fromatty) {
/* XXX signal race, crazy unbelievable stdio misuse */
if (confirm(mname, NULL)) {
errno = save_errno;
longjmp(jabort, 1);
}
}
#endif /* !SMALL */
mflag = 0;
errno = save_errno;
longjmp(jabort, 1);
}

Then look at util.c what confirm() is.  The word "unbelievable" is an
accurate description of the situation.

Or look at intr() in main.c.  On top of calling longjmp(3), that
signal handler also calls alarmtimer() in util.c, which happily
does multiple non-atomic stores into a struct, then calls the
function setitimer(2) which is not async-signal-safe either.

Or the updateprogressmeter() signal handler in util.c.  That one
calls progressmeter() in the same file, which is just as bad as
confirm().

We have a policy to mark broken code with /* XXX */ in the tree when
it is found and whoever discovers the breakage has no time to fix
it right away.  But ftp(1) is so bad that i feel like i don't even
have the time to mark stuff as broken.  No matter where i look,
almost all of it is broken.

I'm usually a great fan of fixing broken code by step-by-step
refactoring, even if it is broken in quite serious ways.  But that
is totally out of the question for OpenBSD ftp(1).  That code is
so thoroughly ill-designed throughout that starting over from scratch
is the only viable option.

> So, want me to finish working on this?

It sounds like a waste of time to me.

> - And 

Re: man sed(1) diff

2021-09-05 Thread Ingo Schwarze
Hello,

i see where Ian's confusion is coming from, even though arguably,
the existing text is accurate.  But it is not a good idea to insert
exceptions as parenthetic remarks in the middle of an enumeration
of steps that is already somewhat long and complicated.

I think it is better to explain the special rules for D processing
in the paragraph describing D (surprise, surprise) rather than in
the middle of the general description of what a "cycle" is.


Consider the following minimal example:

   $ printf 'axb' | sed -n 'y/x/\n/;s/^b/c/;P;D'
   a
   c

The D deletes "a\n" from the pattern space.
After that, the next cycle is entered with "b" in the pattern space,
without copying a new line into the pattern space.


OK?
  Ingo


Index: sed.1
===
RCS file: /cvs/src/usr.bin/sed/sed.1,v
retrieving revision 1.48
diff -u -r1.48 sed.1
--- sed.1   17 Mar 2016 05:27:10 -  1.48
+++ sed.1   5 Sep 2021 11:46:48 -
@@ -140,9 +140,6 @@
 cyclically copies a line of input, not including its terminating newline
 character, into a
 .Em pattern space ,
-(unless there is something left after a
-.Ic D
-function),
 applies all of the commands with addresses that select that pattern space,
 copies the pattern space to the standard output, appending a newline, and
 deletes the pattern space.
@@ -331,7 +328,8 @@
 Delete the pattern space and start the next cycle.
 .It [2addr] Ns Ic D
 Delete the initial segment of the pattern space through the first
-newline character and start the next cycle.
+newline character and start the next cycle without copying the next
+line of input into the pattern space.
 .It [2addr] Ns Ic g
 Replace the contents of the pattern space with the contents of the
 hold space.



Re: timeout: Prettify man page and usage

2021-09-04 Thread Ingo Schwarze
Hi Jason,

Jason McIntyre wrote on Sat, Sep 04, 2021 at 09:47:12PM +0100:

> pretty damning that my ok is on that commit ;)
> i'll try to remember...

Heh.  With the amount of work you are doing - your current commit
count stands at 9113, on average 1.34 per day, during a time of
over eightteen years and seven months (and that does not even include
the many OKs you provided), it would be pretty unreasonable to
expect remembering every single commit...   8-)

Yours,
  Ingo



Re: timeout: Prettify man page and usage

2021-09-04 Thread Ingo Schwarze
Hi Jason,

Jason McIntyre wrote on Fri, Sep 03, 2021 at 02:46:47PM +0100:
> On Fri, Sep 03, 2021 at 03:42:21PM +0200, Ingo Schwarze wrote:
>> Theo de Raadt wrote on Thu, Sep 02, 2021 at 09:57:11AM -0600:

>>> I think we should list shorts, and longs which have no shorts.

>> I agree, and i think we arrived at the same conclusion in the past.
>> 
>> It applies to both usage() and SYNOPSIS, and ideally, both should
>> match, except maybe in very unusual cases.

> sure. but grep tripped me up, and it's always the page i think of with
> long options. why did we leave --context in SYNOPSIS?

I admit the grep(1) case is slightly confusing, but it is explained
here why it is somewhat special:

  https://cvsweb.openbsd.org/src/usr.bin/grep/grep.1#rev1.47

Yours,
  Ingo



Re: mark getsubopt(3) as part of POSIX

2021-09-03 Thread Ingo Schwarze
Hi Theo,

as you sent it, your patch is misleading since our manual page describes
two features that are not required by POSIX.

So i propose the larger patch shown below instead.

While here, clarify what "null-terminated" means.  It matters for
this page in particular because the page talks about both NUL bytes
and NULL pointers in different places.

Also, failing to state that the original input string is modified
feels like a serious omission to me.  By the way, these replacements
are indeed required by POSIX.  The strsep(3) below SEE ALSO provides
a rather weak hint, but i regard that as insufficient.

OK?
  Ingo


Index: getsubopt.3
===
RCS file: /cvs/src/lib/libc/stdlib/getsubopt.3,v
retrieving revision 1.14
diff -u -r1.14 getsubopt.3
--- getsubopt.3 15 Nov 2014 14:41:02 -  1.14
+++ getsubopt.3 3 Sep 2021 14:38:28 -
@@ -55,7 +55,9 @@
 is a pointer to a pointer to the string.
 The argument
 .Fa tokens
-is a pointer to a null-terminated array of pointers to strings.
+is a pointer to a
+.Dv NULL Ns -terminated
+array of pointers to strings.
 .Pp
 The
 .Fn getsubopt
@@ -79,6 +81,11 @@
 .Fa optionp
 will be set to point to the start of the next token in the string,
 or the NUL at the end of the string if no more tokens are present.
+The comma, space, or tab character ending the token just parsed,
+and the equal sign separating name and value if any, are replaced
+with NUL bytes in the original
+.Pf * Fa optionp
+input string.
 The external variable
 .Fa suboptarg
 will be set to point to the start of the current token, or
@@ -138,6 +145,16 @@
 .Sh SEE ALSO
 .Xr getopt 3 ,
 .Xr strsep 3
+.Sh STANDARDS
+The
+.Fn getsubopt
+function conforms to
+.St -p1003.1-2008 .
+.Pp
+Allowing space and tab characters to separate tokens
+and the external variable
+.Va suboptarg
+are extensions to that standard.
 .Sh HISTORY
 The
 .Fn getsubopt



Re: timeout: Prettify man page and usage

2021-09-03 Thread Ingo Schwarze
Hi,

Theo de Raadt wrote on Thu, Sep 02, 2021 at 09:57:11AM -0600:

> I think we should list shorts, and longs which have no shorts.

I agree, and i think we arrived at the same conclusion in the past.

It applies to both usage() and SYNOPSIS, and ideally, both should
match, except maybe in very unusual cases.

Yours,
  Ingo



Re: rm.1: remove extraneous word

2021-09-03 Thread Ingo Schwarze
Hi Jason,

Jason McIntyre wrote on Fri, Sep 03, 2021 at 07:47:19AM +0100:
> On Thu, Sep 02, 2021 at 11:10:54PM +0100, Jason McIntyre wrote:

>> i wonder if it was originally an attempt to not quote posix
>> (or posix attempting to not quote bsd). posix refers to removing
>> "directory entries", which seems more natural.

Quite to the contrary.

A wording "removes the entries" first appeared in AT System III
UNIX (1980).  I first see the specific wording "remove directory
entries" in Version 10 AT UNIX in December 1989 and it was then
also used by Cynthia Livingston on June 11, 1990 when she converted
the page from man(7) to mdoc(7).  So this predates POSIX.2 (1992).
Not sure what XPG 3 said in 1989.

The current wording of the .Nd and the first sentence of the DESCRIPTION
was committed by Keith Bostic on August 12, 1990 with this commit message:

  new version of rm from scratch and the POSIX.2 description

I'm surprised that Keith referred to POSIX.2 in 1990 even though
the original POSIX.2 is usually quoted as "IEEE Std 1003.2-1992".
I know that POSIX.2 drafts were being worked on in 1991, but
apparently some were already in circulation in the summer of 1990.

> on the other hand, the phrase "non-directory type files" is pretty
> awful. posix is clearer i think, sticking to "directory entries
> specified by each file argument".we could also use this: "directory
> entries specified on the command line". but that would feel like
> deliberately avoiding the term "file", which is clear and simple.
> 
> just using "non-directory files" is also weird. i mean, you can very
> much remove directory files.

It is standard practice that the first paragraph of the DESCRIPTION
describes default behaviour, then the option list describes
modifications of this behaviour, in this case that -d and -r also
remove directories.  So regarding the content, there is nothing to fix.

If you think the wording is awful, you could say

   The rm utility attempts to remove the files specified on the
   command line.  By default, specifying a directory is an error.
   If the permissions ...

or something similar.  Since this is about wording, i would say it
is your call.

In any case, i agree with Theo that the .Nd should not be changed.

Yours,
  Ingo



Re: Atomic signal flags for vi(1)

2021-09-02 Thread Ingo Schwarze
Hi Tim,

trondd wrote on Wed, Sep 01, 2021 at 08:46:28PM -0400:
> Ingo Schwarze  wrote:
>> Ingo Schwarze wrote on Wed, Sep 01, 2021 at 04:38:51PM +0200:

>>> Note that the h_hup() and h_term() signal handlers are still unsafe
>>> after this commit because they also set the "killersig" (how fitting!)
>>> field in a global struct.

>> I like it when fixing two bugs only amounts to minus: minus 14 LOC,
>> 1 function, 1 global variable, 2 automatic variables, 1 struct member,
>> 9 pointer dereferences, 1 #define, and minus 1 #undef.
>> 
>> The argument that only one single GS exists applies unchanged.

> Great.  This is the direction I was going to go in with it.  I hadn't
> thought of collapsing h_hup an h_term, though.
> 
> This makes sense as I understand the signal/event handling and a quick
> test through the signals shows no difference in behavior.

Thank you (and martijn@) for reviewing and testing.
This is now committed.

> Should h_term() and cl_sigterm be named something more general to
> indicate that they also handle SIGHUP?  Or is it good enough since it
> includes all the signals that 'term'inate vi?  It's not hard to follow
> as-is. 

That is what i was thinking: why re-paint a bike shed if the existing
paint is already pretty and not yet falling off?

Yours,
  Ingo


>> Index: cl/cl.h
>> ===
>> RCS file: /cvs/src/usr.bin/vi/cl/cl.h,v
>> retrieving revision 1.11
>> diff -u -p -r1.11 cl.h
>> --- cl/cl.h  1 Sep 2021 14:28:15 -   1.11
>> +++ cl/cl.h  1 Sep 2021 17:15:34 -
>> @@ -11,7 +11,6 @@
>>   *  @(#)cl.h10.19 (Berkeley) 9/24/96
>>   */
>>  
>> -extern  volatile sig_atomic_t cl_sighup;
>>  extern  volatile sig_atomic_t cl_sigint;
>>  extern  volatile sig_atomic_t cl_sigterm;
>>  extern  volatile sig_atomic_t cl_sigwinch;
>> @@ -31,7 +30,6 @@ typedef struct _cl_private {
>>  char*rmso, *smso;   /* Inverse video terminal strings. */
>>  char*smcup, *rmcup; /* Terminal start/stop strings. */
>>  
>> -int  killersig; /* Killer signal. */
>>  #define INDX_HUP0
>>  #define INDX_INT1
>>  #define INDX_TERM   2
>> Index: cl/cl_funcs.c
>> ===
>> RCS file: /cvs/src/usr.bin/vi/cl/cl_funcs.c,v
>> retrieving revision 1.21
>> diff -u -p -r1.21 cl_funcs.c
>> --- cl/cl_funcs.c1 Sep 2021 14:28:15 -   1.21
>> +++ cl/cl_funcs.c1 Sep 2021 17:15:34 -
>> @@ -437,7 +437,7 @@ cl_refresh(SCR *sp, int repaint)
>>   * If we received a killer signal, we're done, there's no point
>>   * in refreshing the screen.
>>   */
>> -if (clp->killersig)
>> +if (cl_sigterm)
>>  return (0);
>>  
>>  /*
>> @@ -582,7 +582,7 @@ cl_suspend(SCR *sp, int *allowedp)
>>   * unchanged.  In addition, the terminal has already been reset
>>   * correctly, so leave it alone.
>>   */
>> -if (clp->killersig) {
>> +if (cl_sigterm) {
>>  F_CLR(clp, CL_SCR_EX_INIT | CL_SCR_VI_INIT);
>>  return (0);
>>  }
>> Index: cl/cl_main.c
>> ===
>> RCS file: /cvs/src/usr.bin/vi/cl/cl_main.c,v
>> retrieving revision 1.34
>> diff -u -p -r1.34 cl_main.c
>> --- cl/cl_main.c 1 Sep 2021 14:28:15 -   1.34
>> +++ cl/cl_main.c 1 Sep 2021 17:15:34 -
>> @@ -33,7 +33,6 @@
>>  
>>  GS *__global_list;  /* GLOBAL: List of screens. */
>>  
>> -volatile sig_atomic_t cl_sighup;
>>  volatile sig_atomic_t cl_sigint;
>>  volatile sig_atomic_t cl_sigterm;
>>  volatile sig_atomic_t cl_sigwinch;
>> @@ -120,9 +119,9 @@ main(int argc, char *argv[])
>>  }
>>  
>>  /* If a killer signal arrived, pretend we just got it. */
>> -if (clp->killersig) {
>> -(void)signal(clp->killersig, SIG_DFL);
>> -(void)kill(getpid(), clp->killersig);
>> +if (cl_sigterm) {
>> +(void)signal(cl_sigterm, SIG_DFL);
>> +(void)kill(getpid(), cl_sigterm);
>>  /* NOTREACHED */
>>  }
>>  
>> @@ -215,17 +214,6 @@ term_init(char *ttype)
>>  }
>>  }
>>  
>> -#define GLOBAL_CLP \
>> -CL_PRIVATE *clp = GCLP(__global_list);
>> -static void
>> -h_hup(int signo)
>> -{
>> -GLOBAL

Re: timeout: execvp(2) return is always an error

2021-09-02 Thread Ingo Schwarze
Hi Sebastien,

Sebastien Marie wrote on Thu, Sep 02, 2021 at 09:09:43AM +0200:

> If execvp(2) returns, it is always an error: there is no need to check
> if the return value is -1. Just unconditionally call err(3).
> 
> Comments or OK ?

OK schwarze@
  Ingo


>  timeout: execvp(2) should not return except on error

"should not" is confusing, "does not" would be clearer.

> --- usr.bin/timeout/timeout.c
> +++ usr.bin/timeout/timeout.c
> @@ -165,7 +165,7 @@ main(int argc, char **argv)
>   int ch;
>   unsigned long   i;
>   int foreground = 0, preserve = 0;
> - int error, pstat, status;
> + int pstat, status;
>   int killsig = SIGTERM;
>   pid_t   pgid = 0, pid, cpid = 0;
>   double  first_kill;
> @@ -251,9 +251,8 @@ main(int argc, char **argv)
>   signal(SIGTTIN, SIG_DFL);
>   signal(SIGTTOU, SIG_DFL);
>  
> - error = execvp(argv[0], argv);
> - if (error == -1)
> - err(1, "execvp");
> + execvp(argv[0], argv);
> + err(1, "execvp");
>   }
>  
>   if (pledge("stdio", NULL) == -1)



Re: Atomic signal flags for vi(1)

2021-09-01 Thread Ingo Schwarze
Hi,

Ingo Schwarze wrote on Wed, Sep 01, 2021 at 04:38:51PM +0200:

> Note that the h_hup() and h_term() signal handlers are still unsafe
> after this commit because they also set the "killersig" (how fitting!)
> field in a global struct.

I like it when fixing two bugs only amounts to minus: minus 14 LOC,
1 function, 1 global variable, 2 automatic variables, 1 struct member,
9 pointer dereferences, 1 #define, and minus 1 #undef.

The argument that only one single GS exists applies unchanged.

OK?
  Ingo


Index: cl/cl.h
===
RCS file: /cvs/src/usr.bin/vi/cl/cl.h,v
retrieving revision 1.11
diff -u -p -r1.11 cl.h
--- cl/cl.h 1 Sep 2021 14:28:15 -   1.11
+++ cl/cl.h 1 Sep 2021 17:15:34 -
@@ -11,7 +11,6 @@
  * @(#)cl.h10.19 (Berkeley) 9/24/96
  */
 
-extern volatile sig_atomic_t cl_sighup;
 extern volatile sig_atomic_t cl_sigint;
 extern volatile sig_atomic_t cl_sigterm;
 extern volatile sig_atomic_t cl_sigwinch;
@@ -31,7 +30,6 @@ typedef struct _cl_private {
char*rmso, *smso;   /* Inverse video terminal strings. */
char*smcup, *rmcup; /* Terminal start/stop strings. */
 
-   int  killersig; /* Killer signal. */
 #defineINDX_HUP0
 #defineINDX_INT1
 #defineINDX_TERM   2
Index: cl/cl_funcs.c
===
RCS file: /cvs/src/usr.bin/vi/cl/cl_funcs.c,v
retrieving revision 1.21
diff -u -p -r1.21 cl_funcs.c
--- cl/cl_funcs.c   1 Sep 2021 14:28:15 -   1.21
+++ cl/cl_funcs.c   1 Sep 2021 17:15:34 -
@@ -437,7 +437,7 @@ cl_refresh(SCR *sp, int repaint)
 * If we received a killer signal, we're done, there's no point
 * in refreshing the screen.
 */
-   if (clp->killersig)
+   if (cl_sigterm)
return (0);
 
/*
@@ -582,7 +582,7 @@ cl_suspend(SCR *sp, int *allowedp)
 * unchanged.  In addition, the terminal has already been reset
 * correctly, so leave it alone.
 */
-   if (clp->killersig) {
+   if (cl_sigterm) {
F_CLR(clp, CL_SCR_EX_INIT | CL_SCR_VI_INIT);
return (0);
}
Index: cl/cl_main.c
===
RCS file: /cvs/src/usr.bin/vi/cl/cl_main.c,v
retrieving revision 1.34
diff -u -p -r1.34 cl_main.c
--- cl/cl_main.c1 Sep 2021 14:28:15 -   1.34
+++ cl/cl_main.c1 Sep 2021 17:15:34 -
@@ -33,7 +33,6 @@
 
 GS *__global_list; /* GLOBAL: List of screens. */
 
-volatile sig_atomic_t cl_sighup;
 volatile sig_atomic_t cl_sigint;
 volatile sig_atomic_t cl_sigterm;
 volatile sig_atomic_t cl_sigwinch;
@@ -120,9 +119,9 @@ main(int argc, char *argv[])
}
 
/* If a killer signal arrived, pretend we just got it. */
-   if (clp->killersig) {
-   (void)signal(clp->killersig, SIG_DFL);
-   (void)kill(getpid(), clp->killersig);
+   if (cl_sigterm) {
+   (void)signal(cl_sigterm, SIG_DFL);
+   (void)kill(getpid(), cl_sigterm);
/* NOTREACHED */
}
 
@@ -215,17 +214,6 @@ term_init(char *ttype)
}
 }
 
-#defineGLOBAL_CLP \
-   CL_PRIVATE *clp = GCLP(__global_list);
-static void
-h_hup(int signo)
-{
-   GLOBAL_CLP;
-
-   cl_sighup = 1;
-   clp->killersig = SIGHUP;
-}
-
 static void
 h_int(int signo)
 {
@@ -235,10 +223,7 @@ h_int(int signo)
 static void
 h_term(int signo)
 {
-   GLOBAL_CLP;
-
-   cl_sigterm = 1;
-   clp->killersig = SIGTERM;
+   cl_sigterm = signo;
 }
 
 static void
@@ -246,7 +231,6 @@ h_winch(int signo)
 {
cl_sigwinch = 1;
 }
-#undef GLOBAL_CLP
 
 /*
  * sig_init --
@@ -261,20 +245,19 @@ sig_init(GS *gp, SCR *sp)
 
clp = GCLP(gp);
 
-   cl_sighup = 0;
cl_sigint = 0;
cl_sigterm = 0;
cl_sigwinch = 0;
 
if (sp == NULL) {
-   if (setsig(SIGHUP, >oact[INDX_HUP], h_hup) ||
+   if (setsig(SIGHUP, >oact[INDX_HUP], h_term) ||
setsig(SIGINT, >oact[INDX_INT], h_int) ||
setsig(SIGTERM, >oact[INDX_TERM], h_term) ||
setsig(SIGWINCH, >oact[INDX_WINCH], h_winch)
)
err(1, NULL);
} else
-   if (setsig(SIGHUP, NULL, h_hup) ||
+   if (setsig(SIGHUP, NULL, h_term) ||
setsig(SIGINT, NULL, h_int) ||
setsig(SIGTERM, NULL, h_term) ||
setsig(SIGWINCH, NULL, h_winch)
Index: cl/cl_read.c
===
RCS file: /cvs/src/usr.bin/vi/cl/cl_read.c,v
retrieving revision 1.22
diff -u -p -r1.22 cl_read.c
--- cl/cl_read.c1 Sep 2021 14:28:15 -   1.22
+++ cl/cl_r

Re: Atomic signal flags for vi(1)

2021-09-01 Thread Ingo Schwarze
Hi Tim,

trondd wrote on Tue, Aug 24, 2021 at 07:45:33PM -0400:
> "Theo de Raadt"  wrote:
 
>> +h_alrm(int signo)
>> +{
>> +   GLOBAL_CLP;
>> +
>> +   F_SET(clp, CL_SIGALRM);
>> 
>> F_SET is |=, which is not atomic.
>> 
>> This is unsafe.  Safe signal handlers need to make single stores to
>> atomic-sized variables, which tend to be int-sized, easier to declare
>> as sig_atomic_t.  Most often these are global scope with the following
>> type:
>> 
>>  volatile sig_atomic_t variable
>> 
>> I can see you copied an existing practice.  Sadly all the other
>> signal handlers are also broken in the same way.
>> 
>> The flag bits should be replaced with seperate sig_atomic_t variables.

> Ok.  Took a swing at converting the signal handling to sig_atomic_t flags.
> No additional changes added other than removing a redundant check of the
> flags in cl_read.c.  Seemed to be a pretty straight-forward conversion and
> I haven't found any change in behavior.

Committed, thank you.
  Ingo


CVSROOT:/cvs
Module name:src
Changes by: schwa...@cvs.openbsd.org2021/09/01 08:28:15

Modified files:
usr.bin/vi/cl  : cl.h cl_funcs.c cl_main.c cl_read.c 

Log message:
As a first step towards safe signal handling, improve the h_int()
and h_winch() signal handlers to make one single store to a
sig_atomic_t variable.  Note that the h_hup() and h_term() signal
handlers are still unsafe after this commit because they also set
the "killersig" (how fitting!) field in a global struct.

Despite storing information in static global variables rather than
in structs passed around as arguments, this patch does not cause a
change in behaviour because there is always exactly one GS object,
initialized using gs_init() called from the top of main(), and
screen_init() stores a pointer to this one and only GS object in
the .gp member of each and every SCR object.  Talk about useless
abstraction...

Problem pointed out by deraadt@.
Patch from Tim  on tech@.
OK deraadt@.


> Index: cl/cl.h
> ===
> RCS file: /cvs/src/usr.bin/vi/cl/cl.h,v
> retrieving revision 1.10
> diff -u -p -r1.10 cl.h
> --- cl/cl.h   27 May 2016 09:18:11 -  1.10
> +++ cl/cl.h   24 Aug 2021 23:34:27 -
> @@ -11,6 +11,11 @@
>   *   @(#)cl.h10.19 (Berkeley) 9/24/96
>   */
>  
> +extern   volatile sig_atomic_t cl_sighup;
> +extern   volatile sig_atomic_t cl_sigint;
> +extern   volatile sig_atomic_t cl_sigterm;
> +extern   volatile sig_atomic_t cl_sigwinch;
> +
>  typedef struct _cl_private {
>   CHAR_T   ibuf[256]; /* Input keys. */
>  
> @@ -45,11 +50,7 @@ typedef struct _cl_private {
>  #define  CL_RENAME_OK0x0004  /* User wants the windows renamed. */
>  #define  CL_SCR_EX_INIT  0x0008  /* Ex screen initialized. */
>  #define  CL_SCR_VI_INIT  0x0010  /* Vi screen initialized. */
> -#define  CL_SIGHUP   0x0020  /* SIGHUP arrived. */
> -#define  CL_SIGINT   0x0040  /* SIGINT arrived. */
> -#define  CL_SIGTERM  0x0080  /* SIGTERM arrived. */
> -#define  CL_SIGWINCH 0x0100  /* SIGWINCH arrived. */
> -#define  CL_STDIN_TTY0x0200  /* Talking to a terminal. */
> +#define  CL_STDIN_TTY0x0020  /* Talking to a terminal. */
>   u_int32_t flags;
>  } CL_PRIVATE;
>  
> Index: cl/cl_funcs.c
> ===
> RCS file: /cvs/src/usr.bin/vi/cl/cl_funcs.c,v
> retrieving revision 1.20
> diff -u -p -r1.20 cl_funcs.c
> --- cl/cl_funcs.c 27 May 2016 09:18:11 -  1.20
> +++ cl/cl_funcs.c 24 Aug 2021 23:34:27 -
> @@ -601,7 +601,7 @@ cl_suspend(SCR *sp, int *allowedp)
>   if (cl_ssize(sp, 1, NULL, NULL, ))
>   return (1);
>   if (changed)
> - F_SET(CLP(sp), CL_SIGWINCH);
> + cl_sigwinch = 1;
>  
>   return (0);
>  }
> Index: cl/cl_main.c
> ===
> RCS file: /cvs/src/usr.bin/vi/cl/cl_main.c,v
> retrieving revision 1.33
> diff -u -p -r1.33 cl_main.c
> --- cl/cl_main.c  5 May 2016 20:36:41 -   1.33
> +++ cl/cl_main.c  24 Aug 2021 23:34:27 -
> @@ -33,6 +33,11 @@
>  
>  GS *__global_list;   /* GLOBAL: List of screens. */
>  
> +volatile sig_atomic_t cl_sighup;
> +volatile sig_atomic_t cl_sigint;
> +volatile sig_atomic_t cl_sigterm;
> +volatile sig_atomic_t cl_sigwinch;
> +
>  static void cl_func_std(GS *);
>  static CL_PRIVATE *cl_init(GS *);
>  static GS  *gs_init(void);
> @@ -217,16 +222,14 @@ h_hup(int signo)
>  {
>   GLOBAL_CLP;
>  
> - F_SET(clp, CL_SIGHUP);
> + cl_sighup = 1;
>   clp->killersig = SIGHUP;
>  }
>  
>  static void
>  h_int(int signo)
>  {
> - GLOBAL_CLP;
> -
> - F_SET(clp, CL_SIGINT);
> + cl_sigint = 1;
>  }
>  
>  static void
> @@ -234,16 +237,14 @@ h_term(int signo)
>  {
>   GLOBAL_CLP;
>  
> - F_SET(clp, 

Re: averse to lisp in base?

2021-08-29 Thread Ingo Schwarze
Hi,

Tomasz Rola wrote on Sun, Aug 29, 2021 at 08:21:03PM +0200:
> On Sun, Aug 29, 2021 at 03:27:27AM +0200, mayur...@kathe.in wrote:

>> Would the core team consider including a minimalist lisp in the base?
>> e.g. http://t3x.org/klisp/index.html

[...]
> If I would want to propose any Lisp into the base system, I would
> point at s9fes (Scheme 9 from Empty Space), also by Nils Holm.

As a general rule of thumb:  if some program

 * is maintained by a person who is not an OpenBSD developer,
 * and/or is maintained in a repository outside of cvs.openbsd.org,
 * and is not required for building or running anything
   contained in the base system,

then it is usually better maintained in ports than in base.

For users, typing "doas pkg_add my_loveliest_lisp" once is negligible
effort, so it really doesn't matter for them.

For developers, regularly synching a piece of third-party software
into base is quite tedious and wastes considerable working time,
whereas keeping a port up to date causes orders of magnitude less
work.

For example, synching to base is a royal pain for essential stuff
like clang and Perl and drm.  It causes non-trivial work even for
smaller stuff like unbound and nsd.  In several caaes, it was such
a nightmare that developers just gave up in those cases - for
example, just think of nginx and sqlite.

In any case, this thread is wildly off-topic on tech@.  If you have
additional questions related to this topic, please start a new
thread on misc@, or post a port or a port update to ports@.

Yours,
  Ingo



Re: Atomic signal flags for vi(1)

2021-08-29 Thread Ingo Schwarze
Hi Theo,

Theo de Raadt wrote on Sun, Aug 29, 2021 at 11:38:18AM -0600:
> Ingo Schwarze wrote:

>> *If* more than one GS object ever existed and/or the .gp pointers
>> in different SCR objects could point to different GS objects, this
>> patch might change behaviour.

> If such multiple GS condition ever existed, since signals are (global),
> the handler is only indicating a signal has happened.  And the code
> observing non-zero of these signal-indicating variables would be
> responsible to determine which event it applies to.

Absolutely, that is the sane way to code such stuff.

However, we have all seen signal handlers making complicated decisions
by navigating complex, non-trivial data structures, even in our own
tree.  Of course, those need to be fixed.  But before changing the
code, i still need to figure out whether the seemingly non-trivial
data structure is indeed non-trivial, and which higher-level code
must be made smarter to preserve the functionality, or whether the
apparent complexity merely hides the fact that the data is actually
trivial, as i think it is in this case.

> It is too hard to do this kind of work safely/atomically in a
> signal handler.

100% correct.  And yet, people do try to, and then it even works.
Well, kind of.  Most of the time.  Until it suddenly doesn't.  And
when it suddenly doesn't, people are used to blaming cosmic muons
(and yes, there is indeed about one of those per square centimeter
per minute, on average) rather than thinking something might be
wrong with their code.

Yours,
  Ingo



Re: Atomic signal flags for vi(1)

2021-08-29 Thread Ingo Schwarze
Hi,

Theo de Raadt wrote on Sun, Aug 29, 2021 at 02:54:57AM -0600:

> This does look better.
> 
> I appreciate that you are fixing this underlying problem first, before
> overlaying your timer diff.

Indeed.

> Is this working for the vi crowd?

*If* more than one GS object ever existed and/or the .gp pointers
in different SCR objects could point to different GS objects, this
patch might change behaviour.  So the hardest part about veryfying
it is to make sure that only one single GS object exists at any
given time and all .gp pointers in all SCR objects always point to
that one GS object.  GS and [.>]gs are used at hundreds of places,
but i only managed to find a single place where GS is instantiated
(gs_init() in cl/cl_main.c, which is called inly once at the very
beginning of main()), and only a single place where the .gp member
of SCR is ever changed, in screen_init() in common/screen.c, which
is called from a handful of places, but always with a pointer to
that one particular GS object.

So yes, i do think changing members of GS to static globals is safe
and testing with a single screen is sufficient.  After the patch,
 * SIGHUP prints "Hangup" and exits as expected;
 * SIGINT prints "Interrupted" in the status line and exits insert
   mode, but stays in vi(1);
 * SIGTERM prints "Terminated" and exits as expected;
 * window size changes work as expected both in the forground and
   in the background while suspended with Ctrl-Z, and sending
   SIGWINCH manually has no effect, as expected.

Note that the patch is incomplete.  The signal handler functions
h_hup() and h_term() still aren't safe because they still write to
GCLP(__global_list)->killersig.  But this patch performs one
well-defined step and is hard enough to audit already, so let's get
it committed, then take care of killersig in a second step.

So if someone wants to commit, this patch is OK schwarze@.

Alternatively, provide an OK and tell me to commit.

Be careful that the original author of this patch signed as "Tim"
but is not tim@ for all i can tell.

Yours,
  Ingo


> trondd  wrote:
> 
> > "Theo de Raadt"  wrote:
> > 
> > > +h_alrm(int signo)
> > > +{
> > > +   GLOBAL_CLP;
> > > +
> > > +   F_SET(clp, CL_SIGALRM);
> > > 
> > > F_SET is |=, which is not atomic.
> > > 
> > > This is unsafe.  Safe signal handlers need to make single stores to
> > > atomic-sized variables, which tend to be int-sized, easier to declare
> > > as sig_atomic_t.  Most often these are global scope with the following
> > > type:
> > > 
> > >   volatile sig_atomic_t variable
> > > 
> > > I can see you copied an existing practice.  Sadly all the other
> > > signal handlers are also broken in the same way.
> > > 
> > > The flag bits should be replaced with seperate sig_atomic_t variables.
> > 
> > Ok.  Took a swing at converting the signal handling to sig_atomic_t flags.
> > No additional changes added other than removing a redundant check of the
> > flags in cl_read.c.  Seemed to be a pretty straight-forward conversion and
> > I haven't found any change in behavior.
> > 
> > Tim.
> > 
> > 
> > Index: cl/cl.h
> > ===
> > RCS file: /cvs/src/usr.bin/vi/cl/cl.h,v
> > retrieving revision 1.10
> > diff -u -p -r1.10 cl.h
> > --- cl/cl.h 27 May 2016 09:18:11 -  1.10
> > +++ cl/cl.h 24 Aug 2021 23:34:27 -
> > @@ -11,6 +11,11 @@
> >   * @(#)cl.h10.19 (Berkeley) 9/24/96
> >   */
> >  
> > +extern volatile sig_atomic_t cl_sighup;
> > +extern volatile sig_atomic_t cl_sigint;
> > +extern volatile sig_atomic_t cl_sigterm;
> > +extern volatile sig_atomic_t cl_sigwinch;
> > +
> >  typedef struct _cl_private {
> > CHAR_T   ibuf[256]; /* Input keys. */
> >  
> > @@ -45,11 +50,7 @@ typedef struct _cl_private {
> >  #defineCL_RENAME_OK0x0004  /* User wants the windows renamed. */
> >  #defineCL_SCR_EX_INIT  0x0008  /* Ex screen initialized. */
> >  #defineCL_SCR_VI_INIT  0x0010  /* Vi screen initialized. */
> > -#defineCL_SIGHUP   0x0020  /* SIGHUP arrived. */
> > -#defineCL_SIGINT   0x0040  /* SIGINT arrived. */
> > -#defineCL_SIGTERM  0x0080  /* SIGTERM arrived. */
> > -#defineCL_SIGWINCH 0x0100  /* SIGWINCH arrived. */
> > -#defineCL_STDIN_TTY0x0200  /* Talking to a terminal. */
> > +#defineCL_STDIN_TTY0x0020  /* Talking to a terminal. */
> > u_int32_t flags;
> >  } CL_PRIVATE;
> >  
> > Index: cl/cl_funcs.c
> > ===
> > RCS file: /cvs/src/usr.bin/vi/cl/cl_funcs.c,v
> > retrieving revision 1.20
> > diff -u -p -r1.20 cl_funcs.c
> > --- cl/cl_funcs.c   27 May 2016 09:18:11 -  1.20
> > +++ cl/cl_funcs.c   24 Aug 2021 23:34:27 -
> > @@ -601,7 +601,7 @@ cl_suspend(SCR *sp, int *allowedp)
> > if (cl_ssize(sp, 1, NULL, NULL, ))
> > return (1);
> > if (changed)
> > -   F_SET(CLP(sp), CL_SIGWINCH);
> > +  

Re: allow KARL with config(8)'d kernels

2021-08-29 Thread Ingo Schwarze
Hi,

Theo de Raadt wrote on Sun, Aug 29, 2021 at 07:15:34AM -0600:

> I am not thrilled with the name "kernel.conf".
> It does not seem intuitively discoverable.

What would be a canonical name?

It is a command file for config(8).

Note that the "config-file" for config is something else, and that
other thing is also called a "kernel configuration file", and it
is located in places like /sys/arch/luna88k/conf/GENERIC iiuc.

For command files, we normally use the name of the programming
language as a suffix (*.sh, *.pl, *.py, *.awk, *.sed, ...).  But
here, the domain specific language is really config(8), and *.config
does not work as a suffix because it is next to indistinguishable
from *.conf, and we just saw this is *not* a configuration but a
command file.

The root cause of the terminologic mess here is that the choice of
the name config(8) for the kernel configuration program was too
generic in the first place, but we are almost exactly 40 years late
for fixing that.

One - admittedly completely unUNIXy - way would be to invent a long,
descriptive name like /etc/kernel.config.commands or even /bsd.config.cmd
in the root rather than the /etc directory, which is more discoverable
because it is right next to the kernel itself.  The UNIXy way
is not necessarily better: Ken would have invented a very short,
pronouncable name that is not an English word but similar to one,
like /etc/sycoc (for SYstem COnfiguration Commands).

Is /bsd.config.cmd good enough?

Yours,
  Ingo

> Paul de Weerd  wrote:

>> Index: distrib/miniroot/install.sub
[...]
>>  chroot /mnt /bin/ksh -e -c "cd ${_kernel_dir#/mnt}/$_kernel; \
>> -make newbsd; make newinstall"
>> +make newbsd; \
>> +[ -e /etc/kernel.conf ] && \
>> +config -e -c /etc/kernel.conf -f bsd; \
>> +make newinstall"
>>  ) >/dev/null 2>&1 && echo " done." || echo " failed."



Re: cal(1): Clean up mutually exclusive options

2021-08-16 Thread Ingo Schwarze
Hi Theo,

Theo de Raadt wrote on Mon, Aug 16, 2021 at 08:15:03AM -0600:
> Jason McIntyre  wrote:

>> well, in those cases i think the authors shared the viewpoint that
>> mutually exclusive means you can;t mix them but in this case it is
>> essentially not an error to do so, and so documented it that way.
>> 
>> maybe it is more helpful to think of "mutually exclusive" as "causes
>> an error", but i am not sure.

Probably you are right that my description of "mutually exclusive"
as "it errors out" was too narrow and that in some situations,
understanding it in a broader sense of "the effect of combining
them is unspecified" or even broader, "combining them provides no
useful functionality" may be adequate.

Still, "mutually exclusive" is not the same concept as "one overrides
the other".

> I agree with you that programs should not neccessary error out.
> 
> I appreciate that you pointed at ls(1) before, which has so many options
> I think we should look there for examples.
> 
>  -A  List all entries except for `.' and `..'.  Always set for the
>  superuser.
> 
>  -a  Include directory entries whose names begin with a dot (`.').
> 
> These two options are not described as mutually exclusive, except, a
> mutually exclusive action is described.
> 
> How do -Aa and -aA behave?  Try it.

Just like -a alone.  That is obviously reasonable because both
options enlarge the set of names listed, and the set provided by
-a is a strict superset of the set provided by -A.

> Does POSIX take a position?

Not as far as i see.

POSIX does explicitly document them as mutually exclusive by
having [-a | -A] in the SYNOPSIS, but what happens when both are
provided appears to remain unspecified.

> If POSIX doesn't take a position, why should we?

We probably shouldn't.  Since behaviour of -aA is unspecied, our
behviour is not only reasonable but also conforming.  Strictly
speaking, it is undocumented, but easy to guess and certainly
not surprising.

> Should we break scripts that accidentally use both options?

Absolutely not.

Even if our page explicitly said "-a and -A are mutually exclusive",
that should only be regarded as an indication that the combination
*might* error out, not that it actually does - i failed to make
that clear in my previous mail.

> Argument composition does happen in scripting and interactive use.
> 
> Should we search for places to make POSIX programs exit(1) if people
> use them in the newly-defined incorrect way?

No.

But conversely, we shouldn't attempt to hunt down all cases either
where programs currently error out when given mutually exlusive
options and make them all override each other.  In many cases, it
is better to not jump to conclusions when the user didn't make it
clear what they wanted.

Unsurprisingly, sane behaviour depends on the individual program
and options.

> If we decide to not cause ls(1) -a and -A to exit, then I think we
> should not make other programs exit either.

Sounds reasonable.

> And then probably follow jmc's lead that "mutually exclusive" either
> in text, or in interpreted text, doesn't neccessarily describe an exit
> action.

That's certainly the current situation.  In some pages, it is used for
combinations that error out, in some for options that override each
other, and when left unqualified, it can mean "the effect of combining
is not considered valid syntax and may have an unspecified effect".

The latter is often fine (like for ls -aA) and more rarely weird
(like for jot -b foo -w bar 3 97).  When it is weird, improving it
may be worthwhile.

Yours,
  Ingo



Re: cal(1): Clean up mutually exclusive options

2021-08-16 Thread Ingo Schwarze
Hi Jason,

Jason McIntyre wrote on Mon, Aug 16, 2021 at 12:02:13PM +0100:

> when i wrote my mail, i failed to understand that "overrides earlier"
> was really just another way of saying "mutually exclusive".

That is incorrect.

This is what "mutually exclusive" means (without further qualification):

   $ frob -f -u
  usage: frob [-f | -u]

This is what "overrides earlier" means:

   $ frob -f -u
  output from successful unfrobnification

   $ frob -u -f
  output from successful frobnification

> i don;t find it as clear, and i don;t hugely like it, but i guess
> it's just my preference.
> 
> i also dislike the sentence structure of "Overrides earlier -f."
> although it's understandable, it reads like there's a word or phrase
> missing.

Yes, it is an elliptic wording.

> Overrides earlier instances of -f. Overrides previous -f options.

Both are still elliptic.

The intended meaning of the ellipsis is:

  [This option] overrides [any] earlier -f [options].

Should we prefer the non-elliptic wording?

> i just find it clearer when we just say -x and -y are mutually exclusive.

OK, let's spell this out in full.

I'm aware of two ways to document "mutually exclusive":

  -f   Frobnicate.  Mutually exclusive with -u.
  -u   Unfrobnicate.  Mutually exclusive with -f.

Or:

  -f   Frobnicate.
  -u   Unfrobnicate.
  [...]
  The options -f and -u are mutually exclusive.

I have no strong preference either way because if a user specifies both,
they get the usage() message, so the error is likely to be discovered
either way.


I'm aware of two ways to document "overrides":

  -f   Frobnicate.  Overrides earlier -u.
  -u   Unfrobnicate.  Overrides earlier -f.

Or:

  -f   Frobnicate.
  -u   Unfrobnicate.
  [...]
  The options -f and -u override each other [and the action
  is determined by the last one specified].

I prefer the the former because the latter is easy to miss (in
addition to being longer), and there is no error message to tell
the user that *something* is going on.


Some pages use

  The options -f and -u are mutually exclusive and override each other.

as a synonym for

  The options -f and -u override each other.

That may have caused your confusion.  Arguably, it might be better
to avoid the somewhat redundant and potentially confusing wording
"are mutually exclusive and override each other".  Do you agree?

Yours,
  Ingo



Re: cal(1): Clean up mutually exclusive options

2021-08-16 Thread Ingo Schwarze
Hi Theo,

Theo de Raadt wrote on Sun, Aug 15, 2021 at 05:42:13PM -0600:

> Is it really reasonable that we should strictly follow a non-applicable
> standard in such rarely used non-standard utilities,

Not strictly, no.  Usefulness needs to be considered in individual
cases.

There is value in not *gratuitiously* violating POSIX utility syntax
guidelines though, for two reasons:

 1. They make sense independently of the fact that they are in POSIX.

 2. Following this (small) set of (relatively unintrusive) guidelines
helps with designing a harmonious user experience all across
the system.  On the one hand, it helps avoiding surprises.
On the other hand, it saves the user from having to learn
completely new ways of constructing syntax for every new command.

But this is unrelated to martijn@'s cal(1) diff.  POSIX provides
no arguments for or against that diff as far as i'm aware.

Yours,
  Ingo



Re: cal(1): Clean up mutually exclusive options

2021-08-16 Thread Ingo Schwarze
Hi Jason,

Jason McIntyre wrote on Sun, Aug 15, 2021 at 11:30:05PM +0100:

> can't we take a stance that where options override each other,
> the last one wins?

Yes, that is possible.

Cases exist where one option overrides another and order does not
matter - for example, "lock -n -t -1" is the same as just "lock -n",
even though -t comes after -n.

But if options override *each other*, that means the last one wins
unless otherwise stated.

A utility can also be designed such that some options override each
other and the first one wins, but usually, i wouldn't consider that
ideal UI design in a command line utility.  In configuration files,
on the other hand, both "first one wins" and "last one wins" exist.
For that reason, nothing much seems wrong with being explicit that
the last one wins, where that is the case.

> and then not document this fact every time. or at least document
> exceptions only?
> 
> ...and continue to document where options are mutually exclusive? 
> 
> also the text "overrides earlier" is unclear.

In which sense do you consider

  -f   Frobnicate.  Overrides earlier -u.
  -l   [...]
  -n   [...]
  -u   Unfrobnicate.  Overrides earlier -f.

unclear?  Options exclusively appear on the command line, so i don't
see how "earlier" or "later" could possibly refer to anything else
than the position on the command line.

Note that this seems only loosely related to martijn@'s diff, which
i think is undesirable for other, stronger reasons.

Yours,
  Ingo



Re: cal(1): Clean up mutually exclusive options

2021-08-16 Thread Ingo Schwarze
Hi Martijn,

Martijn van Duren wrote on Sun, Aug 15, 2021 at 11:40:49PM +0200:
> To quote schwarze in the jot mutually exclusive thread:
> On Fri, 2021-08-13 at 11:48 +0200, Ingo Schwarze wrote:

>> In this case, even though this is not a POSIX command, POSIX utility
>> convention 12.2.11 is pertinent:
>> 
>>   The order of different options relative to one another should not
>>   matter, unless the options are documented as mutually-exclusive
>>   and such an option is documented to override any incompatible
>>   options preceding it.  If an option that has option-arguments is
>>   repeated, the option and option-argument combinations should be
>>   interpreted in the order specified on the command line.

> This is also violated by cal(1)

You seem to misunderstand.

The cal(1) utility does *not* violate this syntax guideline.
It neither has options where the order on the command line matters
nor options that take arguments, so this guideline simply makes
no recommendation whatsoever for cal(1).

> (and maybe others, but this one came
> up first). Diff below should fix this.

I think the diff is not OK.

It changes established and documented behaviour of a program, and
you provide no explanation why the new behaviour would be more
useful than the old one.  Also, considering whether this might or
might not cause compatibility issues would would be required.

Compatibility issues may be unlikely because it merely defines new
behaviour for something that is now a syntax error.  That alone does
not make the change useful, though.

The change itself seems illogical and confusing to me.  If a user
says "i want days numbered consecutively from Jan 1 *and* week
numbers displayed in addition", why should the first half of the
request be ignored?  And worse, *silently* ignored?  Both halfs of
the request are unrelated to each other, so both one overriding
the other and their effect depending on their order would seem
surprising and hardly useful.

Also, allowing additional syntax makes the user interface more
complex, which should not be done without a very good reason.

Finally, even *if* we would decide to make the UI more complex and
allow -jw and -wj (i see no pressing need because -j does not seem
all that useful in the first place, and needing both at the same
time would be an even less common task), then the logical behaviour
would be to simply honour both at the the same time.  So your change
not only makes the UI more complex without a good reason.  It also
blocks syntax space that could, in the future, possibly be assigned
to more useful functionality.

I admit there are no strong logical reasons why these two options
have to mutually exclusive.  That's seems merely an implementation
choice to keep the command and documentation simpler.  Nothing is
wrong with having some (not usually needed) functionality unimplemented
in a program.  In this case, this is even properly documented.

To summarize, i oppose the diff as a whole, and i see not parts in
it that could be salvaged.

Yours,
  Ingo


> Index: cal.1
> ===
> RCS file: /cvs/src/usr.bin/cal/cal.1,v
> retrieving revision 1.31
> diff -u -p -r1.31 cal.1
> --- cal.1 27 Nov 2016 10:37:22 -  1.31
> +++ cal.1 15 Aug 2021 21:39:28 -
> @@ -53,11 +53,8 @@ The options are as follows:
>  .Bl -tag -width Ds
>  .It Fl j
>  Display Julian dates (days one-based, numbered from January 1).
> -The options
> -.Fl j
> -and
> -.Fl w
> -are mutually exclusive.
> +Overrides earlier
> +.Fl w .
>  .It Fl m
>  Display weeks starting on Monday instead of Sunday.
>  .It Fl w
> @@ -65,11 +62,8 @@ Display week numbers in the month displa
>  If
>  .Fl m
>  is specified the ISO week format is assumed.
> -The options
> -.Fl j
> -and
> -.Fl w
> -are mutually exclusive.
> +Overrides earlier
> +.Fl j .
>  .It Fl y
>  Display a calendar for the current year.
>  .El
> Index: cal.c
> ===
> RCS file: /cvs/src/usr.bin/cal/cal.c,v
> retrieving revision 1.30
> diff -u -p -r1.30 cal.c
> --- cal.c 9 Oct 2015 01:37:06 -   1.30
> +++ cal.c 15 Aug 2021 21:39:28 -
> @@ -158,12 +158,14 @@ main(int argc, char *argv[])
>   switch(ch) {
>   case 'j':
>   julian = 1;
> + wflag = 0;
>   break;
>   case 'm':
>   mflag = 1;
>   break;
>   case 'w':
>   wflag = 1;
> + julian = 0;
>   break;
>   case 'y':
>   yflag = 1;
> @@ -174,9 +176,6 @@ main(int argc, char *argv[])
>   }
>   argc -= optind;
>   argv += optind;
> -
> - if (julian && wflag)
> - usage();
>  
>   day_headings = DAY_HEADINGS_S;
>   sep1752 = sep1752s;



Re: fresh prompt after Ctrl-C in cdio(1)

2021-08-13 Thread Ingo Schwarze
Hi,

Christian Weisgerber wrote on Thu, Aug 12, 2021 at 06:31:56PM +0200:
> Ingo Schwarze:

>> deraadt@ recently pointed out to me in private mail that it is good
>> for usability if interactive programs providing line editing
>> functionality are consistent what they do with Ctrl-C, ideally
>> discard the current input line and provide a fresh prompt.
>> 
>> So i propose to do the same in cdio(1), which currently just exits
>> on Ctrl-C.
>> 
>> OK?

> That looks correct and works fine for me.  ok naddy@

Committed for now, thanks for checking.

> (I still have a USB CD drive and I use it from time to time to play
> audio CDs with cdio.  I had completely forgotten that cdio supports
> editline, though.  The cdio interactive mode is pretty useless since
> interrupting a running command aborts the program.)

Any user of this program with a sufficient knowledge of signal
handling is welcome to design, implement, and test signal handling
that remains active during other parts of the execution of the
cdio(1) program.  The committed patch provides a starting point.

Yours,
  Ingo



Re: jot(1): make -b -c and -w mutually exclusive

2021-08-13 Thread Ingo Schwarze
Hi Martijn,

Martijn van Duren wrote on Fri, Aug 13, 2021 at 08:51:42AM +0200:

> So here's the first one:
> - When -b is set, but followed by -w it doesn't remove the boring flag
>   and the -w is interpreted as a -b string

I agree that can't be quite right.

> - -c is defined as "This is an abbreviation for -w %c.", but adding a
>   -w option itself results in:
>   $ jot -cw"bla " 5 97
>   bla a
>   bla b
>   bla c
>   bla d
>   bla e
> 
>   instead of
> 
>   $ jot -w"bla " 5 97 
>   bla 97
>   bla 98
>   bla 99
>   bla 100
>   bla 101

I agree that cannot be right either.

It was even listed in my personal BUGS.txt file for the utility.

> - -b always overwrites -c.

That isnt necessarily wrong, ...

> tb already agrees that these options should be mutually exlusive, so
> here's the accompanying code. I choose to go for the last option wins
> appraoch, similar to ls, df, du, ...

 ... but i agree "mutually exclusive, overriding" is potentially
more useful.

In this case, even though this is not a POSIX command, POSIX utility
convention 12.2.11 is pertinent:

  The order of different options relative to one another should not
  matter, unless the options are documented as mutually-exclusive
  and such an option is documented to override any incompatible
  options preceding it.  If an option that has option-arguments is
  repeated, the option and option-argument combinations should be
  interpreted in the order specified on the command line.

In its current state, the program violates that.  With your patch,
it conforms.

> Regress seems to pass.
> 
> OK?

OK schwarze@.

Consider putting the declaration of "bool word" alongside the other
"static bool" in file scope.  I admit this one is only used in main,
but so are finalnl, infinity, and randomize, and physicists like
symmetry.

I would prefer to keep the documentation of overriding together
with each option, making it harder to overlook and even reducing
the total amount of text.

That is, append

 * "Overrides earlier -b, -c, and -w." to the -b and -w list entries
 * "Overrides earlier -b and -w." to the -c list entry.

OK either way, though.

Yours,
  Ingo


> Index: jot.c
> ===
> RCS file: /cvs/src/usr.bin/jot/jot.c,v
> retrieving revision 1.51
> diff -u -p -r1.51 jot.c
> --- jot.c 30 Jul 2021 02:47:37 -  1.51
> +++ jot.c 13 Aug 2021 06:50:42 -
> @@ -86,6 +86,7 @@ main(int argc, char *argv[])
>   int n = 0;
>   int ch;
>   const char  *errstr;
> + boolword;
>  
>   if (pledge("stdio", NULL) == -1)
>   err(1, "pledge");
> @@ -94,10 +95,13 @@ main(int argc, char *argv[])
>   switch (ch) {
>   case 'b':
>   boring = true;
> + chardata = word = false;
>   format = optarg;
>   break;
>   case 'c':
>   chardata = true;
> + boring = word = false;
> + format = "";
>   break;
>   case 'n':
>   finalnl = false;
> @@ -115,6 +119,8 @@ main(int argc, char *argv[])
>   sepstring = optarg;
>   break;
>   case 'w':
> + word = true;
> + boring = chardata = false;
>   format = optarg;
>   break;
>   default:



Re: fresh prompt after Ctrl-C in cdio(1)

2021-08-12 Thread Ingo Schwarze
Hi Martijn,

Martijn van Duren wrote on Thu, Aug 12, 2021 at 04:37:24PM +0200:

> Maybe I've used cdio once or twice and I don't have a cd-player at hand
> (at least connected to my workstation) to test this. So purely from code
> inspection: You set the signal handler before entering el_gets and you
> ignore it right after.

More precisely, i restore default signal handling right after, which
is "terminate process" for SIGINT.  In most situations, i dislike
ignoring SIGINT.

> Wouldn't this imply that if you do something like "cdplay" at the prompt
> and while you wait for the cd to spin you hit ^C the application just
> exits?

Yes.

> If so, wouldn't it make more sense to install the signal handler once
> and let open_cd() handle EINTR as well and just return to the prompt?

Maybe.  The question whether cdio(1) should allow Ctrl-C to be used to
interrupt more operations, and not just command line editing, and return
to the interactive prompt in that case is certainly legitimate.

Then again, my point is to unify line editing across different
programs, not to consider overall signal handling in cdio(1).
As long as we only touch line editing in cdio(1), that goal can be
reached and the risk of regressions is easier to control.  Extending
signal handling over larger swaths of code requires reviewing more
code and making sure what Ctrl-C does is sane in all code paths in
all of that code.

For example, looking at the function run(), i see that most commands
start by calling open_cd(), but not all do.  A few do not call it at
all (e.g. CMD_SET, CMD_HELP), in which case we would have to make sure
that the signal handler does not remain installed longer than intended,
a few do some other work before calling it (e.g. CMD_DEVICE).

So i would prefer to leave the broader question of signal handling
in cdio(1), outside the specific domain of command line editing,
to people who are more familiar with the code and usage of cdio(1).

If somebody wants to develop, test and propose a patch with a wider
scope, for example a global signal handler that can interupt any
operation - such a patch would have to make sure cleanup is done
as required after all operations that might get aborted -, i'm not
opposed to that, but it don't feel like developing such a patch
myself at this point.

Yours,
  Ingo



fresh prompt after Ctrl-C in cdio(1)

2021-08-12 Thread Ingo Schwarze
Hi,

deraadt@ recently pointed out to me in private mail that it is good
for usability if interactive programs providing line editing
functionality are consistent what they do with Ctrl-C, ideally
discard the current input line and provide a fresh prompt.
At least that is what bc(1), ftp(1), sftp(1) and all shells do.

So i propose to do the same in cdio(1), which currently just exits
on Ctrl-C.

Sending to tech@ because i'm not quite sure which developer owns cdio(1),
or who might be interested.  According to the CVS logs, naddy@ was the
last one to change user-visible functionality, and before that, there
was nothing but bug fixes and cleanup since 2010.

OK?
  Ingo

P.S.
Note that the current "!siz" is fragile; el_gets(3) sets it to -1
on error.  I'm polishing that while here.

P.P.S.
buf = (char *)el_gets(...) is ugly and potentially dangerous, the
manual page explicitly warns to not do that, but that's a job for
another day.

P.P.P.S.
Insects are the most successful class of animals on earth.
Even in OpenBSD, it is rarely possible to look at code without
finding something unrelated that could be improved, too.


Index: cdio.c
===
RCS file: /cvs/src/usr.bin/cdio/cdio.c,v
retrieving revision 1.80
diff -u -p -r1.80 cdio.c
--- cdio.c  18 Jan 2021 00:44:00 -  1.80
+++ cdio.c  12 Aug 2021 13:36:38 -
@@ -63,6 +63,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -158,6 +159,7 @@ int verbose = 1;
 intmsf = 1;
 const char *cddb_host;
 char   **track_names;
+volatile sig_atomic_t signo;
 
 EditLine   *el = NULL; /* line-editing structure */
 History*hist = NULL;   /* line-editing history */
@@ -179,6 +181,7 @@ int pstatus(char *arg);
 intplay_next(char *arg);
 intplay_prev(char *arg);
 intplay_same(char *arg);
+void   sigint_handler(int);
 char   *input(int *);
 char   *prompt(void);
 void   prtrack(struct cd_toc_entry *e, int lastflag, char *name);
@@ -1499,18 +1502,36 @@ status(int *trk, int *min, int *sec, int
return s.data->header.audio_status;
 }
 
+void
+sigint_handler(int signo_arg)
+{
+   signo = signo_arg;
+}
+
 char *
 input(int *cmd)
 {
+   struct sigaction sa;
char *buf;
int siz = 0;
char *p;
HistEvent hev;
 
+   memset(, 0, sizeof(sa));
do {
-   if ((buf = (char *) el_gets(el, )) == NULL || !siz) {
-   *cmd = CMD_QUIT;
+   signo = 0;
+   sa.sa_handler = sigint_handler;
+   if (sigaction(SIGINT, , NULL) == -1)
+   err(1, "sigaction");
+   buf = (char *)el_gets(el, );
+   sa.sa_handler = SIG_DFL;
+   if (sigaction(SIGINT, , NULL) == -1)
+   err(1, "sigaction");
+   if (buf == NULL || siz <= 0) {
fprintf(stderr, "\r\n");
+   if (siz < 0 && errno == EINTR && signo == SIGINT)
+   continue;
+   *cmd = CMD_QUIT;
return (0);
}
if (strlen(buf) > 1)



libedit: stop playing hopeless games with FIONBIO

2021-08-11 Thread Ingo Schwarze
Hi,

in its current state, the editline(3) library is completely unfit
to work with non-blocking I/O.  Neither the API nor the implementation
are even designed for that purpose.

I do have some candidate ideas to minimally extend the API - without
adding any new functions - to also work with non-blocking I/O, but
frankly, i don't see any sane use case for it yet.  Why would anyone
desire to mix non-blocking I/O and interactive line editing with the
editline(3) library on the same file descriptor?

Consequently, i propose that we document the facts up front and
simply delete some code that isn't going to do any good in practice.
For programs not using non-blocking I/O on the editline input file
descriptor, this patch makes no difference.  Programs that do set
FIONBIO on a file descriptor and then call editline(3) on it anyway
can't possibly work as intended as far as i can see.  Either setting
FIONBIO is needless in the first place, or el_gets(3) clearing it
will ruin the rest of the program's functionality.

Do any of you have any doubts, and if so, which ones?

Or is anybody willing to OK this patch?


If soembody comes up with a sane use case for mixing FIONBIO with
editline(3) at a later point in time, we can still consider my plans
to make that work.  In that case, deleting the junk deleted by this
patch would be required as the first step anyway.


I'm appending a simple test program at the end.

Before the patch:

  nbio is on
  ?test
  the user said: test
  nbio is off/* oops, what is this? */

After the patch:

  nbio is on
  ?progname: el_gets: Resource temporarily unavailable

I think the latter makes more sense and is less surprising.

Yours,
  Ingo


Index: editline.3
===
RCS file: /cvs/src/lib/libedit/editline.3,v
retrieving revision 1.46
diff -u -p -r1.46 editline.3
--- editline.3  22 May 2016 22:08:42 -  1.46
+++ editline.3  11 Aug 2021 16:20:00 -
@@ -169,6 +169,20 @@ Programs should be linked with
 .Pp
 The
 .Nm
+library is designed to work with blocking I/O only.
+If the
+.Dv FIONBIO
+.Xr ioctl 2
+is set on
+.Ar fin ,
+.Fn el_gets
+and
+.Fn el_wgets
+will almost certainly fail with
+.Ar EAGAIN .
+.Pp
+The
+.Nm
 library respects the
 .Ev LC_CTYPE
 locale set by the application program and never uses
Index: read.c
===
RCS file: /cvs/src/lib/libedit/read.c,v
retrieving revision 1.47
diff -u -p -r1.47 read.c
--- read.c  11 Aug 2021 15:13:46 -  1.47
+++ read.c  11 Aug 2021 16:20:00 -
@@ -66,7 +66,6 @@ struct el_read_t {
int  read_errno;
 };
 
-static int read__fixio(int, int);
 static int read_char(EditLine *, wchar_t *);
 static int read_getcmd(EditLine *, el_action_t *, wchar_t *);
 static voidread_clearmacros(struct macros *);
@@ -132,26 +131,6 @@ el_read_getfn(struct el_read_t *el_read)
 }
 
 
-/* read__fixio():
- * Try to recover from a read error
- */
-static int
-read__fixio(int fd, int e)
-{
-   int zero = 0;
-
-   switch (e) {
-   case EAGAIN:
-   if (ioctl(fd, FIONBIO, ) == -1)
-   return -1;
-   return 0;
-
-   default:
-   return -1;
-   }
-}
-
-
 /* el_push():
  * Push a macro
  */
@@ -231,15 +210,12 @@ static int
 read_char(EditLine *el, wchar_t *cp)
 {
ssize_t num_read;
-   int tried = 0;
char cbuf[MB_LEN_MAX];
int cbp = 0;
-   int save_errno = errno;
 
  again:
el->el_signal->sig_no = 0;
while ((num_read = read(el->el_infd, cbuf + cbp, 1)) == -1) {
-   int e = errno;
if (errno == EINTR) {
switch (el->el_signal->sig_no) {
case SIGCONT:
@@ -252,14 +228,8 @@ read_char(EditLine *el, wchar_t *cp)
break;
}
}
-   if (!tried && read__fixio(el->el_infd, e) == 0) {
-   errno = save_errno;
-   tried = 1;
-   } else {
-   errno = e;
-   *cp = L'\0';
-   return -1;
-   }
+   *cp = L'\0';
+   return -1;
}
 
/* Test for EOF */


 - 8< - schnipp - >8 - 8< - schnapp - >8 -


#include 
#include 
#include 
#include 
#include 

int
main(void)
{
EditLine*el;
const char  *line;
int  len;
int  flags = O_NONBLOCK;

if (fcntl(STDIN_FILENO, F_SETFL, ) == -1)
err(1, "fcntl(F_SETFL)");
flags = fcntl(STDIN_FILENO, F_GETFL);
printf("nbio is %s\n", flags & O_NONBLOCK ? "on" : "off");
if ((el = el_init("test", stdin, stdout, stderr)) == NULL)
err(1, "el_init");
if ((line = el_gets(el, )) == 

Re: date -j and seconds since the Epoch

2021-08-11 Thread Ingo Schwarze
Hi,

> ok gerhard@

Thanks for reporting, for the initial patch, and for checking the
final one.  This now committed.

Yours,
  Ingo


>> Index: date.c
>> ===
>> RCS file: /cvs/src/bin/date/date.c,v
>> retrieving revision 1.56
>> diff -u -p -r1.56 date.c
>> --- date.c   8 Aug 2019 02:17:51 -   1.56
>> +++ date.c   6 Aug 2021 17:48:01 -
>> @@ -219,7 +219,11 @@ setthetime(char *p, const char *pformat)
>>  }
>>   
>>  /* convert broken-down time to UTC clock time */
>> -if ((tval = mktime(lt)) == -1)
>> +if (pformat != NULL && strstr(pformat, "%s") != NULL)
>> +tval = timegm(lt);
>> +else
>> +tval = mktime(lt);
>> +if (tval == -1)
>>  errx(1, "specified date is outside allowed range");
>>   
>>  if (jflag)



Re: libedit: stop ignoring SIGINT

2021-08-09 Thread Ingo Schwarze
Hi Martijn,

Martijn van Duren wrote on Mon, Aug 09, 2021 at 02:15:42PM +0200:

> If we're stripping down fixio to a single function, why not
> inline the whole thing?

I deliberately tried to:

 1. Keep patches that change behaviour as small as possible to make
review as simple as possible for people who fear they might be
affected but are not specifically interested in editline(3).

 2. Make sure that reorg / cleanup patches do not change behaviour.

> Also, as far as my brain explains things to me, it could theoretically
> be possible that the signal handler kicks in right after we return a -1
> from read(2), making it possible to get something like an EIO and
> entering the sig switch statement. Assuming that a signal handler
> doesn't clobber our errno (which libedit doesn't do, but who knows what
> bad code is out there) we should probably only check the signal type
> when we know we have an EINTR.

Yes, that looks like a race condition bug that so far escaped my
attention.  But it seems unrelated to what should happen with signals
in general, so can we keep fixing that bug separate?

> Finally, I don't understand why we only have a single retry on EAGAIN?

Not having *any* retries inside read_char() would look like a worthy
long-term goal to me, but i'm not yet completely sure that can be
reached.  I would prefer to steer into the direction of fewer magic
retries rather than more of them.  Either way, EAGAIN seems unrelated
to SIGINT, so i'd prefer to keep topics separate.

> If the application keeps resetting the FIONBIO in such a furious way
> that it happens more then once in a single call (no idea how that could
> happen) it might be an uphill battle, but we just burn away cpu cycles.
> It is not and should probably not be treated like something fatal.

Actually, if the application sets FIONBIO at all (even once), chances
are quite low in the first place that stuff works as inteded by the
author of the application.  So i certainly wouldn't worry about an
application setting FIONBIO in a loop, we have significantly worse
problems than that.  But again, that's a separate topic.

> Diff below does this. (minus 27 LoC)

I do not in general object to cleaning this code up, and getting rid
of read__fixio() indeed seems to be a long-term goal.  But i hope
we will be able to remove all the (mostly broken) functionality
from read__fixio() in a step-by-step manner, and once the function
is empty, it will fade away without having to disturb the code in
read_char().  Either way - separate topic...

The most important short-term goal seems to fix sftp(1), including
the steps required to get that done in a clean way.

> The following ports use libedit (there might be a better way of
> finding them, but this works):

Hmm, thanks, that list feels useful.

> So if we decide which of our interpretations should take precedence
> it might be a good idea to put it into snaps for a while.

I don't think so in this case.  Let's not over-use the feature of
putting stuff in snaps.  I think that should be reserved for stuff
that is quite important and somewhat urgent and can't easily be
tested in a less disruptive way.  But here, testing a program is
quite feasible once you know which program to test.

Besides, *if* this patch causes a change in behaviour of a port,
the most likely change is that a program that now ignores SIGINT
exits on SIGINT afterwards.  That may be worth investigating and
making a decision on in each individual case, but it's not a
super-critical change in behaviour that might require testing
in snaps.

Yours,
  Ingo



libedit: stop ignoring SIGINT

2021-08-09 Thread Ingo Schwarze
Hi,

as mentioned earlier, deraadt@ reported that sftp(1) ignores Ctrl-C.
Fixing that without longjmp(3) requires making editline(3) better
behaved.

Currently, when read(2) from the terminal gets interrupted by a
signal, editline(3) ignores the (first) signal and unconditionally
calls read(2) a second time.  That seems wrong: if the user hits
Ctrl-C, it is sane to assume that they meant it, not that they
want it ignored.

The following patch causes el_gets(3) and el_wgets(3) to return
failure when read(2)ing from the terminal is interrupted by a
signal other than SIGCONT or SIGWINCH.  That allows the calling
program to decide what to do, usually either exit the program or
provide a fresh prompt to the user.

If i know how to grep(1), the following programs in the base system
use -ledit:

 * bc(1)
   It behaves well with the patch: Ctrl-C discards the current
   input line and starts a new input line.
   The reason why this already works even without the patch
   is that bc(1) does very scary stuff inside the signal handler:
   pass a file-global EditLine pointer on the heap to el_line(3)
   and access fields inside the returned struct.  Needless to
   say that no signal handler should do such things...

 * cdio(1)
   Behaviour is acceptable and unchanged with the patch:
   Ctrl-C exits cdio(1).

 * ftp(1)
   It behaves well with the patch: Ctrl-C discards the current
   input line and provides a fresh prompt.
   The reason why it already works without the patch is that ftp(1)
   uses setjmp(3)/longjmp(3) to forcefully grab back control
   from el_gets(3) without waiting for it to return.

 * lldb(1)
   It misbehaves with or without the patch and ignores Ctrl-C.
   I freely admit that i don't feel too enthusiastic about
   debugging that beast.

 * sftp(1)
   Behaviour is improved with the patch: Ctrl-C now exits sftp(1).
   If desired, i can supply a very simple follow-up patch to sftp.c
   to instead behave like ftp(1) and bc(1), i.e. discard the
   current input line and provide a fresh prompt.
   Neither doing undue work in the signal handler nor longjmp(3)
   will be required for that (if this patch gets committed).

 * bgplgsh(8), fsdb(8)
   I have no idea how to test those.  Does anyone think that testing
   either of them would be required?

Regarding the patch below, note that differentiating EINTR behaviour
by signal number is not needed because the calling code in read_char()
already handles SIGCONT and SIGWINCH, so those two never arrive in
read__fixio() in the first place.

Also note that deraadt@ pointed out in private mail to me that the
fact that read__fixio() clears FIONBIO is probably a symptom of
botched editline(3) API design.  That might be worth fixing, too,
as far as that is feasible, but it is unrelated to the sftp(1)
Ctrl-C issue; let's address one topic at a time.

Any feedback is welcome.

Yours,
  Ingo

P.S.
I decided to Cc: tech@ again because this patch might affect
even people who are not specifically interested in editline(3),
and i have no intention to cause unpleasant surprises.


Index: read.c
===
RCS file: /cvs/src/lib/libedit/read.c,v
retrieving revision 1.45
diff -u -U15 -r1.45 read.c
--- read.c  9 Aug 2021 09:11:26 -   1.45
+++ read.c  9 Aug 2021 10:00:18 -
@@ -134,22 +134,19 @@
 
 /* read__fixio():
  * Try to recover from a read error
  */
 static int
 read__fixio(int fd, int e)
 {
int zero = 0;
 
switch (e) {
case EAGAIN:
if (ioctl(fd, FIONBIO, ) == -1)
return -1;
return 0;
 
-   case EINTR:
-   return 0;
-
default:
return -1;
}
 }



Re: libedit: read__fixio cleanup

2021-08-09 Thread Ingo Schwarze
Hi Martijn,

Martijn van Duren wrote on Mon, Aug 09, 2021 at 11:04:35AM +0200:

> Btw, would there be any benefit to declare zero as const in this
> context?

I dont think so.

At least according to the ioctl(2) manual page, FIONBIO expects
an int * argument, not a const int *.

Yours,
  Ingo



libedit: read__fixio cleanup

2021-08-08 Thread Ingo Schwarze
Hi,

deraadt@ recently reported to me that the editline(3) library, while
line editing is active - for example inside el_gets(3) - ignores
the first SIGINT it receives, for example the the first Ctrl-C the
user presses.  I consider that a bug in the editline(3) library.
Some programs, for example our old ftp(1) implementation, work
around the bug in a horrible way by using setjmp(3)/longjmp(3).

The root cause of the problem is in libedit/read.c, in the interaction
of the functions read_char() and read__fixio().  Before fixing the bug
can reasonably be considered, the function read__fixio() direly needs
cleanup.  As it stands, it is utterly unreadable.

So here is a patch to make it clear what the function does, with
no functional change intended yet (-37 +5 LOC).

There will be one or more follow-up patches.  If you want to receive
them, please reply to me, i won't necessarily send them all to
tech@.

I see some value in avoiding gratuitious divergence from NetBSD,
but getting rid of this horrible mess is not gratuitious.

Rationale:
 * Do not mark an argument as unused that is actually used.
 * errno cannot possibly be -1.  Even is it were, treating it as
   EAGAIN makes no sense, treating it as the most severe error
   imaginable makes more sense to me.
 * We define EWOULDBLOCK to be the same as EAGAIN, so no need
   to handle it separately.
 * No need to #define TRY_AGAIN to use it just once.
 * Don't do the same thing twice.  We do support the FIONBIO ioctl(2),
   so the the indirection using the F_GETFL fcntl(2) can be deleted.
 * No need to play confusing games with the "e" variable.
   Just return -1 for error or 0 for success in a straightforward
   manner.

OK?
  Ingo

P.S.
I also considered whether this FIONBIO dance should better be done
at the initialization stage rather than after EAGAIN already happened.
But maybe not.  This is a library.  The application program might
set the fd to non-blocking mode at any time and then call el_gets(3)
again, in which case the library needs to restore blocking I/O to
do its work.


Index: read.c
===
RCS file: /cvs/src/lib/libedit/read.c,v
retrieving revision 1.44
diff -u -p -r1.44 read.c
--- read.c  25 May 2016 09:36:21 -  1.44
+++ read.c  8 Aug 2021 10:30:06 -
@@ -39,9 +39,10 @@
  * read.c: Clean this junk up! This is horrible code.
  *Terminal read functions
  */
+#include 
+
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -134,55 +135,16 @@ el_read_getfn(struct el_read_t *el_read)
 /* read__fixio():
  * Try to recover from a read error
  */
-/* ARGSUSED */
 static int
-read__fixio(int fd __attribute__((__unused__)), int e)
+read__fixio(int fd, int e)
 {
+   int zero = 0;
 
switch (e) {
-   case -1:/* Make sure that the code is reachable */
-
-#ifdef EWOULDBLOCK
-   case EWOULDBLOCK:
-#ifndef TRY_AGAIN
-#define TRY_AGAIN
-#endif
-#endif /* EWOULDBLOCK */
-
-#if defined(POSIX) && defined(EAGAIN)
-#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
case EAGAIN:
-#ifndef TRY_AGAIN
-#define TRY_AGAIN
-#endif
-#endif /* EWOULDBLOCK && EWOULDBLOCK != EAGAIN */
-#endif /* POSIX && EAGAIN */
-
-   e = 0;
-#ifdef TRY_AGAIN
-#if defined(F_SETFL) && defined(O_NDELAY)
-   if ((e = fcntl(fd, F_GETFL)) == -1)
+   if (ioctl(fd, FIONBIO, ) == -1)
return -1;
-
-   if (fcntl(fd, F_SETFL, e & ~O_NDELAY) == -1)
-   return -1;
-   else
-   e = 1;
-#endif /* F_SETFL && O_NDELAY */
-
-#ifdef FIONBIO
-   {
-   int zero = 0;
-
-   if (ioctl(fd, FIONBIO, ) == -1)
-   return -1;
-   else
-   e = 1;
-   }
-#endif /* FIONBIO */
-
-#endif /* TRY_AGAIN */
-   return e ? 0 : -1;
+   return 0;
 
case EINTR:
return 0;



Re: less(1): refreshing file of size 0 results in file being treated as a pipe

2021-08-07 Thread Ingo Schwarze
Hi,

thank you for reporting this bug and for providing a patch to fix it.
I just committed your patch.

Also thanks to tb@ and deraadt@ for cross-checking the patch.

Yours,
  Ingo


user wrote on Thu, Aug 05, 2021 at 12:43:21AM -0500:

> Oops, forgot that OpenBSD doesn't have ! capability in less. Instead of !echo 
> a > % and !echo b > %, run 
> $ echo a > /tmp/test
> Press h and q in less to reload the file
> $ echo b > /tmp/test
> Press h and q in less to reload the file
> 
> On Thu, Aug 05, 2021 at 12:37:00AM -0500, user wrote:
> > Bug Reproduction:
> > $ touch /tmp/test
> > $ less /tmp/test
> > Press r
> > Run !echo a > %
> > Run !echo b > %
> > Output:
> > a
> > b
> > 
> > On Fri, Jul 23, 2021 at 11:15:59AM -0500, user wrote:
> > > Less contains a hack to force files of size 0 to become non-seekable in 
> > > order to workaround a linux kernel bug. 
> > > 
> > > When the file becomes non-seekable any further reads from the file are 
> > > appended rather than overwriting the original contents of the file.
> > > 
> > > diff --git ch.c ch.c
> > > index 1a679767a42..d7c0aa34e90 100644
> > > --- ch.c
> > > +++ ch.c
> > > @@ -643,19 +643,6 @@ ch_flush(void)
> > > ch_block = 0; /* ch_fpos / LBUFSIZE; */
> > > ch_offset = 0; /* ch_fpos % LBUFSIZE; */
> > > 
> > > -#if 1
> > > -   /*
> > > -* This is a kludge to workaround a Linux kernel bug: files in
> > > -* /proc have a size of 0 according to fstat() but have readable
> > > -* data.  They are sometimes, but not always, seekable.
> > > -* Force them to be non-seekable here.
> > > -*/
> > > -   if (ch_fsize == 0) {
> > > -   ch_fsize = -1;
> > > -   ch_flags &= ~CH_CANSEEK;
> > > -   }
> > > -#endif
> > > -
> > > if (lseek(ch_file, (off_t)0, SEEK_SET) == (off_t)-1) {
> > > /*
> > >  * Warning only; even if the seek fails for some reason



Re: date -j and seconds since the Epoch

2021-08-06 Thread Ingo Schwarze
Hi,

sorry for the afterthought, i just noticed that the Subject: line
is misleading.  This patch has nothing to do with -j.  If -j is not
specified, this patch changes the value passed to adjtime(2) or
settimeofday(2), which arguably matters even more than something
merely printed on stdout.

However, the value being passed to adjtime(2) or settimeofday(2)
right now is just as wrong as the one printed, if -f %s is used,
so i think the patch is correct for that case, too.

Yours,
  Ingo



Re: date -j and seconds since the Epoch

2021-08-06 Thread Ingo Schwarze
Hi Gerhard and Bryan,

Gerhard Roth wrote on Mon, Aug 02, 2021 at 10:36:05AM +0200:

> Bryan Vyhmeister found a strange behavior in date(1):
>
>   # date -f %s -j 1627519989
>   Thu Jul 29 01:53:09 PDT 2021
>   # date -u -f %s -j 1627519989
>   Thu Jul 29 00:53:09 UTC 2021
> 
> Looks like PDT is GMT-1, which of course is wrong.
> 
> The problem arises from the -f option. The argument of date(1) is passed
> to strptime(3). Normally, this will return a broken down time in the
> local timezone.

This claim confused me somewhat at first.  I think a more accurate
statement would be that strptime(3) does not use the TZ at all.
It merely parses the string and fills the fields in struct tm.
The question whether the caller had any particular time zone in
mind does not even arise here.

Since date(1) says:

  ENVIRONMENT
 TZ   The time zone to use when parsing or displaying dates.  [...]

the command

   $ date -f %H:%M -j 9:00

is correct to essentially just echo "09:00" back at you
because date(1) is specified to use the same time zone
for parsing and printing.

> But the '%s' format makes an exception and returns a
> date in UTC.

That is indeed true.

So for %s the time zone used for parsing is necessarily different,
while the time zone used for printing is still the $TZ specified
by the user (or the /etc/localtime specified by the sysadmin).

So i think your approach of using timegm(3) for %s and mktime(3)
otherwise is essentially correct.

However, a format string can contain more characters than just 
a single conversion specification.  For example, somebody might
wish to parse an input file containing a line

  SSE=1627519989

and legitimately say

  $ date -f SSE=%s -j $(grep SSE= input_file)

which still yields the wrong result even with your patch.

Even worse, what are the admittedly weird commands

  $ date -f %s:%H -j 1627519989:15
  $ date -f %H:%s -j 15:1627519989
  $ date -f %H:%s:%m -j 15:1627519989:03

supposed to do?  Apparently, data from later conversions is supposed
to override data from earlier ones, so should the last conversion
that is related to the the time zone win, i.e. %s:%H use mktime(3)
but %H:%s and %H:%s:%m gmtime(3)?  I would argue that is excessive
complexity for little benefit, if any.

One might also argue that %s:%H 1627519989:09 is more usefully
interpreted as "9 o'clock UTC on the day containing 1627519989"
than "9 o'clock in the local TZ on the day containing 1627519989".
In addition to using a consistent input time zone, the former also
avoids fun with crossing the date line that the latter might cause.
The former is also easier to implement, see the patch below.

What do people think?

> The patch below isn't very beautiful, but fixes the problem:
> 
>   # date -f %s -j 1627519989
>   Wed Jul 28 17:53:09 PDT 2021

P.S.
Your patch was mangled (one missing blank line and spurious spaces
inserted before tabs on some lines).  Lately, it is becoming annoying
that even very experienced developers no longer seem able to reliably
send email containing patches that actually apply...  :-(


Index: date.c
===
RCS file: /cvs/src/bin/date/date.c,v
retrieving revision 1.56
diff -u -p -r1.56 date.c
--- date.c  8 Aug 2019 02:17:51 -   1.56
+++ date.c  6 Aug 2021 17:48:01 -
@@ -219,7 +219,11 @@ setthetime(char *p, const char *pformat)
}
 
/* convert broken-down time to UTC clock time */
-   if ((tval = mktime(lt)) == -1)
+   if (pformat != NULL && strstr(pformat, "%s") != NULL)
+   tval = timegm(lt);
+   else
+   tval = mktime(lt);
+   if (tval == -1)
errx(1, "specified date is outside allowed range");
 
if (jflag)



Re: less(1): refreshing file of size 0 results in file being treated as a pipe

2021-08-06 Thread Ingo Schwarze
Hi,

after quite some head-scratching, i consider the following bug report
legitimate:

user wrote on Thu, Aug 05, 2021 at 12:43:21AM -0500:
> On Thu, Aug 05, 2021 at 12:37:00AM -0500, user wrote:
>> On Fri, Jul 23, 2021 at 11:15:59AM -0500, user wrote:

>>> Less contains a hack to force files of size 0 to become non-seekable
>>> in order to workaround a linux kernel bug. 

I'm inserting a few words into the next sentence to make it clearer
what it is trying to say:

>>> When the file becomes non-seekable any further reads from the file
>>> are appended

  to the temporary buffer in which less(1) holds the content
  of the file

>>> rather than overwriting the original contents of the file

  in that buffer.

>> Bug Reproduction:

   $ rm -rf /tmp/test

>> $ touch /tmp/test
>> $ less /tmp/test
>> Press r

> $ echo a > /tmp/test# my comment: from a different shell
> Press h and q in less to reload the file
> $ echo b > /tmp/test
> Press h and q in less to reload the file

Now less(1) shows the following on the screen because it thinks
that would be the current content of the file:

>> a
>> b

That is wrong.  Instead, it should show the actual file content,
which is just:

  b

I think the proposed patch makes sense and should be committed:
File size has nothing to do with whether a file is seekable,
so i don't think it can cause regressions.
Also, it fixes the bug described above.

Any developer willing to provide an OK?
Alternatively, commit yourself with OK schwarze@.

I'm attaching the patch again because the OP mangled it (tabs
replaced by spaces) and it did not apply.

Yours,
  Ingo


Index: ch.c
===
RCS file: /cvs/src/usr.bin/less/ch.c,v
retrieving revision 1.21
diff -u -p -r1.21 ch.c
--- ch.c3 Sep 2019 23:08:42 -   1.21
+++ ch.c6 Aug 2021 16:14:29 -
@@ -643,19 +643,6 @@ ch_flush(void)
ch_block = 0; /* ch_fpos / LBUFSIZE; */
ch_offset = 0; /* ch_fpos % LBUFSIZE; */
 
-#if 1
-   /*
-* This is a kludge to workaround a Linux kernel bug: files in
-* /proc have a size of 0 according to fstat() but have readable
-* data.  They are sometimes, but not always, seekable.
-* Force them to be non-seekable here.
-*/
-   if (ch_fsize == 0) {
-   ch_fsize = -1;
-   ch_flags &= ~CH_CANSEEK;
-   }
-#endif
-
if (lseek(ch_file, (off_t)0, SEEK_SET) == (off_t)-1) {
/*
 * Warning only; even if the seek fails for some reason,



Re: time.3: miscellaneous cleanup and rewrites

2021-08-05 Thread Ingo Schwarze
Hi Scott,

since i see this wasn't committed yet, it is OK schwarze@, too.

Consider leaving the .Nd alone since both jmc@ and millert@ recommended
that, and use "That version counted the time" in the HISTORY section
as recommended by jmc@.

I think you should indeed remove the sentence about gettimeofday(2)
from the DESCRIPTION because millert@ has a point that it's an
implementation detail, and even if considered interesting, it's
already mentioned in the HISTORY section.

Yours,
  Ingo


Scott Cheloha wrote on Fri, Jul 30, 2021 at 01:01:21PM -0500:

> Index: time.3
> ===
> RCS file: /cvs/src/lib/libc/gen/time.3,v
> retrieving revision 1.16
> diff -u -p -r1.16 time.3
> --- time.329 Jan 2015 01:46:30 -  1.16
> +++ time.330 Jul 2021 17:59:37 -
> @@ -36,52 +36,54 @@
>  .Os
>  .Sh NAME
>  .Nm time
> -.Nd get time of day
> +.Nd get the time of day
>  .Sh SYNOPSIS
>  .In time.h
>  .Ft time_t
> -.Fn time "time_t *tloc"
> +.Fn time "time_t *now"
>  .Sh DESCRIPTION
>  The
>  .Fn time
> -function returns the value of time in seconds since 0 hours, 0 minutes,
> -0 seconds, January 1, 1970, Coordinated Universal Time (UTC).
> +function returns the the number of seconds elapsed since
> +Jan 1 1970 00:00:00 UTC.
> +This value is also written to
> +.Fa now
> +unless
> +.Fa now
> +is
> +.Dv NULL .
>  .Pp
> -A copy of the time value may be saved to the area indicated by the
> -pointer
> -.Fa tloc .
> -If
> -.Fa tloc
> -is a null pointer, no value is stored.
> +This version of
> +.Fn time
> +is implemented with
> +.Xr gettimeofday 2 .
>  .Sh RETURN VALUES
> -Upon successful completion,
> +The
>  .Fn time
> -returns the value of time.
> -Otherwise a value of
> -.Po Fa time_t Pc Ns -1
> -is returned and the global variable
> -.Va errno
> -is set to indicate the error.
> -.Sh ERRORS
> -The following error codes may be set in
> -.Va errno :
> -.Bl -tag -width Er
> -.It Bq Er EFAULT
> -An argument address referenced invalid memory.
> -.El
> +function is always successful,
> +and no return value is reserved to indicate an error.
>  .Sh SEE ALSO
>  .Xr clock_gettime 2 ,
>  .Xr gettimeofday 2 ,
>  .Xr ctime 3
> +.Sh STANDARDS
> +The
> +.Fn time
> +function conforms to
> +.St -p1003.1-2008 .
>  .Sh HISTORY
>  A
>  .Fn time
>  system call first appeared in
> -.At v1
> -and used to return time in sixtieths of a second in 32 bits,
> -which was to guarantee a crisis every 2.26 years.
> -Since
> -.At v6 ,
> -.Fn time
> -scale was changed to seconds, extending the pre-crisis stagnation
> -period up to a total of 68 years.
> +.At v1 .
> +This version counted time in sixtieths of a second with a 32-bit return 
> value,
> +ensuring an integer overflow crisis every 2.26 years.
> +In
> +.At v6
> +the granularity of the return value was reduced to whole seconds,
> +delaying the aforementioned crisis until 2038.
> +In
> +.Bx 4.1c
> +the function was moved out of the kernel into the C standard library and
> +reimplemented with
> +.Xr gettimeofday 2 .



Re: Add versioned lib to system perl's @INC for non-packaged modules

2021-08-05 Thread Ingo Schwarze
Hi,

thanks to sthen@ for providing more background!

so the bottom line seems to be that, once the code changes are
tested, committed, and prove stable and once people come round to
it, moving the site manual page directories out of /usr/local/man/
and making them versioned in exactly the same way as the sitelib
directories may be a reasonable option.

Even if the main effect of versioning them might be to make it more
obvious for people what went wrong in case things get mixed up.

Yours,
  Ingo



Re: Add versioned lib to system perl's @INC for non-packaged modules

2021-08-04 Thread Ingo Schwarze
Hi Andrew,

Andrew Fresh wrote on Fri, Jul 30, 2021 at 05:34:40PM -0700:
> On Sun, May 16, 2021 at 03:30:39PM -0700, Andrew Hewus Fresh wrote:

>> There do appear to be some annoyances with still shared directories for
>> man pages, in that if you install a CPAN module and then attempt to
>> pkg_add it, it complains because "files already exist" in the man
>> directory.  We could separate the cpan man pages, or I think un-setting
>> the site_lib man*dir will mean cpan won't instal them.  I haven't yet
>> tested that, but I'm not sure what would be best there.

If i understand correctly, this is all about making installation of
un-ported Perl modules work better and reducing clashes between doing
that on the one hand and between package installation with pkg_add(1)
on the other hand.

Who will profit most from that?  I guess very experienced Perl users
and porters who play around a lot with many different Perl modules
and with very large Perl programs, in particular when evaluating
whether those large programs are worth porting.  Not having manual
pages installed for dependencies while trying to get a complicated
program to work and while trying to understand it seems like a
disservice to me.  So "un-setting the site_lib man*dir" doesn't
strike me as particularly convincing on first sight.

I guess ordinary users are less likely to use this route, they are more
likely to just use pkg_add(1).  But if they do, for whatever reason,
not getting manual pages is not good for them, either.

So i think in the long term, installing those manual pages into their
own manpath outside /ust/local/man/ would seem desirable to me.
I'm not saying it needs to be done in the same commit; sorting that
out can certainly be a later step.


You suggest to make the sitelib directory versioned.  Consequently,
i guess that you want the following process to work:

 1. install base and arbitrary numbers of packages
 2. install lots of modules and programs from CPAN for evaluation
 3. install the next Perl version outside base to figure out what
needs to be done to ultimately move base to it
 4. install various modules for the new Perl from step 3
for testing purposes

If i got that right, then the new man directory needs to be versioned
just like the new sitelib directory, or the manual pages from steps 2
and 4 will clash just like the libraries from steps 2 and 4 would.
The price to pay, of course, is that you need to edit man.conf(5)
for each new Perl you install.  And not just you, the base system
Perl maintainer, but all users installing stuff from CPAN.

On the other hand, if working with two different Perl versions in
parallel is not a requirement (for example because people do testing
like in steps 3 and 4 on separate machines?), then i don't see why
either the sitelib or the new man dir needs to be versioned at all,
making everything simpler.

Yours,
  Ingo



Re: sleep.3: miscellaneous cleanup and rewrites

2021-07-29 Thread Ingo Schwarze
Hi Scott,

Scott Cheloha wrote on Tue, Jul 27, 2021 at 05:15:09PM -0500:
> On Mon, Jul 26, 2021 at 10:37:03AM -0600, Todd C. Miller wrote:
>> On Mon, 26 Jul 2021 07:55:00 -0500, Scott Cheloha wrote:

>>> Still wondering whether we need an Errors section to mention that
>>> sleep(3) can set errno.

>> POSIX doesn't define any errors.  Of the errors returned by nanosleep,
>> only EINTR is really possible.  I guess we could save & restore
>> errno in sleep.c but I don't see any implementations that actually
>> do that.  I only checked illumos, glibc and musl though.

> We definitely don't want to save/restore.
> 
> I meant more like a simple acknowledgement for the programmer.
> Knowing what can set errno is helpful.
> 
> We do it for other library functions where some part of the
> functionality is implemented with a system call.  In other manpages we
> say something like:
> 
> The
> .Fn foo
> function may fail and set
> .Va errno
> for any of the errors specified for the
> .Fn bar 2
> routine.
> 
> See, e.g., `man 3 nice` or `man 3 fopen`.
> 
> Then again, there's really only one error.

That would sound like overkill to me, especially since millert@ pointed
out that only EINTR is relevant.

> Maybe we should just
> duplicate the information in sleep.3.
> 
> What about this?
> 
> The
> .Fn sleep
> function sets
> .Va errno
> to
> .Dv EINTR
> if it is interrupted by the delivery of a signal.

I like that: it seems correct, complete, and concise.

OK schwarze@ for the patch as a whole.

Feel free to use or discard the two suggestions provided in-line
below as you see fit.

Yours,
  Ingo


> Index: sleep.3
> ===
> RCS file: /cvs/src/lib/libc/gen/sleep.3,v
> retrieving revision 1.16
> diff -u -p -r1.16 sleep.3
> --- sleep.3   8 Feb 2020 01:09:57 -   1.16
> +++ sleep.3   27 Jul 2021 22:13:22 -
> @@ -32,7 +32,7 @@
>  .Os
>  .Sh NAME
>  .Nm sleep
> -.Nd suspend process execution for interval measured in seconds
> +.Nd suspend execution for an interval of seconds
>  .Sh SYNOPSIS
>  .In unistd.h
>  .Ft unsigned int
> @@ -40,41 +40,46 @@
>  .Sh DESCRIPTION
>  The
>  .Fn sleep
> -function suspends execution of the calling process until either
> +function suspends execution until at least the given number of

In nanosleep(2), you say

  ... suspends execution of the calling thread ...

Would that be useful here, too?

>  .Fa seconds
> -seconds have elapsed or a signal is delivered to the process and its
> -action is to invoke a signal-catching function or to terminate the
> -process.
> -The suspension time may be longer than requested due to the
> -scheduling of other activity by the system.
> +have elapsed or an unmasked signal is delivered to the calling thread.
>  .Pp
> -This function is implemented using
> -.Xr nanosleep 2
> -by pausing for
> -.Fa seconds
> -seconds or until a signal occurs.
> -Consequently, in this implementation,
> -sleeping has no effect on the state of process timers,
> -and there is no special handling for
> -.Dv SIGALRM .
> +This version of
> +.Fn sleep
> +is implemented with
> +.Xr nanosleep 2 ,
> +so delivery of any unmasked signal will terminate the sleep early,
> +even if
> +.Dv SA_RESTART
> +is set with
> +.Xr sigaction 2
> +for the interrupting signal.
>  .Sh RETURN VALUES
> -If the
> +If
> +.Fn sleep
> +sleeps for the full count of
> +.Fa seconds
> +it returns 0.
> +Otherwise,
> +.Fn sleep
> +returns the number of seconds remaining from the original request.
> +.Sh ERRORS
> +The
>  .Fn sleep
> -function returns because the requested time has elapsed, the value
> -returned will be zero.
> -If the
> -.Fn sleep
> -function returns due to the delivery of a signal, the value returned
> -will be the unslept amount (the request time minus the time actually
> -slept) in seconds.
> +function sets
> +.Va errno
> +to
> +.Dv EINTR
> +if it is interrupted by the delivery of a signal.
>  .Sh SEE ALSO
>  .Xr sleep 1 ,
> -.Xr nanosleep 2
> +.Xr nanosleep 2 ,
> +.Xr sigaction 2
>  .Sh STANDARDS
>  The
>  .Fn sleep
>  function conforms to
> -.St -p1003.1-90 .
> +.St -p1003.1-2008 .

I would consider it useful to add this sentence here:

Setting
.Va errno
is an extension to that specification.



Re: nanosleep.2: miscellaneous cleanup

2021-07-21 Thread Ingo Schwarze
Hi Scott,

Scott Cheloha wrote on Wed, Jul 21, 2021 at 11:02:00AM -0500:

[ EFAULT ]
> Given deraadt@'s response I'm just going to leave the existing
> language.

Fine with me.

> I guess I will need to dig into it a bit.  Finding the text of the
> really early documents, prior to SUSv2, is tricky.

I think you are right.  I recently tried to find an online copy
of XPG4.2 / SUSv1 (1994) and did not succeed.  What you want here
is even one step further back: POSIX.1 (1988), POSIX.1b (1993),
POSIX.1c (1995), POSIX.2 (1992), XPG4 (1992), see mdoc(7) for an
overview.

> One additional change:
> - Escape the minus sign for "-1".

I like that, using \- for mathematical minus is certainly correct in
roff(7) in general and perfectly acceptable in manual pages, too.
Our mandoc_char(7) page explains that we don't require it, but there
is a risk that groff(1) might start requiring it in the next release.
I tried to push back, but without success so far.

It seems you somewhere lost the s/one billion/1000 million/ tweak
that you advertised earlier.

Either way, OK schwarze@.

Yours,
  Ingo


> Index: nanosleep.2
> ===
> RCS file: /cvs/src/lib/libc/sys/nanosleep.2,v
> retrieving revision 1.16
> diff -u -p -r1.16 nanosleep.2
> --- nanosleep.2   31 Dec 2018 18:54:00 -  1.16
> +++ nanosleep.2   21 Jul 2021 15:57:55 -
> @@ -41,53 +41,71 @@
>  .Ft int
>  .Fn nanosleep "const struct timespec *timeout" "struct timespec *remainder"
>  .Sh DESCRIPTION
> +The
>  .Fn nanosleep
> -suspends execution of the calling process for the duration specified by
> +function suspends execution of the calling thread for at least the
> +duration specified by
>  .Fa timeout .
> -An unmasked signal will cause it to terminate the sleep early,
> -regardless of the
> +Delivery of an unmasked signal to the calling thread terminates the
> +sleep early,
> +even if
>  .Dv SA_RESTART
> -value on the interrupting signal.
> +is set with
> +.Xr sigaction 2
> +for the interrupting signal.
>  .Sh RETURN VALUES
> -If the
> +If
>  .Fn nanosleep
> -function returns because the requested duration has elapsed, the value
> -returned will be zero.
> +sleeps the full
> +.Fa timeout
> +without interruption it returns 0.
> +Unless
> +.Fa remainder
> +is
> +.Dv NULL ,
> +it is set to zero.
>  .Pp
> -If the
> +If
>  .Fn nanosleep
> -function returns due to the delivery of a signal, the value returned
> -will be \-1, and the global variable
> +is interrupted by a signal it returns \-1 and the global variable
>  .Va errno
> -will be set to indicate the interruption.
> -If
> +is set to
> +.Dv EINTR .
> +Unless
>  .Fa remainder
> -is non-null, the timespec structure it references is updated to contain the
> -unslept amount (the requested duration minus the duration actually slept).
> -.Sh ERRORS
> -If any of the following conditions occur, the
> +is
> +.Dv NULL ,
> +it is set to the unslept portion of the
> +.Fa timeout .
> +.Pp
> +Otherwise,
>  .Fn nanosleep
> -function shall return \-1 and set
> +returns \-1 and the global variable
>  .Va errno
> -to the corresponding value.
> +is set to indicate the error.
> +.Sh ERRORS
> +.Fn nanosleep
> +will fail if:
>  .Bl -tag -width Er
>  .It Bq Er EINTR
> -.Fn nanosleep
> -was interrupted by the delivery of a signal.
> +The call is interrupted by the delivery of a signal.
>  .It Bq Er EINVAL
>  .Fa timeout
> -specified a nanosecond value less than zero or greater than or equal to
> -1000 million,
> +specifies a nanosecond value less than zero or greater than or equal to
> +one billion,
>  or a second value less than zero.
>  .It Bq Er EFAULT
> -Either
>  .Fa timeout
> -or
> -.Fa remainder
>  points to memory that is not a valid part of the process address space.
> +.It Bq Er EFAULT
> +.Fa remainder
> +is not
> +.Dv NULL
> +and points to memory that is not a valid part of the process address space.
>  .El
>  .Sh SEE ALSO
>  .Xr sleep 1 ,
> +.Xr sigaction 2 ,
>  .Xr sleep 3
>  .Sh STANDARDS
>  The
> @@ -97,17 +115,23 @@ function conforms to
>  .Sh HISTORY
>  The predecessor of this system call,
>  .Fn sleep ,
> -appeared in
> -.At v3 ,
> -but was removed when
> +first appeared in
> +.At v3 .
> +It was removed in
> +.At v7
> +and replaced with a C library implementation based on
>  .Xr alarm 3
> -was introduced into
> -.At v7 .
> +and
> +.Xr signal 3 .
> +.Pp
>  The
>  .Fn nanosleep
> -system call has been available since
> +function first appeared in
> +.St -p1003.1b-93 .
> +.Pp
> +This implementation of
> +.Fn nanosleep
> +first appeared in
>  .Nx 1.3
>  and was ported to
> -.Ox 2.1
> -and
> -.Fx 3.0 .
> +.Ox 2.1 .



Re: nanosleep.2: miscellaneous cleanup

2021-07-21 Thread Ingo Schwarze
Hi Theo,

Theo de Raadt wrote on Tue, Jul 20, 2021 at 08:14:20PM -0600:
> Ingo Schwarze wrote:

>>  [EFAULT]  foo points outside the process's allocated address space.
>>
>> But i don't really i like that.  The word "allocated" makes me wonder
>> because it sounds too much like malloc(3) for my taste.
>> Usually, pointers to automatic and to static objects are acceptable,
>> too, and are those "allocated"?  Some might say they are not.
>> Besides, "process's" looks awkward.

> Disagree on your dislike of this wording.
> 
> A process has an address space, VM_MIN_ADDRESS to VM_MAXUSER_ADDRESS.
> Parts of this are not mapped, because nothing has allocated backing
> resources.  Allocation happens via stack growth, execve, shm, mmap, etc
> etc etc etc.  Those are all methods for "allocating" within the
> processes' address space.
> 
> Unallocated space generates EFAULT.
> 
> Addresses outside the VM_MIN_ADDRESS to VM_MAXUSER_ADDRESS range also
> generate EFAULT, and this is because backing resources are not permitted
> to be allocated there.  No allocation, thus EFAULT.
> 
> "allocated" is correct terminology, and it has nothing to do with
> malloc.

Fair enough, thank you for explaining.

Given that it is correct terminology, i recommend using it in the
nanosleep(2) manual, too, because it is the most widely used idiom
and it isn't excessively long.

> I don't see how trying to mince words is going to help anyone.
> 
> Unifying all the varients into one form will be quite a task.

Right, that would probably be a make-work project.  Having a wide
variety of wordings is slightly messy, but not enough of a problem
to warrant spending a significant effort on unification.

Yours,
  Ingo

> I don't know if this is the right form to use, but I don't like the
> argument you made.



Re: nanosleep.2: miscellaneous cleanup

2021-07-20 Thread Ingo Schwarze
Hi Scott,

Scott Cheloha wrote on Tue, Jul 20, 2021 at 05:20:16PM -0500:

> The nanosleep.2 page could use some cleanup.  Here's a bunch of fixes,
> rewrites, etc.
> 
> I've included my notes on the changes below.  I have some (mostly
> stylistic) questions in there, too.

Thanks for explaining what you are doing.  I'm snipping what i agree
with as well as what i have no opinion on and trust you on, but that
doesn't mean that mentioning it was useless.

> Should we reference sigaction(2) here alongside SA_RESTART?  e.g.:
> 
>   ... even if SA_RESTART is set for the interrupting signal
>   (see sigaction(2) for further details).

It isn't absolutely required.  On the one hand, people can use
"man -k any=SA_RESTART" to find out what it is.  On the other hand,
people can maybe be expected to know that sigaction(2) is the system
call to configure signal handling, and even if they ognore it, they
will easily find out with "man signal".

Then again, i can see how saying more precisely what SA_RESTART is
might be helpful especially for novice users, if it can be done
without being too much of a distraction.

In general, i dislike both parenthetic remarks and "see ... for details"
unless a more natural wording cannot be found.

Maybe something like:

  ... even if SA_RESTART was set with sigaction(2) for the
  interrupting signal.

Your call, i would say.

> I'm unsure about the correct voice here.  Should nanosleep actively
> return values?  Or should values "be returned"?

Both are acceptable.  In general, and not just in manual pages,
prefer the active voice when both work equally well.

> I'm unclear on whether I need the indent here in the .Bd block.  I
> think it looks better than unindented code, but maybe I'm flouting
> some practice we have.

Using tabs is permitted inside .Bd -literal and .Bd -unfilled.
They should definitely be used for indenting the bodies of for,
while, and if.

Whether you indent the whole thing depends on what looks better.
When the code is relatively narrow, indenting the whole example
is often a good idea.  When the code contains very long lines,
not indenting the whole example may be better.

If you choose to indent, i have no strong prefernce whether you
do it with tabs or with .Bd -literal -offset indent; maybe i very
slightly prefer the latter, because the global indentation is
a formatting choice rather than part of the displayed code.
But i really wouldn't complain about either.

> ERRORS
> - Simplify the opening sentence.  Yeesh, what a mouthful.

Indeed, ERRORS should not repeat the content of RETURN VALUES.

> Unsure if a second EFAULT case for remainder is a good thing here.
> Strictly speaking, NULL is not a valid part of the process address
> space.  Maybe that's too pedantic?

I don't think you can be too pedantic about where NULL is allowed
and where it isn't.  That's a notorious source of bugs, so precision
about NULL is usually a good thing.

> Also, do we have a more standard "boilerplate" sentence for EFAULT?

Judging from "man -l /usr/share/man/man2/*.2", the most common
wording is:

  [EFAULT]  foo points outside the process's allocated address space.

But i don't really i like that.  The word "allocated" makes me wonder
because it sounds too much like malloc(3) for my taste.
Usually, pointers to automatic and to static objects are acceptable,
too, and are those "allocated"?  Some might say they are not.
Besides, "process's" looks awkward.

These occur, too:

  [EFAULT]  The foo parameter is not in a valid part of the user
address space.

  [EFAULT]  foo references invalid memory.

  [EFAULT]  The name parameter specifies an area outside the
process address space.

  [EFAULT]  foo points to an illegal address.

  [EFAULT]  The userspace address uaddr is invalid.

  [EFAULT]  Part of buf points outside the process's allocated
address space.

  [EFAULT]  The buf parameter points to an invalid address.

  [EFAULT]  The argument foo specifies an invalid address.

  [EFAULT]  The foo parameter specified a bad address.

  [EFAULT]  The foo parameter points to memory not in
a valid part of the process address space.

  [EFAULT]  The address specified for foo is invalid.

  [EFAULT]  An argument address referenced invalid memory.

  In errno(2):
  14 EFAULT Bad address. The system detected an invalid address in
attempting to use an argument of a call.

  [EFAULT]  There was an error reading or writing the kevent
structure.

Quite a mess, i would say.

I think trying to explain over and over again what an invalid address
is, in each and every manual page, is neither reasonable nor feasible -
i'm not convinced the rules are really simple given how many types
of valid memory there are: static, stack, heap, ...

So i'd probably settle for a concise, simple wording, and i think
i like this one best:

  [EFAULT]  foo points to an invalid address.

What is valid can easily depend on the function 

  1   2   3   4   5   6   7   8   9   10   >