Re: start cleaning up UTF-8 processing in less(1)

2019-02-24 Thread Ingo Schwarze
Hi,

Stefan Sperling wrote on Sat, Feb 23, 2019 at 04:19:02PM +0100:

> Your diff looks good to me.
> And I can't see how it could make this situation any worse either.

thanks for checking; i committed the first patch.


To be able to continue with the less cleanup i started, i have to
explain some theory first, and i'm kindly asking for advice which
functionality is desired.  Personally, i'd like to make less simpler
and safer, removing some anachronistic, dangerous functionality,
see below.

Considerable complication in the less code results from support for
https://en.wikipedia.org/wiki/ANSI_escape_code sequences.  Note
that i will *not* propose to delete that support completely, but
only some more arcane and more dangerous parts of it, see at the
very end of this mail.

There are three ways to write an ANSI escape sequence; i'll have
to explain all three in turn.


First syntax: ESCAPE-BRACKET

  $ printf "normal \x1b[43m yellow \x1b[0m normal\n"

Each ANSI escape sequence is introduced by the two-byte sequence
consisting of the ASCII escape character followed by an ASCII opening
square bracket.  This form is relatively benign because it is both
valid ASCII and valid UTF-8 and looks identical in both encodings.
In our default configuration of xterm(1), which is moderately secure
without being paranoid, the above command works out of the box,
showing you the word "yellow" on a yellow background.  It doesn't
matter what LC_CTYPE is, it always works.

By default, less(1) does *not* interpret such escape sequences, but
escapes them and shows them as follows:

  normal ESC[43m yellow ESC[0m normal

where the two instances of the string "ESC" are highlighted in
reverse video.  But if you run less(1) with the -R option, it passes
the escape sequences on to the terminal, and again you see the word
"yellow" on a yellow background, inside less(1).  Horizontal scrolling
works correctly, preserving vertical alignment of display columns,
with the two escape sequences taking up zero columns each in an
xterm(1).  Your mileage may vary on exotic terminals, though.

I'm conivinced this behaviour ought to be preserved.


Second syntax: conventional 8-bit CSI
-
  $ LC_CTYPE=C printf "normal \23343m yellow \2330m normal\n"

Here, each ANSI escape sequence is introduced by the single
byte 0x9b (CSI = control sequence introducer) instead of by the
two-byte sequence ESC-[.  Obviously, this implies that the second
syntax is incompatible with UTF-8, so we always have to make sure
we have LC_CTYPE=C set when doing anything with the second syntax.
Note that i had to give the CSI bytes in octal in the printf command
above; \x9b43m would not work because printf(1) would be unable to
figure out where the hexadecimal character number ends.

In our default xterm(1) configuration, this syntax is *NOT*
supported.  The above command prints the same as

  $ printf "normal \xef\xbf\27543m yellow \xef\xbf\2750m normal\n"

where \xef\xbf\275 is the Unicode replacement character U+FFFD
substituted for the (unsafe, ill-encoded) CSI.

I you really want to, you can use the second syntax on OpenBSD,
but that is *ABSOLUTELY NOT RECOMMENDED*.  You would have to run
xterm(1) in the unsafe, legacy, so called "conventional 8bit" mode
as follows:

  $ LC_CTYPE=C xterm +lc  # This is a really BAD idea.

In such an xterm(1), the line

  $ LC_CTYPE=C printf "normal \23343m yellow \2330m normal\n"

does show the word "yellow" in front of a yellow background.

Reading the less(1) source code, i guess that it *intends* to support
this, but it doesn't actually work for me:

  $ export LC_CTYPE=C
  $ printf "normal \23343m yellow \2330m normal\n" | less -R

results in

  normal <9B>43m yellow <9B>0m normal

for me, with the two strings "<9B>" highlighted in reverse video.
Only with

  $ printf "normal \23343m yellow \2330m normal\n" | less -r

do i see yellow background - but that is utterly useless because -r
completely breaks column counting and wrapping in less(1), and besides
it is irrelevent because -r doesn't need to inspect anything but
simply passes *everything* through no matter what.

I think it is better to completely remove support for the second
syntax from less than to repair it.  It is rarely used because it
is incompatible with Unicode, it provides no advantages compared
to the first syntax, but it painfully complicates the code.


Third syntax: UTF-8 CSI
---
 $ printf "normal \xc2\23343m yellow \xc2\2330m normal\n"

Here, each ANSI escape sequence is introduced by the Unicode
character U+009B, represented in its two-byte UTF-8 encoding \xc2\x9b
instead of by the two-byte sequence ESC-[.  Obviously, this only
mskes sense with a UTF-8 locale.

However, the xterm documentation says in the file
/usr/xenocara/app/xterm/ctlseqs.ms :

  It is not possible to use a C1 control obtained from decoding the
  UTF-8 text, because that would require reprocessing the data.


Re: start cleaning up UTF-8 processing in less(1)

2019-02-25 Thread Ingo Schwarze
Hi Todd,

Todd C. Miller wrote on Mon, Feb 25, 2019 at 01:06:02PM -0700:
> On Mon, 25 Feb 2019 19:43:36 +0100, Ingo Schwarze wrote:
>> Todd C. Miller wrote on Mon, Feb 25, 2019 at 09:45:12AM -0700:
>>> On Mon, 25 Feb 2019 12:39:41 +0100, Ingo Schwarze wrote:

>>>> Index: line.c
>> [...]
>>>> @@ -469,11 +469,10 @@ in_ansi_esc_seq(void)
>>>> * Search backwards for either an ESC (which means we ARE in a seq);
>>>> * or an end char (which means we're NOT in a seq).
>>>> */
>>>> -  for (p = [curr]; p > linebuf; ) {
>>>> -  LWCHAR ch = step_char(, -1, linebuf);
>>>> -  if (IS_CSI_START(ch))
>>>> +  for (p = linebuf + curr - 1; p >= linebuf; p--) {

>>> Since curr can be 0, can this lead to be a single byte underflow?

>> No, in that case (which logically means the line buffer is empty),
>> the end condition p >= linebuf is false right away, the loop
>> is never entered, the function returns 0 right away and at the
>> call site, the first if brach (containing "curr--") isn't entered
>> either.

> Strictly speaking, the result of "p = linebuf + curr - 1" is undefined
> when curr < 1.  There is a special case in the standard when the
> result is one past the end of an array but no corresponding case
> for one element before the array.  In practice, it is unlikely to
> matter.

Ouch, i keep forgetting that one, thanks for reminding me.

In that case, let's just use an index rather than a pointer;
diff otherwise unchanged.

OK?
  Ingo


Index: cvt.c
===
RCS file: /cvs/src/usr.bin/less/cvt.c,v
retrieving revision 1.8
diff -u -p -r1.8 cvt.c
--- cvt.c   17 Sep 2016 15:06:41 -  1.8
+++ cvt.c   26 Feb 2019 01:31:31 -
@@ -77,7 +77,7 @@ cvt_text(char *odst, char *osrc, int *ch
dst--;
} while (dst > odst &&
!IS_ASCII_OCTET(*dst) && !IS_UTF8_LEAD(*dst));
-   } else if ((ops & CVT_ANSI) && IS_CSI_START(ch)) {
+   } else if ((ops & CVT_ANSI) && ch == ESC) {
/* Skip to end of ANSI escape sequence. */
src++;  /* skip the CSI start char */
while (src < src_end)
Index: filename.c
===
RCS file: /cvs/src/usr.bin/less/filename.c,v
retrieving revision 1.26
diff -u -p -r1.26 filename.c
--- filename.c  29 Oct 2017 17:10:55 -  1.26
+++ filename.c  26 Feb 2019 01:31:31 -
@@ -348,7 +348,7 @@ bin_file(int f)
pend = [n];
for (p = data; p < pend; ) {
LWCHAR c = step_char(, +1, pend);
-   if (ctldisp == OPT_ONPLUS && IS_CSI_START(c)) {
+   if (ctldisp == OPT_ONPLUS && c == ESC) {
do {
c = step_char(, +1, pend);
} while (p < pend && is_ansi_middle(c));
Index: less.h
===
RCS file: /cvs/src/usr.bin/less/less.h,v
retrieving revision 1.28
diff -u -p -r1.28 less.h
--- less.h  30 Dec 2018 23:09:58 -  1.28
+++ less.h  26 Feb 2019 01:31:31 -
@@ -38,8 +38,6 @@
 #defineIS_SPACE(c) isspace((unsigned char)(c))
 #defineIS_DIGIT(c) isdigit((unsigned char)(c))
 
-#defineIS_CSI_START(c) (((LWCHAR)(c)) == ESC || (((LWCHAR)(c)) == CSI))
-
 #ifndef TRUE
 #defineTRUE1
 #endif
@@ -151,9 +149,7 @@ struct textlist {
 #defineAT_INDET(1 << 7)  /* Indeterminate: either bold or 
underline */
 
 #defineCONTROL(c)  ((c)&037)
-
 #defineESC CONTROL('[')
-#defineCSI ((unsigned char)'\233')
 
 #defineS_INTERRUPT 01
 #defineS_STOP  02
Index: line.c
===
RCS file: /cvs/src/usr.bin/less/line.c,v
retrieving revision 1.23
diff -u -p -r1.23 line.c
--- line.c  24 Feb 2019 04:54:36 -  1.23
+++ line.c  26 Feb 2019 01:31:31 -
@@ -230,7 +230,7 @@ pshift(int shift)
 */
while (shifted <= shift && from < curr) {
c = linebuf[from];
-   if (ctldisp == OPT_ONPLUS && IS_CSI_START(c)) {
+   if (ctldisp == OPT_ONPLUS && c == ESC) {
/* Keep cumulative effect.  */
linebuf[to] = c;
attr[to++] = attr[from++];
@@ -463,17 +463,16 @@ backc(void)
 static int
 in_ansi_esc_seq(void)
 {
-   char *p;
+   int i;
 
/*
 * Search ba

less(1) UTF-8 cleanup: backc()

2019-03-02 Thread Ingo Schwarze
Hi,

the first step in cleaning up do_append() is the subroutine backc(),
which is only called from a single place, namely in do_append().
Its purpose is to back up over the previous positive-width character
during the application of backspace-encoding for "bold" or "underlined".

The task here is to get rid of the data type LWCHAR and to stop
calling the bad function step_char().

I tried to figure out whether the content of linebuf[] is guaranteed
to always be valid UTF-8, but i still don't feel completely convinced.
For example, the function forw_raw_line() in line.c looks suspicious,
as if it might potentially copy arbitrary, invalidly encoded data into
linebuf[].  Consequently, i consider it safer to perform complete
error handling in backc(), at least for now.

The new code for the two backward iterations in backc() looks slightly
repetive.  But since in the first iteration (for ch), failure means
there is no previous character whatsoever and we have to return failure,
whereas for the second iteration (for prev_ch), failure only means
that we need to call pwidth() without passing a previous character,
error handling is different, so trying to unify both backward iterations
would probably make the code more complicated rather than simpler.

Note that while prev_ch failure must not prevent the width measurement
of ch with pwidth(), if that returns 0 (i.e. when ch is a combining
accent), iterating to the preceding character only makes sense when
the preceding character is valid, so i'm adding "if (prev_ch == L'\0')
return (0);".  The lack of such a check in the old code may have
been a bug, but i'm not completely sure it could actually be triggered
in some way.

Finally, when mbtowc(3) fails, portability requires to reset the
internal state of mbtowc(3).  That's a no-op on OpenBSD, and it is
unlikely to actually be required for UTF-8 even on other systems
because internal state is only supposed to be useful for stateful
encodings, but i consider it good idiomatic coding to include the
reset anyway.  I'm also adding it to one place where i forgot
about it in an earlier patch.  Of course, it is only required for
len == -1, but it does no harm for the (less likely) case of stray
intervening continuation bytes (i + len < curr).

OK?
  Ingo


Index: line.c
===
RCS file: /cvs/src/usr.bin/less/line.c,v
retrieving revision 1.26
diff -u -p -r1.26 line.c
--- line.c  1 Mar 2019 16:41:37 -   1.26
+++ line.c  2 Mar 2019 11:54:41 -
@@ -437,23 +437,51 @@ pwidth(wchar_t ch, int a, wchar_t prev_c
 static int
 backc(void)
 {
-   LWCHAR prev_ch;
-   char *p = linebuf + curr;
-   LWCHAR ch = step_char(, -1, linebuf + lmargin);
-   int width;
+   wchar_t  ch, prev_ch;
+   int  i, len, width;
+
+   i = curr - 1;
+   if (utf_mode) {
+   while (i >= lmargin && IS_UTF8_TRAIL(linebuf[i]))
+   i--;
+   }
+   if (i < lmargin)
+   return (0);
+   if (utf_mode) {
+   len = mbtowc(, linebuf + i, curr - i);
+   if (len == -1 || i + len < curr) {
+   (void)mbtowc(NULL, NULL, MB_CUR_MAX);
+   return (0);
+   }
+   } else
+   ch = linebuf[i];
 
/* This assumes that there is no '\b' in linebuf.  */
while (curr > lmargin && column > lmargin &&
(!(attr[curr - 1] & (AT_ANSI|AT_BINARY {
-   curr = p - linebuf;
-   prev_ch = step_char(, -1, linebuf + lmargin);
+   curr = i--;
+   if (utf_mode) {
+   while (i >= lmargin && IS_UTF8_TRAIL(linebuf[i]))
+   i--;
+   }
+   if (i < lmargin)
+   prev_ch = L'\0';
+   else if (utf_mode) {
+   len = mbtowc(_ch, linebuf + i, curr - i);
+   if (len == -1 || i + len < curr) {
+   (void)mbtowc(NULL, NULL, MB_CUR_MAX);
+   prev_ch = L'\0';
+   }
+   } else
+   prev_ch = linebuf[i];
width = pwidth(ch, attr[curr], prev_ch);
column -= width;
if (width > 0)
return (1);
+   if (prev_ch == L'\0')
+   return (0);
ch = prev_ch;
}
-
return (0);
 }
 
@@ -536,8 +564,10 @@ store_char(LWCHAR ch, char a, char *rep,
break;
if (i >= 0) {
w = mbtowc(_ch, linebuf + i, curr - i);
-   if (w == -1 || i + w < curr)
+   if (w == -1 || i + w < curr) {
+   (void)mbtowc(NULL, NULL, MB_CUR_MAX);
  

Re: mandoc -Tlint systat.1 fix

2019-03-04 Thread Ingo Schwarze
Hi David,

David Gwynne wrote on Mon, Mar 04, 2019 at 08:59:12PM +1000:

> lint thinks uvm_swap_get() looks like a function name,
> so this uses .Fn to mark it up as one.
> 
> ok?

Sure.

Given that the function uvm_swap_get() appears to be important enough
to be mentioned even in a userland manual page, would it maybe also
make sense to document it in uvm(9)?  Not sure, just asking.

If so, then the line would become ".Xr uvm_swap_get 9" afterwards.

Yours,
  Ingo


> Index: systat.1
> ===
> RCS file: /cvs/src/usr.bin/systat/systat.1,v
> retrieving revision 1.110
> diff -u -p -r1.110 systat.1
> --- systat.1  25 Jul 2018 17:24:14 -  1.110
> +++ systat.1  4 Mar 2019 10:58:15 -
> @@ -697,7 +697,8 @@ swap pages in use
>  .It swpgonly
>  in use swap pages not in RAM
>  .It nswget
> -fault called uvm_swap_get()
> +fault called
> +.Fn uvm_swap_get
>  .It nanon
>  total anon's
>  .Pp



Re: update ctype data to unicode 10

2019-02-22 Thread Ingo Schwarze
Hi Andrew,

Lauri Tirkkonen wrote on Fri, Feb 22, 2019 at 01:57:01AM +0200:

> Hi, the recent perl-5.28.1 and related unicore update brought the
> unicode data from version 8.0.0 to version 10.0.0. That fixes some
> character classifications (eg. emoji characters gained East_Asian_Width
> value 'Wide', which causes them to correctly get a wcwidth() of 2). But
> the ctype source data needs to be regenerated with this new perl/unicore
> to gain the benefits.
> 
> So I've done just that:
> cd /usr/src/share/locale/ctype && ./gen_ctype_utf8.pl > en_US.UTF-8.src
> and the resulting diff is below. You could obviously run this yourself -
> I'm only including the diff because it took quite a long time to run the
> script (177m08.01s real).
> 
> The resulting LC_CTYPE generated from this new source gives a wcwidth()
> of 2 to eg. U+1F3EE, as expected (it used to be 1 with unicode 8.0
> data).

I looked through the diff (not checking each and every line thoroughly,
but watching out for stuff that looks suspicious, and doing occasional
random sampling tests), and i did not spot anything that looks wrong.

Much of it is simply additions of new characters, which are unlikely
to cause regressions in the first place, and some parts change width
one to width two, which appears to be intended by what Lauri says above
and which certainly isn't very dangerous either.

So if the diff agrees with what you got, Andrew, it's OK schwarze@.

Yours,
  Ingo



Re: update ctype data to unicode 10

2019-02-22 Thread Ingo Schwarze
Hi Andrew,

Andrew Fresh wrote on Thu, Feb 21, 2019 at 08:22:16PM -0700:
> On Fri, Feb 22, 2019 at 01:57:01AM +0200, Lauri Tirkkonen wrote:

>> Hi, the recent perl-5.28.1 and related unicore update brought the
>> unicode data from version 8.0.0 to version 10.0.0. That fixes some
>> character classifications (eg. emoji characters gained East_Asian_Width
>> value 'Wide', which causes them to correctly get a wcwidth() of 2). But
>> the ctype source data needs to be regenerated with this new perl/unicore
>> to gain the benefits.
>> 
>> So I've done just that:
>> cd /usr/src/share/locale/ctype && ./gen_ctype_utf8.pl > en_US.UTF-8.src
>> and the resulting diff is below. You could obviously run this yourself -

> I meant to do that and make sure it was OK with schwarze@,

I'm certainly OK with the basic idea of doing an update in this way.

> so it is OK afresh1@,

I guess once we are confident that not just the idea, but the specific
diff is OK, you'll not only get to OK it, but even to commit it, Andrew.  :)

> although I didn't compare your output to mine.

Should be trivial to do, or would that cause any inconvenience?

It seems like a free additional test to me, as another easy test
to make sure that nothing unexpected went awry.

>> I'm only including the diff because it took quite a long time to run the
>> script (177m08.01s real).

> There are a lot of unicode symbols.  Someday if I get super bored I'll
> write something to do it in parallel :-)

I clearly prefer simplicity over performance in this respect.


I'll now have a look at the diff itself to see whether anything
looks suspicious.

Yours,
  Ingo



Re: start cleaning up UTF-8 processing in less(1)

2019-02-25 Thread Ingo Schwarze
Hi,

Nicholas Marriott wrote on Mon, Feb 25, 2019 at 10:14:03AM +:
> Ingo Schwarze wrote:

>> During the upcoming cleanup steps, let use retain full support for
>> the first (ESC-[) syntax and lets us completely delete support for
>> the second and third CSI syntaxes (single-byte CSI and UTF-8
>> single-character two-byte CSI).
>> 
>> If you are OK with that plan, i'll send diffs implementing that.

> Very little, if anything at all, uses 8-bit or UTF-8 CSI,
> getting rid of it is a good idea.

Thank you for all your feedback.

So here is a patch doing that.

The central part is dropping the definitions related to the 8-bit
and the UTF-8 CSI from less.h, keeping only the definitions needed
for ESC-[.  In the future, these forms of the CSI will be treated
exactly like other control characters: e.g., encoded for display.

The rest of the diff mechanically resolves the IS_CSI_START() macro,
replacing it by a simple "== ESC" test.  Casting is needed at none
of the places: the types involved are simply:

'[' int (value 0x5b < 0x7f)
037 int (value 0x1f < 0x7f)  -- apply & -->
ESC int (value 0x1b < 0x7f)

So there is no danger in comparing ESC using == or != with any of
the types char, unsigned char (which are covered by the int range
during promotion) or LWCHAR == unsigned long, in which case the
(positve) constant ESC gets sign extended, which is value-preserving
for a positive integer.

The effects of the various chunks are, in detail:

 * cvt.c, cvt_text():
   8-bit and UTF-8 CSI are no longer recognized as ANSI escape
   sequences but copied just like any other control characters.
 * filename.c, bin_file():
   8-bit and UTF-8 CSI are no longer skipped but counted as
   binary characters just like any other binary characters.
   (change in behaviour confirmed by testing)
 * line.c, pshift():
   No longer treat 8-bit and UTF-8 CSI sequences specially
   when shifting a line left.
   (shifting left and right was tested)
 * line.c, in_ansi_esc_seq():
   No longer recognize 8-bit and UTF-8 CSI sequences.
   This allows simplifying the loop to iterate over bytes
   rather than over characters, getting rif of one LWCHAR
   variable and one call to the pesky function step_char(-1}.
   (it was tested that ESC-[ still works)
 * line.c, store_char():
   No longer recognize 8-bit and UTF-8 CSI sequences.
   Again, minus one LWCHAR and minus one step_char(-1).
 * line.c, do_append():
   No longer pass 8-bit and UTF-8 CSI unencoded, encode them
   like any other control character.

This survived light testing.
All code touched will be touched again in the upcoming cleanup
because all these functions still contain step_char() or get_wchar(),
but this step simplifies many of the upcoming cleanup steps.
Already minus six lines of code and minus seven macro calls
in this patch alone.

OK?
  Ingo


Index: cvt.c
===
RCS file: /cvs/src/usr.bin/less/cvt.c,v
retrieving revision 1.8
diff -u -p -r1.8 cvt.c
--- cvt.c   17 Sep 2016 15:06:41 -  1.8
+++ cvt.c   25 Feb 2019 11:25:38 -
@@ -77,7 +77,7 @@ cvt_text(char *odst, char *osrc, int *ch
dst--;
} while (dst > odst &&
!IS_ASCII_OCTET(*dst) && !IS_UTF8_LEAD(*dst));
-   } else if ((ops & CVT_ANSI) && IS_CSI_START(ch)) {
+   } else if ((ops & CVT_ANSI) && ch == ESC) {
/* Skip to end of ANSI escape sequence. */
src++;  /* skip the CSI start char */
while (src < src_end)
Index: filename.c
===
RCS file: /cvs/src/usr.bin/less/filename.c,v
retrieving revision 1.26
diff -u -p -r1.26 filename.c
--- filename.c  29 Oct 2017 17:10:55 -  1.26
+++ filename.c  25 Feb 2019 11:25:38 -
@@ -348,7 +348,7 @@ bin_file(int f)
pend = [n];
for (p = data; p < pend; ) {
LWCHAR c = step_char(, +1, pend);
-   if (ctldisp == OPT_ONPLUS && IS_CSI_START(c)) {
+   if (ctldisp == OPT_ONPLUS && c == ESC) {
do {
c = step_char(, +1, pend);
} while (p < pend && is_ansi_middle(c));
Index: less.h
===
RCS file: /cvs/src/usr.bin/less/less.h,v
retrieving revision 1.28
diff -u -p -r1.28 less.h
--- less.h  30 Dec 2018 23:09:58 -  1.28
+++ less.h  25 Feb 2019 11:25:38 -
@@ -38,8 +38,6 @@
 #defineIS_SPACE(c) isspace((unsigned char)(c))
 #defineIS_DIGIT(c) isdigit((unsigned char)(c))
 
-#defineIS_CSI_START(c) (((LWCHAR)(c)) == ESC || (((LWCHAR)(c

Re: nm(1): return on malloc error

2019-03-03 Thread Ingo Schwarze
Hi,

Benjamin Baier wrote on Sat, Mar 02, 2019 at 10:10:40AM +0100:

> On malloc error symtab is unmapped, so proceeding on will lead
> to a NULL pointer dereference.
> When malloc fails we should return like the MMAP case does.

while i'm certainly not experienced with nm(1), this change looks
correct to me.  OK to commit?

Note that returning implies that the program might attempt to process
further files, which is of dubious value: it is likely to fail, too,
and in the unlikely case of success, that's maybe even worse: the
output from subsequent files might cause the user to miss the error
message about the malloc failure...  But given that mmap(2) failure
already behaves like that, switching to just err(3) out on resource
exhaustion looks like a larger change which i'm not planning to
push for, even though it would make sense to me.

Here is the patch again, in standard format.

Yours
  Ingo


Index: nm.c
===
RCS file: /cvs/src/usr.bin/nm/nm.c,v
retrieving revision 1.53
diff -p -U8 -r1.53 nm.c
--- nm.c27 Oct 2017 16:47:08 -  1.53
+++ nm.c3 Mar 2019 15:12:28 -
@@ -376,16 +376,17 @@ show_symtab(off_t off, u_long len, const
MMAP(symtab, len, PROT_READ, MAP_PRIVATE|MAP_FILE, fileno(fp), off);
if (symtab == MAP_FAILED)
return (1);
 
namelen = sizeof(ar_head.ar_name);
if ((p = malloc(sizeof(ar_head.ar_name))) == NULL) {
warn("%s: malloc", name);
MUNMAP(symtab, len);
+   return (1);
}
 
printf("\nArchive index:\n");
num = betoh32(*symtab);
strtab = (char *)(symtab + num + 1);
for (ps = symtab + 1; num--; ps++, strtab += strlen(strtab) + 1) {
if (fseeko(fp, betoh32(*ps), SEEK_SET)) {
warn("%s: fseeko", name);



Re: nm(1): return on malloc error

2019-03-03 Thread Ingo Schwarze
Hi,

Benjamin Baier wrote on Sat, Mar 02, 2019 at 10:10:40AM +0100:

> On malloc error symtab is unmapped, so proceeding on will lead
> to a NULL pointer dereference.
> When malloc fails we should return like the MMAP case does.

Committed.

Thanks for the patch (and to those who checked it).
  Ingo


> Index: nm.c
> ===
> RCS file: /cvs/src/usr.bin/nm/nm.c,v
> retrieving revision 1.53
> diff -u -p -u -C10 -r1.53 nm.c
> *** nm.c  27 Oct 2017 16:47:08 -  1.53
> --- nm.c  20 Feb 2019 17:34:01 -
> *** show_symtab(off_t off, u_long len, const
> *** 374,393 
> --- 374,394 
>   restore = ftello(fp);
> 
>   MMAP(symtab, len, PROT_READ, MAP_PRIVATE|MAP_FILE, fileno(fp), off);
>   if (symtab == MAP_FAILED)
>   return (1);
> 
>   namelen = sizeof(ar_head.ar_name);
>   if ((p = malloc(sizeof(ar_head.ar_name))) == NULL) {
>   warn("%s: malloc", name);
>   MUNMAP(symtab, len);
> + return (1);
>   }
> 
>   printf("\nArchive index:\n");
>   num = betoh32(*symtab);
>   strtab = (char *)(symtab + num + 1);
>   for (ps = symtab + 1; num--; ps++, strtab += strlen(strtab) + 1) {
>   if (fseeko(fp, betoh32(*ps), SEEK_SET)) {
>   warn("%s: fseeko", name);
>   rval = 1;
>   break;
> 



less(1) UTF-8 cleanup: store_char()

2019-02-26 Thread Ingo Schwarze
Hi,

Nicholas Marriott wrote on Tue, Feb 26, 2019 at 07:20:47AM +:

> Looks good, ok nicm

Thanks to all of you for checking.

Here is the next step, slowly coming closer to the meat of the
matter: Start cleaning up the function store_char(), which takes a
wide character, receiving both the Unicode codepoint "LWCHAR ch"
and the UTF-8 multibyte representation "char *rep" as (redundant)
input, behaves specially when ch is part of an ESC[...m ANSI sequence,
otherwise measures the display width of ch, checks that it still
fits on the screen, copies the multibyte representation to the line
buffer, and advances the global variables "curr" (byte position)
and "column" (visual display position).

This is the only function calling in_ansi_esc_seq(), and having that
function separate is actually counter-productive: if an escape sequence
is found, then store_char() currently does the same backward iteration
again if the sequence is incomplete and needs to be deleted.
Consequently, merging in_ansi_esc_seq() into store_char() simplifies
the code.

If ch neither continues an ANSI sequence begun earlier nor starts a
new one, the code needs to find the previous character in order to
measure the width of the current one with pwidth(): the current one
might be BACKSPACE.  Currently, this backward iteration is done
with the horrific, buggy function step_char(-1) which we aim to
delete.  Instead, iterate backwards until we find something that
is not a UTF-8 continuation byte (i.e. which is an ASCII byte or a
UTF-8 start byte) then decode the character with the standard
function mbtowc(3).

For extra safety, let's not only check that the previous character
is valid (mbtowc() != -1) but also that there are no stray continuation
bytes between the previous character and the character currently
being added (i + w == curr).  Maybe invalid UTF-8 can't actually
make it into the line buffer, but let's better play it safe.  If
there is no previous character on the line, if it is invalid, or
if there are stray continuation bytes after it, use a dummy blank,
which means that in case of doubt, we assume that a backspace backs
up by one visual column.

Note that the function still contains a call to the buggy function
utf_len() that we want to delete.  Ultimately, most instances of
utf_len() will be replaced with the standard function mblen(3), but
we can't do that right now in store_char().  At this point, the
function store_char() may still receive bogus characters incorrectly
parsed with get_wchar(), and handling these at least consistently
(even though not correctly) requires sticking with utf_len() for
now.

I tested that emboldening with "c\bc" and underlining with "c\b_"
still works, even when "c" is a width two East Asian character,
and that behaviour is sane with intervening stray continuation
bytes.

OK?
  Ingo


Index: line.c
===
RCS file: /cvs/src/usr.bin/less/line.c,v
retrieving revision 1.24
diff -u -p -r1.24 line.c
--- line.c  26 Feb 2019 11:01:54 -  1.24
+++ line.c  26 Feb 2019 12:26:42 -
@@ -458,27 +458,6 @@ backc(void)
 }
 
 /*
- * Are we currently within a recognized ANSI escape sequence?
- */
-static int
-in_ansi_esc_seq(void)
-{
-   int i;
-
-   /*
-* Search backwards for either an ESC (which means we ARE in a seq);
-* or an end char (which means we're NOT in a seq).
-*/
-   for (i = curr - 1; i >= 0; i--) {
-   if (linebuf[i] == ESC)
-   return (1);
-   if (!is_ansi_middle(linebuf[i]))
-   return (0);
-   }
-   return (0);
-}
-
-/*
  * Is a character the end of an ANSI escape sequence?
  */
 int
@@ -512,6 +491,7 @@ is_ansi_middle(LWCHAR ch)
 static int
 store_char(LWCHAR ch, char a, char *rep, off_t pos)
 {
+   int i;
int w;
int replen;
char cs;
@@ -529,22 +509,43 @@ store_char(LWCHAR ch, char a, char *rep,
}
}
 
-   if (ctldisp == OPT_ONPLUS && in_ansi_esc_seq()) {
-   if (!is_ansi_end(ch) && !is_ansi_middle(ch)) {
+   w = -1;
+   if (ctldisp == OPT_ONPLUS) {
+   /*
+* Set i to the beginning of an ANSI escape sequence
+* that was begun and not yet ended, or to -1 otherwise.
+*/
+   for (i = curr - 1; i >= 0; i--) {
+   if (linebuf[i] == ESC)
+   break;
+   if (!is_ansi_middle(linebuf[i]))
+   i = 0;
+   }
+   if (i >= 0 && !is_ansi_end(ch) && !is_ansi_middle(ch)) {
/* Remove whole unrecognized sequence.  */
-   do {
-   curr--;
-   } while (curr > 0 && linebuf[curr] != ESC);
+   curr = i;
return (0);
}
- 

Re: archaic quotes in calendar

2019-02-23 Thread Ingo Schwarze
Hi,

Anthony J. Bentley wrote on Wed, Jan 30, 2019 at 04:01:30AM -0700:
> Jason McIntyre writes:

>> in the man page you have used Sq. that will make it mark up the same as
>> it already does:
>>
>>  (`*')

> There's a difference: Ted uses a UTF-8 locale where Sq shows up as
> pretty Unicode quotes. Literally typing `' doesn't (not in the terminal,
> anyway).
> 
> mandoc should render Sq as ' ' in ASCII mode. We fixed Dq the same way
> in both mandoc and groff a couple of years ago (`` '' to " ").

To finally conclude this thread: after a thorough discussion on the
list , this patch, and the corresponding patch
to groff, has been rejected.  While some liked our arguments of
making output nicer with Unicode-copatible fonts and avoiding ASCII
usage that conflicts with Unicode, no consensus could be reached
because substantial objections were raised, too.

The main objection is that using 0x60 for the opening single quote
(as an alternative to using it for a grave accent) and 0x27 for the
closing single quote (as an alternative to using it for an apostrophe),
according to the latest version of the ASCII standard, is a valid
way to use the ASCII character encoding and has been valid since
these bytes were frist assigned meaning in the 1965 edition of ASCII -
even though this usage is no longer compatible with ISO-LATIN nor with
Unicode.

Moreover, it is the traditional way in which *roff used ASCII, it
is compatible with the way many traditional terminals (including
DEC VT-100 and VT-220) display these ASCII characters, and several
groff developers are still using fonts for ASCII display that display
` ' as symmetric single quotes.  Even our own Gallant console font
does that.  Several groff developers dislike the idea of changing
established groff behaviour, even for the somewhat old-fashioned
ASCII output device, unless there are strong technical reasons, and
i had to concede that this change would be more of the "nice to
have" kind and isn't strongly required for anything.

The case Dq -> ``/" is substantially different because in ASCII,
double quotes - both opening and closing - were always clearly
defined as ".


Consequently, people using a modern Unicode-compatible font for
ASCII output will have to live with the slightly ugly single quotes,
likely for good.

Needless to say, if you can use the -T utf8 output device, for
example by running with LC_CTYPE=en_US.UTF-8, the issue doesn't
occur in the first place, and you already get nice single quotes
right now, without needing any code changes.

Yours,
  Ingo


> Index: usr.bin/mandoc/chars.c
> ===
> RCS file: /cvs/src/usr.bin/mandoc/chars.c,v
> retrieving revision 1.48
> diff -u -p -r1.48 chars.c
> --- usr.bin/mandoc/chars.c15 Dec 2018 19:30:19 -  1.48
> +++ usr.bin/mandoc/chars.c30 Jan 2019 10:55:08 -
> @@ -100,7 +100,7 @@ static struct ln lines[] = {
>   { "rq", "\"",   0x201d  },
>   { "Lq", "\"",   0x201c  },
>   { "Rq", "\"",   0x201d  },
> - { "oq", "`",0x2018  },
> + { "oq", "\'",   0x2018  },
>   { "cq", "\'",   0x2019  },
>   { "aq", "\'",   0x0027  },
>   { "dq", "\"",   0x0022  },



start cleaning up UTF-8 processing in less(1)

2019-02-23 Thread Ingo Schwarze
Hi,

Evan Silberman, by posting a well-founded bug report to bugs@,
just drew my attention to the pigsty of having a hand-rolled
re-implemention of Unicode character property handling within
less(1).

The root of the evil is the existence of the custom definition

  typedef unsigned long LWCHAR;

in the file less.h, instead of simply using wchar_t where needed.
My first thought was "maybe they are doing that to store Unicode
codepoints portably on platforms where the values stored in wchar_t
are not Unicode codepoints" - but it turns out less(1) is *already*
broken on such platforms because the file cvt.c, function cvt_text(),
already calls iswupper(3) and towlower(3) on LWCHAR values, i.e. on
Unicode codepoints.  And some other functions are even worse than that:

 * search.c, is_ucase() calls isupper(3) - without *w*! - on an LWCHAR
   value, which makes no sense whatsoever.
 * charset.c, control_char(), calls iscntrl((unsigned char)c)
   on an LWCHAR value - which is absurd to the point of not even being
   funny any longer, rather making you cry.  When i first saw that,
   i was struck with incredulity, but it is really truncating valid
   codepoints to their lowest eight bits and then handing those
   meaningless bits to iscntrl(3)...

Of course it would be possible to root out all the insanity in one
bold step.  But it would be a very large diff, prone to introducing
bugs, hard to review, so i decided against that route.

Instead, for the moment, let us assume that wchar_t always stores
Unicode codepoints and is always contained within the "unsigned
long" range.  Both will always remain true on OpenBSD.  Making these
assumptions allows me to proceed step by step, with small, focussed
diffs that will be easy to argue about.

The end goal is to get rid of the LWCHAR data type and to delete
most of the charset.c file, maybe even all of it, without adding any
new code.

Once that goal is reached, i expect our less(1) will be even *more*
portable than it is now, because once that is reached, everything
that requires Unicode character properties will be done with wchar_t,
which is likely to just work even on platforms where wchar_t uses
some arcane format instead of simply storing Unicode codepoints.


Pursuing that goal, i first tried to proceed bottom-up, i.e. by
starting to first replace the lowest-level functions like get_wchar()
and step_char(), and then work upwards.  But that failed miserably.
What these low-level functions are doing is generally so wrong, so
absurd, that it is unreasonable to simply replace them: the internal
APIs they define are practically unusable by the callers.  Instead,
all call sites have to be inspected one by one, figuring out what
the code *should* do at the call sites and then implement that with
standard functions like mbtowc(3) and iswprint(3).

I combed through the less source code and identified 35 functions
in seven files that have to be partially rewritten, totally rewritten,
or deleted because what they are doing is wrong, in most cases
completely absurd.  I don't see any alternative to tackling them
one by one.

I decided to start with the file line.c because it is central to
the purpose of less: displaying lines of text to the screen.
To quickly get to the important function store_char(), which
intends to save a character into the line output buffer, i first
have to clean up the function pwidth() that it uses.  The function
pwidth() intends to measure the display width of a character, i.e.
how many columns it takes up on the screen: 0, 1, or 2.  So it has
to become a shallow wrapper around wcwidth(3).  While i doubt the
wisdom of the addition of display widths of ANSI escape sequences
that it is doing at the end - on any sane terminal, such sequences
should never take up any additional columns - i decided to stay
focussed on UTF-8 issues and just leave the code related to attributes
unchanged for now.

Here is the benfit of the following diff:  it gets rid of
 - two LWCHAR function arguments
 - two calls to is_wide_char() which uses an outdated table
 - one call to is_composing_char() which uses an outdated table
 - one call to is_combining_char() which uses an outdated table
 - one call to control_char() which is totally broken
 - one call to is_ascii_char() which is a trivial function
 - two tests of the ideosyncratic "utf_mode" variable

None of the horrific features listed above can be deleted just yet
because they are also in use at other places in less.

Of course, the diff *does* change behaviour: character display
widths (for the purpose at hand) get updated to the latest version
of Unicode in our tree, and the effects of any bugs contained in
the functions listed above silently vanish.  That should only make
things better and not cause regressions.  I didn't check whether
no longer calling control_char() here fixes some bugs; that is quite
possible, but depends on which arguments pwidth() is called with,
which i didn't try to research.  The new version 

less(1) UTF-8 bugfix and cleanup: search.c

2019-03-14 Thread Ingo Schwarze
Hi,

the following is a very simple patch to completely clean up the
file less/search.c with respect to UTF-8 handling.  It also fixes
an outright bug: Searching for uppercase UTF-8 characters currently
doesn't work because passing a Unicode codepoint (in this case, the
"ch" retrieved with step_char()) to isupper(3) is just totally
wrong.

The new loop is fairly standard.  Invalid bytes are simply skipped.

OK?
  Ingo


P.S.
I'm sending this even though my pappend() patch
  https://marc.info/?l=openbsd-tech=155249735725712
is still looking for review.

Both are touching different files and are independent of each other.
The cleanup is now maybe about half done with only one or two
functions remaining in the most affected file line.c, but the files
cmdbuf.c, cvt.c, and filename.c still remain, so i might have to
speed up a bit to get it done before release comes too close.


Index: search.c
===
RCS file: /cvs/src/usr.bin/less/search.c,v
retrieving revision 1.19
diff -u -p -r1.19 search.c
--- search.c2 Aug 2017 19:35:57 -   1.19
+++ search.c14 Mar 2019 13:48:59 -
@@ -75,12 +75,14 @@ static struct pattern_info filter_info;
 static int
 is_ucase(char *str)
 {
-   char *str_end = str + strlen(str);
-   LWCHAR ch;
+   wchar_t ch;
+   int len;
 
-   while (str < str_end) {
-   ch = step_char(, +1, str_end);
-   if (isupper(ch))
+   for (; *str != '\0"; str += len) {
+   if ((len = mbtowc(, str, MB_CUR_MAX)) == -1) {
+   mbtowc(NULL, NULL, MB_CUR_MAX);
+   len = 1;
+   } else if (iswupper(ch))
return (1);
}
return (0);



Re: less(1) UTF-8 bugfix and cleanup: search.c

2019-03-14 Thread Ingo Schwarze
Hi Stefan,

Stefan Sperling wrote on Thu, Mar 14, 2019 at 04:11:00PM +0100:

> Yes, OK.

Thanks for checking.

Here is the next step outside line.c, completely cleaning up the file
less/filename.c with respect to UTF-8 handling.  The loop needed is very
similar to the one in search.c except that the case data[i] == '\0' can
occur here, resulting in a 0 return value from mbtowc(3), which has to
be counted as a binary character.

Note that the patch does change behaviour even for LC_CTYPE=C.
Currently, arcane ASCII (C0) control characters like 0x01 SOH,
0x02 STX, and so on are not counted as binary, but i think they should.
Currently, with LC_CTYPE=C, only bytes with the high bit set are
counted as binary.  The following patch counts everything as binary
except:
 * printable characters
 * whitespace characters
 * the 0x08 backspace character
 * and (with -R only) the 0x1b escape character

There is no need to handle the ASCII and UTF-8 cases separately.
With LC_CTYPE=C, the new code will advance byte by byte and count
non-ASCII bytes as binary as it should.

The third (n - i) argument to mbtowc(3) prevents buffer overruns;
no harm is done when n - i exceeds MB_CUR_MAX.

The additional, inner loop on escape sequences has no effect and
can be deleted.  The function is_ansi_middle() only returns true
for printable ASCII characters, and for those, the outer loop is
already good enough.  Neither the old nor the new code tries to
check whether the escape sequence ends in a valid or invalid manner.
One could count the ESC as binary for invalid sequences - but it
is not quite clear which sequences should be considered valid and
i don't think this minor detail is worth adding any complexity.
The function only provides crude heuristics in the first place.

This patch cleans up the only caller of the bad function binary_char(),
which can consequently be deleted.

While here, use ssize_t for the return value of read(2) rather than
int, even though it isn't a bug since the nbytes argument is always
the constant 256.  And avoid an idiom that relies on undefined
behaviour by moving p out of the buffer on read(2) failure; the
latter change does the job more portably with one variable less
on the stack.

OK?
  Ingo


Index: filename.c
===
RCS file: /cvs/src/usr.bin/less/filename.c,v
retrieving revision 1.27
diff -u -p -r1.27 filename.c
--- filename.c  26 Feb 2019 11:01:54 -  1.27
+++ filename.c  14 Mar 2019 18:15:54 -
@@ -334,25 +334,25 @@ fcomplete(char *s)
 int
 bin_file(int f)
 {
-   int n;
-   int bin_count = 0;
char data[256];
-   char *p;
-   char *pend;
+   ssize_t i, n;
+   wchar_t ch;
+   int bin_count, len;
 
if (!seekable(f))
return (0);
if (lseek(f, (off_t)0, SEEK_SET) == (off_t)-1)
return (0);
n = read(f, data, sizeof (data));
-   pend = [n];
-   for (p = data; p < pend; ) {
-   LWCHAR c = step_char(, +1, pend);
-   if (ctldisp == OPT_ONPLUS && c == ESC) {
-   do {
-   c = step_char(, +1, pend);
-   } while (p < pend && is_ansi_middle(c));
-   } else if (binary_char(c))
+   bin_count = 0;
+   for (i = 0; i < n; i += len) {
+   len = mbtowc(, data + i, n - i);
+   if (len <= 0) {
+   bin_count++;
+   len = 1;
+   } else if (iswprint(ch) == 0 && iswspace(ch) == 0 &&
+   data[i] != '\b' &&
+   (ctldisp != OPT_ONPLUS || data[i] != ESC))
bin_count++;
}
/*
Index: charset.c
===
RCS file: /cvs/src/usr.bin/less/charset.c,v
retrieving revision 1.21
diff -u -p -r1.21 charset.c
--- charset.c   20 Apr 2017 21:23:16 -  1.21
+++ charset.c   14 Mar 2019 18:15:54 -
@@ -146,18 +146,6 @@ init_charset(void)
 }
 
 /*
- * Is a given character a "binary" character?
- */
-int
-binary_char(LWCHAR c)
-{
-   if (utf_mode)
-   return (is_ubin_char(c));
-   c &= 0377;
-   return (!isprint((unsigned char)c) && !iscntrl((unsigned char)c));
-}
-
-/*
  * Is a given character a "control" character?
  */
 int
Index: funcs.h
===
RCS file: /cvs/src/usr.bin/less/funcs.h,v
retrieving revision 1.20
diff -u -p -r1.20 funcs.h
--- funcs.h 1 Mar 2019 14:31:34 -   1.20
+++ funcs.h 14 Mar 2019 18:15:54 -
@@ -56,7 +56,6 @@ void ch_init(int, int);
 void ch_close(void);
 int ch_getflags(void);
 void init_charset(void);
-int binary_char(LWCHAR);
 int control_char(LWCHAR);
 char *prchar(LWCHAR);
 char *prutfchar(LWCHAR);



Re: mount(8): remove unnecessary white space in man page

2019-03-10 Thread Ingo Schwarze
Hi Alessandro,

Alessandro Gallo wrote on Sun, Mar 10, 2019 at 02:12:21PM +0100:

> There seems to be an unneeded extra space in the mount(8) man page.
> The following diff fixes that:

Committed, thanks.
  Ingo


> Index: mount.8
> ===
> RCS file: /cvs/src/sbin/mount/mount.8,v
> retrieving revision 1.89
> diff -u -p -u -p -r1.89 mount.8
> --- mount.8   18 Jan 2018 08:57:12 -  1.89
> +++ mount.8   10 Mar 2019 13:07:06 -
> @@ -318,7 +318,7 @@ command:
>  # mount -a -t nonfs,mfs
>  .Ed
>  .Pp
> -mounts all file systems except those of type NFS and MFS .
> +mounts all file systems except those of type NFS and MFS.
>  .Pp
>  .Nm
>  will attempt to execute a program in
> 
> Thank you,
> Alessandro
> 



less(1) UTF-8 cleanup: pappend()

2019-03-13 Thread Ingo Schwarze
Hi Todd,

Todd C. Miller wrote on Tue, Mar 12, 2019 at 09:14:43AM -0600:

> This looks like an improvement to me.  OK millert@

Thanks for checking the do_append() patch, i committed it.

The next step is to clean up pappend(), which reads one byte of
a multibyte character per call and outputs the multibyte character
once it is complete.  As long as the character is not yet complete,
the bytes are cached in the static buffer mbc_buf; the current
number of bytes already cached is mbc_buf_index, the byte position
of the start byte is mbc_pos.

The static variable mbc_buf_len stores the number of bytes expected
in the current character according to the start byte.  When using
standard functions, this information is irrelevant, so i'm deleting
the variable, which simplifies global state.

The proper tool to distinguish valid, incomplete, and invalid
multibyte characters is mbrtowc(3).  The simpler and generally more
recommended function mbtowc(3) which i used in earlier patches
cannot be used for this purpose because it does not distinguish
incomplete from invalid sequences.  The single call to mbrtowc(3)
replaces one call of each of the following functions and macros
that we want to get rid of: IS_ASCII_OCTET(), IS_UTF8_LEAD(),
utf_len(), IS_UTF8_TRAIL(), and is_utf8_well_formed().

Some additional observations:
 * A goto is replaced by a proper loop.
 * In case of an invalid start byte, calling store_prchar() directly
   is simpler than the detour via flush_mbc_buf().
 * When the line overflows, rather than first storing the number of
   bytes to back up in mbc_buf_index, then moving it to r at the
   end of the function, it is simpler to store it into r right away.
 * The patch replaces 11 lines of code with 10 lines of comments.  :-)

Tests done:
 * ASCII CR bytes are correctly shown encoded as ^M after ASCII bytes,
   after valid UTF-8 characters, and when interrupting incomplete
   UTF-8 characters.
 * Incomplete UTF-8 characters are correctly shown in encoded form
   when interrupted by ASCII characters, by valid UTF-8 characters,
   or by invalid bytes.
 * Width 2 Unicode characters correctly wrap to the next line when
   their left half would be in the last column of the screen.
 * Encoded representations of invalid bytes and of valid, but
   non-printable UTF-8 characters correctly wrap to the next line
   in their entirety unless all their output bytes still fit on the
   current screen output line.
 * All tests were also performed with LC_CTYPE=C.

OK?
  Ingo


Index: line.c
===
RCS file: /cvs/src/usr.bin/less/line.c,v
retrieving revision 1.28
diff -u -p -r1.28 line.c
--- line.c  13 Mar 2019 11:22:56 -  1.28
+++ line.c  13 Mar 2019 16:07:35 -
@@ -64,7 +64,6 @@ extern off_t start_attnpos;
 extern off_t end_attnpos;
 
 static char mbc_buf[MAX_UTF_CHAR_LEN];
-static int mbc_buf_len = 0;
 static int mbc_buf_index = 0;
 static off_t mbc_pos;
 
@@ -129,7 +128,6 @@ prewind(void)
column = 0;
cshift = 0;
overstrike = 0;
-   mbc_buf_len = 0;
is_null_line = 0;
pendc = '\0';
lmargin = 0;
@@ -683,6 +681,9 @@ flush_mbc_buf(off_t pos)
 int
 pappend(char c, off_t pos)
 {
+   mbstate_t mbs;
+   size_t sz;
+   wchar_t ch;
int r;
 
if (pendc) {
@@ -696,13 +697,12 @@ pappend(char c, off_t pos)
}
 
if (c == '\r' && bs_mode == BS_SPECIAL) {
-   if (mbc_buf_len > 0)  /* utf_mode must be on. */ {
+   if (mbc_buf_index > 0)  /* utf_mode must be on. */ {
/* Flush incomplete (truncated) sequence. */
r = flush_mbc_buf(mbc_pos);
-   mbc_buf_index = r + 1;
-   mbc_buf_len = 0;
+   mbc_buf_index = 0;
if (r)
-   return (mbc_buf_index);
+   return (r + 1);
}
 
/*
@@ -718,40 +718,43 @@ pappend(char c, off_t pos)
if (!utf_mode) {
r = do_append((LWCHAR) c, NULL, pos);
} else {
-   /* Perform strict validation in all possible cases. */
-   if (mbc_buf_len == 0) {
-retry:
-   mbc_buf_index = 1;
-   *mbc_buf = c;
-   if (IS_ASCII_OCTET(c)) {
-   r = do_append((LWCHAR) c, NULL, pos);
-   } else if (IS_UTF8_LEAD(c)) {
-   mbc_buf_len = utf_len(c);
+   for (;;) {
+   if (mbc_buf_index == 0)
mbc_pos = pos;
-   return (0);
-   } else {
-   /* UTF8_INVALID or stray UTF8_TRAIL */
-   r = flush_mbc_buf(pos);
-   }
-   } else if 

Re: xman.man: typo in mandesc section

2019-03-08 Thread Ingo Schwarze
Hi Alejandro,

we normally avoid trivial changes to X11 documentation.  Such changes
provide little benefit but make life harder for matthieu@.

If the typo still exists upstream, please report it upstream,
and it will automatically appear in OpenBSD with the next regular
update.

Yours,
  Ingo


Alejandro G. Peregrina wrote on Fri, Mar 08, 2019 at 01:04:10PM +0100:

> Index: xman.man
> ===
> RCS file: /cvs/xenocara/app/xman/man/xman.man,v
> retrieving revision 1.2
> diff -u -p -u -r1.2 xman.man
> --- xman.man  28 Sep 2013 16:23:02 -  1.2
> +++ xman.man  8 Mar 2019 11:34:51 -
> @@ -152,7 +152,7 @@ n(n) New
>  o(o) Old
>  
>  .fi
> -Xman will read any section that is of the from \fIman\fP, where
> +Xman will read any section that is of the form \fIman\fP, where
>   is an upper or lower case letter (they are treated distinctly) or
>  a numeral (0-9).  Be warned, however, that man(__appmansuffix__) and
>  catman(__adminmansuffix__) will not search directories that are non-standard.



Re: xterm and wcwidth()

2019-03-08 Thread Ingo Schwarze
Hi,

Lauri Tirkkonen wrote on Fri, Mar 08, 2019 at 12:43:11PM +0200:

> I feel like xterm should just use the system wcwidth() to avoid these
> mismatches, so rudimentary diff to do that below.

Absolutely, i strongly agree with that sentiment.

Having a local character width table in an application program is
just wrong.  Archaic operating systems like Oracle Solaris 11 that
have broken wcwidth(3) implementations should either fix their C
library or go to hell.

I also agree with the details of the diff.  It is minimal in so far
as it changes only one file, keeping the annoyance minimal when merging
new xterm releases.  The function systemWcwidthOk() is horrific
enough that deleting it outright makes sense, to make sure that it
can never get called by accident.

Note that the command line options -cjk_width and -mk_width and the
related resources remain broken.  But i'd say people using them get
what they deserve.  The disruption to updates that would be caused
by excising them from multiple files is probably better avoided.

Unchanged diff left in place below for easy reference.

OK to commit?
  Ingo


diff --git a/app/xterm/util.c b/app/xterm/util.c
index a3de4a005..9b6443683 100644
--- a/app/xterm/util.c
+++ b/app/xterm/util.c
@@ -4980,61 +4980,6 @@ decode_keyboard_type(XtermWidget xw, XTERM_RESOURCE * rp)
 }
 
 #if OPT_WIDE_CHARS
-#if defined(HAVE_WCHAR_H) && defined(HAVE_WCWIDTH)
-/*
- * If xterm is running in a UTF-8 locale, it is still possible to encounter
- * old runtime configurations which yield incomplete or inaccurate data.
- */
-static Bool
-systemWcwidthOk(int samplesize, int samplepass)
-{
-wchar_t n;
-int oops = 0;
-
-for (n = 21; n <= 25; ++n) {
-   wchar_t code = (wchar_t) dec2ucs(NULL, (unsigned) n);
-   int system_code = wcwidth(code);
-   int intern_code = mk_wcwidth(code);
-
-   /*
-* Solaris 10 wcwidth() returns "2" for all of the line-drawing (page
-* 0x2500) and most of the geometric shapes (a few are excluded, just
-* to make it more difficult to use).  Do a sanity check to avoid using
-* it.
-*/
-   if ((system_code < 0 && intern_code >= 1)
-   || (system_code >= 0 && intern_code != system_code)) {
-   TRACE(("systemWcwidthOk: broken system line-drawing wcwidth\n"));
-   oops += (samplepass + 1);
-   break;
-   }
-}
-
-for (n = 0; n < (wchar_t) samplesize; ++n) {
-   int system_code = wcwidth(n);
-   int intern_code = mk_wcwidth(n);
-
-   /*
-* When this check was originally implemented, there were few if any
-* libraries with full Unicode coverage.  Time passes, and it is
-* possible to make a full comparison of the BMP.  There are some
-* differences: mk_wcwidth() marks some codes as combining and some
-* as single-width, differing from GNU libc.
-*/
-   if ((system_code < 0 && intern_code >= 1)
-   || (system_code >= 0 && intern_code != system_code)) {
-   TRACE((".. width(U+%04X) = %d, expected %d\n",
-  (unsigned) n, system_code, intern_code));
-   if (++oops > samplepass)
-   break;
-   }
-}
-TRACE(("systemWcwidthOk: %d/%d mismatches, allowed %d\n",
-  oops, (int) n, samplepass));
-return (oops <= samplepass);
-}
-#endif /* HAVE_WCWIDTH */
-
 void
 decode_wcwidth(XtermWidget xw)
 {
@@ -5045,8 +4990,7 @@ decode_wcwidth(XtermWidget xw)
 switch (mode) {
 default:
 #if defined(HAVE_WCHAR_H) && defined(HAVE_WCWIDTH)
-   if (xtermEnvUTF8() &&
-   systemWcwidthOk(xw->misc.mk_samplesize, xw->misc.mk_samplepass)) {
+   if (xtermEnvUTF8()) {
my_wcwidth = wcwidth;
TRACE(("using system wcwidth() function\n"));
break;



Re: xterm and wcwidth()

2019-03-08 Thread Ingo Schwarze
Hi,

Ted Unangst wrote on Fri, Mar 08, 2019 at 03:24:56PM -0500:
> Matthieu Herrb wrote:

>> I would prefer a diff that just add a &&!defined(__OpenBSD__) to the
>> condition before the definition of systemWcwidthOk(). This will cause
>> less risk of conflicts in future updates and clearly show the
>> intention.

> If you prefer that, I would suggest the following to avoid patching multiple
> places. Just short circuit the function.

Makes sense, i committed that version.

Thanks for reporting and for all the feedback,
  Ingo

>>>  #if OPT_WIDE_CHARS
>>> -#if defined(HAVE_WCHAR_H) && defined(HAVE_WCWIDTH)
>>> -/*
>>> - * If xterm is running in a UTF-8 locale, it is still possible to encounter
>>> - * old runtime configurations which yield incomplete or inaccurate data.
>>> - */
>>> -static Bool
>>> -systemWcwidthOk(int samplesize, int samplepass)
>>> -{
>>> -wchar_t n;
>>> -int oops = 0;

> #ifdef __OpenBSD__
>   return 1;
> #endif
 
>>> -
>>> -for (n = 21; n <= 25; ++n) {
>>> -   wchar_t code = (wchar_t) dec2ucs(NULL, (unsigned) n);
>>> -   int system_code = wcwidth(code);
>>> -   int intern_code = mk_wcwidth(code);



Re: xterm and wcwidth()

2019-03-08 Thread Ingo Schwarze
Hi,

Ted Unangst wrote on Fri, Mar 08, 2019 at 12:57:17PM -0500:
> Lauri Tirkkonen wrote:

>> in other words, xterm is (and was) using its own idea of how many
>> columns characters take up. The way this manifested itself to me was
>> that I received some email that contained emoji characters in the
>> subject, and they look fine in mutt when I use another terminal (st on
>> OpenBSD, or termux on my Android phone), but on 'xterm -fa "DejaVu Sans
>> Mono"' on OpenBSD, only the left half of these characters is rendered,
>> and mutt's index background color indicating the selection doesn't reach
>> the right edge of the screen: libc wcwidth() returns 2, but xterm's idea
>> seems to be 1 column.

> Sigh, here we go again. Some systems have a bad function, so let's replace it
> with a slightly better version, and then never update it. If there are bugs in
> the libc wcwidth function, we'd very much like to know and fix them, not have
> these workarounds.

I'm counting that as OK tedu@, but i'll wait a bit before committing
in case matthieu@ (or someone else) wants to speak up.

Yours,
  Ingo



Re: EC_POINT_new.3 glitch

2019-03-18 Thread Ingo Schwarze
Hi Jan,

Jan Stary wrote on Mon, Mar 18, 2019 at 08:38:04AM +0100:

> This seems to be a missed newline.

Committed, thanks.
  Ingo


> Index: EC_POINT_new.3
> ===
> RCS file: /cvs/src/lib/libcrypto/man/EC_POINT_new.3,v
> retrieving revision 1.9
> diff -u -p -r1.9 EC_POINT_new.3
> --- EC_POINT_new.329 Mar 2018 20:56:49 -  1.9
> +++ EC_POINT_new.318 Mar 2019 07:36:50 -
> @@ -472,7 +472,8 @@ on error.
>  .Pp
>  .Fn EC_POINT_hex2point
>  returns the pointer to the
> -.Vt EC_POINT supplied or
> +.Vt EC_POINT
> +supplied or
>  .Dv NULL
>  on error.
>  .Sh SEE ALSO



Re: Unescaped backslashes in manual pages

2019-03-19 Thread Ingo Schwarze
Hi Peter,

Peter Piwowarski wrote on Tue, Mar 19, 2019 at 10:03:20PM -0400:

> qsort(3) has in its EXAMPLE a C \n with its backslash unescaped; a quick
> grep through manpages showed similar errors in CONF_modules_load_file.3.

Ouch!  Thanks for the report, committed.
  Ingo


> Index: lib/libc/stdlib/qsort.3
> ===
> RCS file: /cvs/src/lib/libc/stdlib/qsort.3,v
> retrieving revision 1.24
> diff -u -p -r1.24 qsort.3
> --- lib/libc/stdlib/qsort.3   22 Jan 2019 06:49:17 -  1.24
> +++ lib/libc/stdlib/qsort.3   19 Mar 2019 16:49:59 -
> @@ -179,7 +179,7 @@ main()
>  
>   qsort(array, N, sizeof(array[0]), cmp);
>   for (i =  0; i < N; i++)
> - printf("%s\n", array[i]);
> + printf("%s\en", array[i]);
>  }
>  
>  
> Index: lib/libcrypto/man/CONF_modules_load_file.3
> ===
> RCS file: /cvs/src/lib/libcrypto/man/CONF_modules_load_file.3,v
> retrieving revision 1.7
> diff -u -p -r1.7 CONF_modules_load_file.3
> --- lib/libcrypto/man/CONF_modules_load_file.322 Mar 2018 21:08:22 
> -  1.7
> +++ lib/libcrypto/man/CONF_modules_load_file.319 Mar 2019 16:49:59 
> -
> @@ -163,7 +163,7 @@ Load a configuration file and print out 
>  file considered fatal):
>  .Bd -literal
>  if (CONF_modules_load_file(NULL, NULL, 0) <= 0) {
> - fprintf(stderr, "FATAL: error loading configuration file\n");
> + fprintf(stderr, "FATAL: error loading configuration file\en");
>   ERR_print_errors_fp(stderr);
>   exit(1);
>  }
> @@ -174,7 +174,7 @@ by "myapp", tolerate missing files, but 
>  .Bd -literal
>  if (CONF_modules_load_file(NULL, "myapp",
>  CONF_MFLAGS_IGNORE_MISSING_FILE) <= 0) {
> - fprintf(stderr, "FATAL: error loading configuration file\n");
> + fprintf(stderr, "FATAL: error loading configuration file\en");
>   ERR_print_errors_fp(stderr);
>   exit(1);
>  }
> @@ -185,7 +185,7 @@ error, missing configuration file ignore
>  .Bd -literal
>  if (CONF_modules_load_file("/something/app.cnf", "myapp",
>  CONF_MFLAGS_IGNORE_MISSING_FILE) <= 0) {
> - fprintf(stderr, "WARNING: error loading configuration file\n");
> + fprintf(stderr, "WARNING: error loading configuration file\en");
>   ERR_print_errors_fp(stderr);
>  }
>  .Ed
> @@ -198,17 +198,17 @@ long eline;
>  
>  fp = fopen("/somepath/app.cnf", "r");
>  if (fp == NULL) {
> - fprintf(stderr, "Error opening configuration file\n");
> + fprintf(stderr, "Error opening configuration file\en");
>   /* Other missing configuration file behaviour */
>  } else {
>   cnf = NCONF_new(NULL);
>   if (NCONF_load_fp(cnf, fp, ) == 0) {
> - fprintf(stderr, "Error on line %ld of configuration file\n",
> + fprintf(stderr, "Error on line %ld of configuration file\en",
>   eline);
>   ERR_print_errors_fp(stderr);
>   /* Other malformed configuration file behaviour */
>   } else if (CONF_modules_load(cnf, "appname", 0) <= 0) {
> - fprintf(stderr, "Error configuring application\n");
> + fprintf(stderr, "Error configuring application\en");
>   ERR_print_errors_fp(stderr);
>   /* Other configuration error behaviour */
>   }



Re: Adding fstab(5) fs_mntops to mount_nfs(8)

2019-03-11 Thread Ingo Schwarze
Hi Kurt,

Kurt Mosiejczuk wrote on Mon, Mar 11, 2019 at 09:32:41AM -0400:

> As far as I can tell, no man page documents the fs_mntops form of the
> mount_nfs(8) options for fstab(5). Most of us who use them either know
> from some other documentation, reading the source, or just doing a
> copy-paste from somwhere.
> 
> I went through the source and wrote them down, and here is a first stab
> at documenting them in mount_nfs(8).

I like the general idea, but have lots of comments regarding specific
details.

> I can't really think of a better place to put this information than
> mount_nfs(8), but welcome suggestions. I also welcome feedback on how
> best to push this information into the man page.

I can't think of a better place than mount_nfs(8) either.

One aspect that i overlooked when we initially discussed this
(because i didn't actually read the code back then, i only looked
at it right now checking the details): the fstab(5) options can
also be given via -o and vice versa, and that ought to be documented.
Equivalence is guaranteed because mount(8) actually forks and execs
mount_nfs(8), passing the data taken from fstab(5) via argv[].

So i suggest the following style:

  .It Fl 2 No or Fl o Cm nfsv2
  Use the NFS Version 2 protocol.

Regarding the command line options taking arguments (-agIRrtwx):
It appears these can be specified as

  mount -o -t=...

on the command line or as

  ... nfs ro,-t=...

in fstab(5).  According to my testing (and my reading of the code),
the forms

  mount -o timeo=...
  ... nfs ro,timeo=...

that you are documenting do *not* work.  I suspect you confused the
field names in the struct nfs_args in the source code with the
option names to be used in -o/fstab(5).

So, we have four classes of options:

 * no arguments, equivalent -o/fstab(5) option exists
-2  =  -o nfsv2   /* wrong name in your diff? */
-3  =  -o nfsv3   /* wrong name in your diff? */
-b  =  -o bg
-c  =  -o noconn
-d  =  -o dumbtimer   /* missing in your diff */
-i  =  -o intr
-l  =  -o rdirplus/* wrong name in your diff? */
-s  =  -o soft
-T  =  -o tcp
-U  =  -o mntudp  /* wrong name in your diff? */

 * no arguments, can be specified with -o/fstab(5) only:

   -o resvport/* missing in your diff */
   -o noac

 * takes an argument, no explicit -o equivalent
-a arg  =  -o -a=arg  /* wrong name in your diff */
-g arg  =  -o -g=arg  /* wrong name in your diff */
-I arg  =  -o -I=arg  /* wrong name in your diff */
-R arg  =  -o -R=arg  /* wrong name in your diff */
-r arg  =  -o -r=arg  /* wrong name in your diff */
-t arg  =  -o -t=arg  /* wrong name in your diff */
-w arg  =  -o -w=arg  /* wrong name in your diff */
-x arg  =  -o -x=arg  /* wrong name in your diff */

 * takes an argument, can be specified with -o/fstab(5) only:

   -o port=arg
   -o acdirmax=arg
   -o acdirmin=arg
   -o acregmax=arg
   -o acregmin=arg

Below -o, it should probably be said that all other options described
in mount_nfs(8)

 1. can be passed to mount(8) like "mount -o -b,-t=timeout"
or like "mount -o bg,-t=timeout"

 2. and can be specified in fstab(5) like
... /mountpoint nfs options,-b,-t=timeout ...
or like
... /mountpoint nfs options,bg,-t=timeout ...

This is not intended as a suggestion for the wording, but only
as an indication of the content that probably needs to be conveyed.

> Index: mount_nfs.8
> ===
> RCS file: /cvs/src/sbin/mount_nfs/mount_nfs.8,v
> retrieving revision 1.39
> diff -u -p -r1.39 mount_nfs.8
> --- mount_nfs.8   6 Jun 2009 20:58:50 -   1.39
> +++ mount_nfs.8   10 Mar 2019 19:28:01 -
> @@ -66,27 +66,48 @@ It implements the mount protocol as desc
>  .%T "NFS: Network File System Version 3 Protocol Specification" ,
>  Appendix I.
>  .Pp
> -The options are as follows:
> +The options are as follows.
> +Many of them can be specified permanently in the
> +.Fa fs_mntopts
> +field of the 
> +.Xr fstab 5
> +configuration file for use with the one-argument form of
> +.Xr mount 8 .

Even though i suggested this wording, this change should probably
be reverted, given the above analysis.

I think you need to rework you diff.

Yours,
  Ingo



less(1) UTF-8 cleanup: do_append()

2019-03-12 Thread Ingo Schwarze
Hi Nicholas,

Nicholas Marriott wrote on Tue, Mar 12, 2019 at 11:05:07AM +:

> Sorry, ok nicm if you haven't committed this already.

No i hadn't, but now i have, thank you very much for checking the patch.

Here is the next step, cleaning up most parts of the crucial function
do_append() which decides how to handle backspaces and overstrikes,
in particular what to print literally and what to encode.

Four bad function calls have to be replaced here:

 1. The call to the bad function get_wchar() is used to find the
character already present at the current position.  Replacing
it with mbtowc(3) also eliminates LWCHAR.  In case of failure -
which may no be likely since at least usually, the linebuf[]
will contain valid UTF-8 - setting prev_ch to NUL makes sure
that whatever is already there will simply be replaced.
I think linebuf[curr] cannot be NUL at this point because only
backc() sets overstrike, and just having backed up, *something*
will be there.  But even if linebuf[curr] *is* NUL and hence
mbtowc(3) returns 0, the new code should do the right thing and
simply append.

 2. The calls to the bad functions is_composing_char() and
is_combining_char() have to be replaced with wcwidth(3).
That also eliminates the second call to get_wchar().
If wcwidth(3) fails (i.e. ch is not printable), we simply have
to treat it like a width 1 character.

 3. The call to the bad function control_char() has to be eliminated.
Start by considering two cases separately.  In utf_mode, we
have is_ascii_char(ch), i.e. ch <= 0x7f.  In that case,
control_char() is just iscntrl(3), which is identical to
!isprint(3).  In !utf_mode, we know that do_append() only gets
called with ch <= 0xff, and control_char() is iscntrl(3) ||
!isprint(3).  In that case and expression, the iscntrl(3) is
obviously redundant.  So we can simply use !isprint(3) for both
cases, which is also a logical way of expressing the condition
because this "else if" clause is supposed to handle non-printable
single-byte characters.

 4. The call to the bad function is_ubin_char() intends to handle
non-printable Unicode characters, so the right function to use
is simply !iswprint(3) from .

One part of do_append() still remains broken: It still calls
prutfchar() which is broken because it calls is_ubin_char() and
control_char(), because it accepts some invalid input, and because
it generates some invalid output.  But cleaning that up is a separate
matter.

Here are byte sequences i tested:

 * ASCII char backspace same ASCII char again -> bold
 * underscore backspace underscore in between -> bold
 * ASCII char backspace underscore-> italic
 * underscore backspace underscore in between -> italic
 * underscore backspace ASCII char-> italic
 * UTF-8 character backspace same UTF-8 again -> bold
 * UTF-8 character backspace undercore-> italic
 * underscore backspace UTF-8 character   -> italic
 * invalid byte backspace underscore  -> <88>^H_
 * ASCII char UTF-8 combining accent backspace
   same ASCII char and combining accent again -> bold
 * underscore backspace ASCII char
   UTF-8 combining accent -> italic
 * UTF-8 character UTF-8 combining accent
   backspace same UTF-8 character
   and combining accent again -> bold
 * underscore backspace UTF-8 character
   UTF-8 combining accent -> italic
 * ASCII char UTF-8 combining accent backspace
   non-printable ASCII char
   -> accented char invisible, overwritten by the encoded
  representation of the non-printable char
 * ASCII char UTF-8 combining accent backspace
   non-printable UTF-8 character
   -> accented char invisible, overwritten by the encoded
  representation of the non-printable character
 * ASCII controls show as caret-letter
 * UTF-8 controls show as 

I tested both with LC_CTYPE=en_US.UTF-8 and LC_CTYPE=C.

Note the following case is dubious:

 * ASCII char UTF-8 combining accent backspace underscore
   Arguably, this should underline the accented character,
   but actually, the accent disappears with both the old
   and the new code because it is overwritten by the next
   character.  I'm not trying to fix that with this patch.
 * same for UTF-8 character UTF-8 combining accent backspace underscore

OK?
  Ingo


Index: line.c
===
RCS file: /cvs/src/usr.bin/less/line.c,v
retrieving revision 1.27
diff -u -p -r1.27 line.c
--- line.c  12 Mar 2019 11:59:24 -  1.27
+++ line.c  12 Mar 2019 14:33:55 -
@@ -16,6 +16,7 @@
  */
 
 #include 
+#include 
 
 #include "charset.h"
 #include "less.h"
@@ -774,8 +775,8 @@ retry:
 static int
 do_append(LWCHAR ch, char *rep, off_t pos)
 {
+   wchar_t prev_ch;
int a;
-   LWCHAR prev_ch;
 
a = AT_NORMAL;
 
@@ -813,7 +814,10 @@ do_append(LWCHAR ch, char *rep, 

delete useless setlocale(3) in spell(1)

2019-03-09 Thread Ingo Schwarze
Hi,

looking through my trees for diffs that were forgotten, i found this one.

Spell uses nothing locale-dependent, and it will never need support for
non-English characters because the basic algorithms used are specific
to the English language in the first place.

So delete  and setlocale(3) without functional change.

While here, make usage() static and return() from main rather than exit(3).

Minimially tweaked diff originally from Jan Stary.

OK?
  Ingo


Index: spellprog.c
===
RCS file: /cvs/src/usr.bin/spell/spellprog.c,v
retrieving revision 1.12
diff -u -p -r1.12 spellprog.c
--- spellprog.c 10 Oct 2015 19:11:04 -  1.12
+++ spellprog.c 28 Oct 2016 07:53:39 -
@@ -72,7 +72,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -108,7 +107,7 @@ char*estrdup(const char *);
 voidise(void);
 voidprint_word(FILE *);
 voidztos(char *);
-__dead void usage(void);
+static void __dead usage(void);
 
 /* from look.c */
 int look(unsigned char *, unsigned char *, unsigned char *);
@@ -247,8 +246,6 @@ main(int argc, char **argv)
struct stat sb;
FILE *file, *found;
 
-   setlocale(LC_ALL, "");
-
if (pledge("stdio rpath wpath cpath", NULL) == -1)
err(1, "pledge");
 
@@ -316,7 +313,7 @@ main(int argc, char **argv)
if (ch == EOF) {
if (found != NULL)
fclose(found);
-   exit(0);
+   return (0);
}
}
for (cp = word, dp = original; cp < ep; )
@@ -345,7 +342,7 @@ lcase:
file = stdout;
}
 
-   exit(0);
+   return (0);
 }
 
 void
@@ -788,7 +785,7 @@ dict(char *bp, char *ep)
return (rval);
 }
 
-__dead void
+static void __dead
 usage(void)
 {
extern char *__progname;



delete useless setlocale(3) in uuencode(1)

2019-03-09 Thread Ingo Schwarze
Hi,

uuencode(1) and uudecode(1) do nothing locale-related and i don't see
how they ever might.

So delete  and setlocale(3).

While here, sort headers, make usage() static, return from main
rather than exit(3), and drop two redundant case statements.

This is also a minimally tweaked version of a patch from Jan Stary.

OK?
  Ingo


Index: uudecode/uudecode.c
===
RCS file: /cvs/src/usr.bin/uudecode/uudecode.c,v
retrieving revision 1.23
diff -u -p -r1.23 uudecode.c
--- uudecode/uudecode.c 3 Jan 2016 14:43:20 -   1.23
+++ uudecode/uudecode.c 28 Oct 2016 08:34:42 -
@@ -43,20 +43,19 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
-#include 
 
 static const char *infile, *outfile;
 static FILE *infp, *outfp;
 static int base64, cflag, iflag, oflag, pflag, rflag, sflag;
 
-static voidusage(void);
+static void __dead usage(void);
 static int decode(void);
 static int decode2(void);
 static int uu_decode(void);
@@ -83,7 +82,6 @@ main(int argc, char *argv[])
pmode = MODE_B64DECODE;
}
 
-   setlocale(LC_ALL, "");
while ((ch = getopt(argc, argv, optstr[pmode])) != -1) {
switch(ch) {
case 'c':
@@ -154,7 +152,7 @@ main(int argc, char *argv[])
infp = stdin;
rval = decode();
}
-   exit(rval);
+   return (rval);
 }
 
 static int
@@ -442,7 +440,7 @@ base64_decode(void)
"error decoding base64 input stream"));
 }
 
-static void
+static void __dead
 usage(void)
 {
switch (pmode) {
Index: uuencode/uuencode.c
===
RCS file: /cvs/src/usr.bin/uuencode/uuencode.c,v
retrieving revision 1.13
diff -u -p -r1.13 uuencode.c
--- uuencode/uuencode.c 9 Oct 2015 01:37:09 -   1.13
+++ uuencode/uuencode.c 28 Oct 2016 08:34:42 -
@@ -40,7 +40,6 @@
 #include 
 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -49,7 +48,7 @@
 
 void encode(void);
 void base64_encode(void);
-static void usage(void);
+static void __dead usage(void);
 
 FILE *output;
 int mode;
@@ -81,7 +80,6 @@ main(int argc, char *argv[])
pmode = MODE_B64ENCODE;
}
 
-   setlocale(LC_ALL, "");
while ((ch = getopt(argc, argv, optstr[pmode])) != -1) {
switch (ch) {
case 'm':
@@ -90,7 +88,6 @@ main(int argc, char *argv[])
case 'o':
outfile = optarg;
break;
-   case '?':
default:
usage();
}
@@ -118,7 +115,6 @@ main(int argc, char *argv[])
 #defineRW  (S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH)
mode = RW & ~umask(RW);
break;
-   case 0:
default:
usage();
}
@@ -137,7 +133,7 @@ main(int argc, char *argv[])
encode();
if (ferror(output))
errx(1, "write error");
-   exit(0);
+   return 0;
 }
 
 /* ENC is the basic 1 character encoding function to make a char printing */
@@ -220,7 +216,7 @@ encode(void)
(void)fprintf(output, "%c\nend\n", ENC('\0'));
 }
 
-static void
+static void __dead
 usage(void)
 {
switch (pmode) {



delete useless setlocale(3) from touch(1)

2019-03-09 Thread Ingo Schwarze
Hi,

i initially thought touch(1) might be tricky because it is
concerned with times, but it turned out is does almost all
of the parsing by hand, supporting only specific numeric formats
and nothing like month names.

It does use strptime(3), but only for %F and %T neither of
which is locale-dependent.

Careful line-by-line review of the code turned up nothing
that might use the locale, not even LC_TIME.

OK for Jan Stary's patch?
  Ingo


Index: touch.c
===
RCS file: /cvs/src/usr.bin/touch/touch.c,v
retrieving revision 1.25
diff -u -p -r1.25 touch.c
--- touch.c 9 Oct 2015 01:37:09 -   1.25
+++ touch.c 28 Oct 2016 09:28:29 -
@@ -41,7 +41,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
@@ -49,7 +48,7 @@ void  stime_arg1(char *, struct timespec
 void   stime_arg2(char *, int, struct timespec *);
 void   stime_argd(char *, struct timespec *);
 void   stime_file(char *, struct timespec *);
-__dead voidusage(void);
+static void __dead usage(void);
 
 int
 main(int argc, char *argv[])
@@ -58,8 +57,6 @@ main(int argc, char *argv[])
int  aflag, cflag, mflag, ch, fd, len, rval, timeset;
char*p;
 
-   (void)setlocale(LC_ALL, "");
-
if (pledge("stdio rpath wpath cpath fattr", NULL) == -1)
err(1, "pledge");
 
@@ -145,7 +142,7 @@ main(int argc, char *argv[])
warn("%s", *argv);
}
}
-   exit(rval);
+   return rval;
 }
 
 #defineATOI2(s)((s) += 2, ((s)[-2] - '0') * 10 + ((s)[-1] - 
'0'))
@@ -324,7 +321,7 @@ terr:   errx(1,
tsp[1] = tsp[0];
 }
 
-__dead void
+static void __dead
 usage(void)
 {
(void)fprintf(stderr,



Re: games/fortune translation fix

2019-02-06 Thread Ingo Schwarze
Hi Scott,

Scott Cheloha wrote on Tue, Feb 05, 2019 at 12:02:39AM -0600:

> Oof, folks I think we've missed the forest for the trees here.
> 
> By focussing on the minutiae of the Latin translation we've discarded
> the English motto ("through  to the stars") that imho
> anchored the whole thing.
> 
> As noted previously, this phrase was a modern invention with a hasty
> Latin translation conscripted to it to add some credence so you could
> sew it onto a vest or stamp it onto a medal.
> 
> The intent of the whole thing was a poetic English verse, not an
> accurate Latin translation.  The Latin was subordinate, as evidenced
> by the difficult translation.  The intended audience spoke English.
> 
> But okay, because it isn't attributed to anyone or anything in our
> file I can see how we focussed in on that and worked from the Latin
> toward a more correct English translation.  And translation is
> puzzle-solving and puzzle-solving is fun.
> 
> I empathize.
> 
> But... I humbly suggest we partially revert and instead add an
> attribution.

I tried to find the original source but couldn't.  So unless somebody
finds out who originally coined the phrase and for which purpose,
giving an attribution looks like a thoroughly bad idea to me.

> Some quick googling suggests "Per ardua ad astra"

That's not even the same motto.

Are you arguing that a different phrase should be cited?
If so, why?
If anything, "per aspera" appears to be more widely used,
and less exclusively focussed on a military meaning.

> -Per aspera ad astra.  (Through hardship to immortality.)
> +Per ardua ad astra.  (Through adversity to the stars.)
> + -- Motto of the Royal Air Force

I dislike that for the reasons stated above.

If somebody wants to continue this discussion, it should probably
be moved to misc@.

Yours,
  Ingo



Re: install(1) could fail due to race

2019-02-08 Thread Ingo Schwarze
Ted Unangst wrote on Sun, Jan 27, 2019 at 10:37:52AM -0500:
> Ingo Schwarze wrote:

>> If people here agree with the general direction of making -S the
>> default and removing the fragile non-S mode (see the patch below),
>> i'll run a full make build and make release and then ask for OKs.

> Just checking we didn't forget about this. Seems the right thing to do.

Committed now, hoping that it will *not* cause fireworks in deraadt@'s
NFS-based build infrastructure.

Yours,
  Ingo



Re: install(1) could fail due to race

2019-02-06 Thread Ingo Schwarze
Hi Ted,

Ted Unangst wrote on Sun, Jan 27, 2019 at 10:37:52AM -0500:
> Ingo Schwarze wrote:

>> If people here agree with the general direction of making -S the
>> default and removing the fragile non-S mode (see the patch below),
>> i'll run a full make build and make release and then ask for OKs.

> Just checking we didn't forget about this. Seems the right thing to do.

So, i finally came round to doing a full make build and make release
for both base and xenocara on amd64, and everything succeeded.

Consequently, i'm now looking for an explicit OK.

Appending the full patch again for ease of reference.

Yours,
  Ingo


Index: install.1
===
RCS file: /cvs/src/usr.bin/xinstall/install.1,v
retrieving revision 1.30
diff -u -p -r1.30 install.1
--- install.1   13 May 2016 17:51:15 -  1.30
+++ install.1   6 Feb 2019 23:50:46 -
@@ -101,7 +101,7 @@ Create directories.
 Missing parent directories are created as required.
 This option cannot be used with the
 .Fl B , b , C , c ,
-.Fl f , p , S ,
+.Fl f , p ,
 or
 .Fl s
 options.
@@ -141,15 +141,12 @@ except if the target file doesn't alread
 then preserve the modification time of the file.
 .It Fl S
 Safe copy.
-Normally,
-.Nm
-unlinks an existing target before installing the new file.
-With the
-.Fl S
-flag a temporary file is used and then renamed to be
-the target.
-The reason this is safer is that if the copy or
-rename fails, the existing target is left untouched.
+This is the default.
+This option has no effect and is supported only for compatibility.
+When installing a file, a temporary file is created and written first
+in the destination directory, then atomically renamed.
+This avoids both race conditions and the destruction of existing
+files in case of write failures.
 .It Fl s
 .Nm
 exec's the command
@@ -186,18 +183,8 @@ Default is
 .Sh FILES
 .Bl -tag -width INS@XX -compact
 .It Pa INS@XX
-If either
-.Fl S
-option is specified, or the
-.Fl C
-or
-.Fl p
-option is used in conjunction with the
-.Fl s
-option, temporary files named INS@XX,
-where XX is decided by
-.Xr mkstemp 3 ,
-are created in the target directory.
+Temporary files created in the target directory by
+.Xr mkstemp 3 .
 .El
 .Sh EXIT STATUS
 .Ex -std install
Index: xinstall.c
===
RCS file: /cvs/src/usr.bin/xinstall/xinstall.c,v
retrieving revision 1.67
diff -u -p -r1.67 xinstall.c
--- xinstall.c  16 Sep 2018 02:44:07 -  1.67
+++ xinstall.c  6 Feb 2019 23:50:46 -
@@ -60,7 +60,7 @@
 #define NOCHANGEBITS   (UF_IMMUTABLE | UF_APPEND | SF_IMMUTABLE | SF_APPEND)
 #define BACKUP_SUFFIX  ".old"
 
-int dobackup, docompare, dodest, dodir, dopreserve, dostrip, safecopy;
+int dobackup, docompare, dodest, dodir, dopreserve, dostrip;
 int mode = S_IRWXU|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH;
 char pathbuf[PATH_MAX], tempfile[PATH_MAX];
 char *suffix = BACKUP_SUFFIX;
@@ -73,7 +73,6 @@ void  install(char *, char *, u_long, u_i
 void   install_dir(char *, int);
 void   strip(char *);
 void   usage(void);
-intcreate_newfile(char *, struct stat *);
 intcreate_tempfile(char *, char *, size_t);
 intfile_write(int, char *, size_t, int *, int *, int);
 void   file_flush(int, int);
@@ -129,7 +128,7 @@ main(int argc, char *argv[])
docompare = dopreserve = 1;
break;
case 'S':
-   safecopy = 1;
+   /* For backwards compatibility. */
break;
case 's':
dostrip = 1;
@@ -148,17 +147,13 @@ main(int argc, char *argv[])
argv += optind;
 
/* some options make no sense when creating directories */
-   if ((safecopy || docompare || dostrip) && dodir)
+   if ((docompare || dostrip) && dodir)
usage();
 
/* must have at least two arguments, except when creating directories */
if (argc < 2 && !dodir)
usage();
 
-   /* need to make a temp copy so we can compare stripped version */
-   if (docompare && dostrip)
-   safecopy = 1;
-
/* get group and owner id's */
if (group != NULL && gid_from_group(group, ) == -1) {
gid = strtonum(group, 0, GID_MAX, );
@@ -265,54 +260,30 @@ install(char *from_name, char *to_name, 
err(1, "%s", from_name);
}
 
-   if (safecopy) {
-   to_fd = create_tempfile(to_name, tempfile, sizeof(tempfile));
-   if (to_fd < 0)
-   err(1, "%s", tempfile);
-   } else if (docompare && !dostrip) {
-   if ((to_fd = open(to_name, O_RDONLY, 0)) < 0)
-   err(1, "%s", to_name);
-   } else {
-

Re: ASN1_get_object.3

2019-01-30 Thread Ingo Schwarze
Hi Kristaps,

Kristaps Dzonsons BSD.LV wrote on Wed, Jan 30, 2019 at 12:17:16PM +0100:

>> Attached is a manpage for ASN1_get_object(3), ASN1_put_object(3), and
>> ASN1_put_eoc(3).

> This is not complete without ASN1_object_size(), as otherwise one can't
> scan over children, which is necessary when parsing anything but
> trivially nested objects.  This is because the actual length of an ASN.1
> object depends upon... a lot of stuff.

Are you sure that you shouldn't be using the documented functions

  d2i_ASN1_TYPE(1)
  d2i_ASN1_OBJECT(1)

instead?

I'm not at all an ASN.1 wizard, but from a casual look, it seems
to me that maybe d2i_ASN1_TYPE(3) handles some of the complications
that ASN1_get_object(1) apparently causes - even though d2i_ASN1_TYPE(3)
is most definitely horribly designed, too, given that CAVEATS and
BUGS combined are longer than the DESCRIPTION.  Maybe ASN1_get_object()
is intentionally undocumented and only contained in 
by accident?

I'm not saying i'm opposed to documenting the function(s), but i'd
like input from a LibreSSL ASN.1 wizard which interface(s) application
software should use.  Maybe tedu@ qualifies, not sure, he himself
will know...  :-)

*If* we come to the conclusion that these functions should be
documented rather than deprecated, then i think from just reading
the first paragraph of the DESCRIPTION in d2i_ASN1_TYPE(3) and
ASN1_get_object(3), it ought to be crystal clear what the difference
is - whereas right now, both say about the same.

*If* we come to the conclusion that these functions should be
documented, you should probably send an updated draft taking tedu@'s
feedback into account.

Yours,
  Ingo



Re: games/fortune translation fix

2019-02-02 Thread Ingo Schwarze
Hi Jason,

oh well, these files are a mess, a random collection of funny
and not so funny stuff...  I dislike this one, too, for several
reasons.

 1. While "ad astra per aspera" sometimes occurs, the word order
"per aspera ad astra" is much more commonly used.  It sounds
much better - not only because it respects the logical
chronological order, but even more so for metric reasons:
"per aspera ad astra" follows a loosely trochaic rhythm
" - | X- (x)-X- " while"ad astra per aspera" has no
discernible metric whatsoever: " - | X- | -  X--   ".

 2. It doesn't actually appear to be an antique or even medieval
latin proverb but merely a modern invention.

 3. According to my latin dictionary, "ad astra" did occur in
antiquity, in a strongly metaphoric sense where "astra = heaven,
immortality, sphere and home of the gods" rather than "stars" -
though the only source cited there is Vergilius, Aeneis:
Macte nova virtute, puer, sic itur ad astra.
  literal (hence somewhat misleading) translation:
Blessings on your young courage, boy; that's the way to the stars.
  free translation, capturing the meaning and register better:
Go on and increase in valor, O boy! this is the path to immortality.
See e.g. https://en.wikiquote.org/wiki/Aeneid
Seneca also appears to use it in the context of a demi-god
(Hercules) interacting with the Olympians.

So "ad astra" appears to be a rather unsual phrase, highly poetic,
suited to heroic legends about gods and super-human men interacting
directly with the greatest gods.

 4. According to my latin dictionary, "asper" is a very common
adjective with a wide range of everyday concrete meanings:
rough, uneven, sharp, coarse, gruff, wild, disagreeable,
sorrowful, severe, rancorous, ... and more.
It was sometimes - but much less commonly, it appears -
used as a noun, though usually with a qualification in the
genitive, e.g. "aspera maris" = tempests (Tacitus).
When used alone as a noun, it appears to have a relatively
narrow, figurative, relatively mild meaning of "tribulations,
annoying trouble, misfortune"; translations like "hardship" or
"adversity" appear to exceed the severity of "asper(a)",
making it sound too much like serious emergency and distress.

So a fitting translation of "per aspera ad astra", approximating
the meaning and stylistic register of both parts, might be

  "overcoming annoying problems to be elevated immortality"

which sounds, yes, ridiculous.

The word "aspiration" is definitely a blatant mistranslation.

So if you want to keep the entry, i'd recommend

  Per aspera ad astra.  (Through hardship to immortality.)

because that is probably how it is commonly understood today, with
a weakened sense of "immortality" = "greatness, being remembered
for one's achievements after death", and it is also a compromise
not too far deviating from the actual meaning of the latin words,
even if sharpening "aspera" a bit and weakening "astra" somewhat
to smooth out content and and stylistic register.

I think while the literal translation "to the stars" might work in
a heroic legend, it is quite misleading out of context and ought
to be fixed.

Yours,
  Ingo

P.S.
I don't know why i looked at this so closely given my disdain for
these files - but from time to time, it appears i fail to sufficiently
tame my appetite for literature.


> i agree "aspiration" looks like a mistake. i suspect the intention
> was "asperity", which means harshness and rigour. there is a verb,
> asperate, but i think it's a bit obsolete.

"Asperity" certainly matches the *adjective* "asper", but it matches
the *noun* "aspera" much less than "hardship" or "adversity" or
simply "trouble".

> i'm a bit reluctant to just follow wikipedia blindly.  i think the
> latin is pural, and "hardships" doesn;t sound awesome when plural.

The grammatical form is unimportant for the translation in this
case.  "aspara" (pl.) = "hardship, adversity" (sing.) is OK, just
like you can translate "glasses" (pl.) to "Brille" (german, sing.).
The idea in the Latin word is that it is plural because more than
one unfortunate element is required to cause hardship, but that
is already adequately expressed in the singular form "hardship".

> also i prefer "by" to "through": since it's latin, a little
> archaicism is good.

Not really.  It is fake latin, not a real proverb, but a modern
invention.  So it should sound natural and modern.

> "adversity" would be easily understood and have the correct meaning
> (i think). but the translation you recommend (by Finn) restructures
> the phrase. i think it should begin "To the stars" (that's a minus
> for wikipedia too). so "To the stars by adversity."

You should really invert the order to improve the metric
and follow chronologic order as well as the usual wording.

> though i suspect "asperity" would have most of us reaching for our
> 

Re: games/fortune translation fix

2019-02-03 Thread Ingo Schwarze
Hi Pascal,

Pascal Stumpf wrote on Sun, Feb 03, 2019 at 11:13:12AM +0100:

> That is not correct because [...]

Which illustrates yet again that getting good grades at school
doesn't imply real understanding; i dared to try and research the
matter anyway because i forgot that with pascal@, we have somebody
in our group who really understands classical languages and literature.
Thanks for your insight, Pascal.

>> So if you want to keep the entry, i'd recommend
>> 
>>   Per aspera ad astra.  (Through hardship to immortality.)

> ok pascal@

Committed.
  Ingo



Re: update to PF pfctl(8) and pf.conf(5) manpages

2019-04-16 Thread Ingo Schwarze
Hi Alexandr,

Alexandr Nedvedicky wrote on Wed, Apr 17, 2019 at 12:09:10AM +0200:

> my oracle fellow pointed out [1] a PF documentation can be improved
> a bit, when it comes to newly introduced 'pfctl -FR' (a reset flush
> modifier).  I've decided to make manpage changes in separate diff as
> I expect some discussion on how much detailed the manpage should be.
> The diff here is my proposal.

First off, thank you for working on the pf.conf(5) manual page.
I appreciate that very much.  To me, the pf.conf(5) manual page
feels like a manual page of considerably above-average importance,
yet i have often felt it could probably be improved.  Improving it
isn't easy though: the manual is long and complex, and the subject
matter is more complicated than for most other manual pages.
So this is a place where expert attention from somebody who
actively works on the code is particularly helpful.

[...]
> From 'pfctl -FR' perspective the most important is to document whether
> particular option is parameter for PF driver (a.k.a. runtime option)
> or is just interpreted by parser as rules are being loaded.

> Documenting default values is just bonus, which I think might be
> useful for administrators.  On the other hand it adds more work to
> keep pf.conf(5) up-to-date when doing changes in code.
> I full understand the trade here between being precise and up-to-date.

I don't feel strongly about mentioning the defaults either way.
But i tend to think that if something is important enough to provide
users with a knob to tweak it, then they will probably need to know what
the default is before they can even start to figure out whether
they might need to tweak it.  So my guess would be that every time
people read the description of the option, they will also profit
from seeing the default right away.

If others feel the risk is too high that these defaults might get
outdated in the manual page, is there a central place in the code
you can point readers to for looking up the defaults?  If so,
pointing to that place might be a viable compromise.  If no such
place exists (which might imply the defaults are scattered / hard
to find in the code?) that would strengthen the argument to document
them.

The below nits are not objections, nor conditions, nor reproaches.
You are welcome to incorporate them; then again, such details can also
be fixed after commit when needed.  The main job, in particular in
a tricky page like this, is getting the content correct!


[...]
> diff --git a/share/man/man5/pf.conf.5 b/share/man/man5/pf.conf.5
> index 247ceef40a5..ab6b8c0447e 100644
> --- a/share/man/man5/pf.conf.5
> +++ b/share/man/man5/pf.conf.5
> @@ -1129,12 +1129,24 @@ can be used.
>  .Xr pf 4
>  may be tuned for various situations using the
>  .Ic set
> -command.
> +command. Two sorts of options should be distinguished.

New sentence, new line: this needs to be

   command.
  +Two sorts...

The same small problem occurs again below, several times.
I'm not annotating all the instances.  For spotting such trivial
mistakes,

  mandoc -T lint pf.conf.5

can be useful.

The word "should" is often somewhat confusing because it can leave
doubt whether something is a syntax requirement, or a stylistic
recommendation, or even a plan to change the code and the rules
in the future.

I'd prefer stating facts, for example:

   command.
  +There are two kinds of options:
  +.Em runtime

> +.Em runtime

A capital letter would be needed at the beginning of the sentence,
but instead, i'd prefer the colon as shown above.

> +options, which define parameters for 

The word "the" is missing here.

> +.Xr pf 4
> +driver and

  +driver, and

> +.Em parser
> +options, which fine-tune interpretation of rules, while
> +they are being loaded from file. The runtime options

s/from file/from the file/

... and again, new sentence, new line.

> +may be restored to their default values using:
> +.Pp
> +.Dl # pfctl -FReset
> +.Pp
> + 

Please show a blank between an option and its argument, unless
the syntax *requires* that there be no blank.

Blank lines are not allowed in mdoc(7), except inside .Bd -unfilled,
and .Pp is needless before .Bl without -compact, so:

  +may be restored to their default values using:
  +.Pp
  +.Dl # pfctl -F Reset
   .Bl -tag -width Ds

Alternatively, you could say, interrupting the text flow less:

  +may be restored to their default values using the
  +.Xr pfctl 8
  +.Fl F Cm Reset
  option.

>  .Bl -tag -width Ds
>  .It Ic set Cm block-policy drop | return
>  The
>  .Cm block-policy
> -option sets the default behaviour for the packet
> +parser option sets the default behaviour for the packet

That's indeed a nice, concise way to make the distinction.

>  .Ic block
>  action:
>  .Pp
> @@ -1146,8 +1158,13 @@ A TCP RST is returned for blocked TCP packets,
>  an ICMP UNREACHABLE is returned for blocked UDP packets,
>  and all other packets are silently dropped.
>  .El
> +.Pp
> +The default value is
> +.Cm drop .
>  .It Ic set Cm 

Re: 63 bit certificate ID is libressl affected?

2019-03-13 Thread Ingo Schwarze
Hi Tom,

Tom Smyth wrote on Wed, Mar 13, 2019 at 08:32:20PM +:

> Just saw the following article and i was wondering if libressl
> Might be affected by the bug also
> Top bit being set to 0 always making an effective 63 bits rather than 64
> bits

If i understand the article you quote correctly, is is about a minor
bug in some CA softwares (i.e. softwares used to operate certificate
authorities).  As far as i am aware, LibreSSL does *not* include any
software that can be used to operate a certificate authority.

The "openssl ca" subcommand of openssl(1) definitely does not count.
The openssl(1) utility is a *testing tool* and must not be used for
any kind of production purposes.

So i don't see how LibreSSL could possibly be affected.

If you still think it might be, please consider stating more precisely
which part of LibreSSL (i.e. which library function) you fear might
be broken in precisely which way.

> https://www.theregister.co.uk/2019/03/13/tls_cert_revoke_ejbca_config/

My impression is that the most important sentence from that article is
the following:

  While the serial number security issue is largely theoretical -
  63 bits leaves plenty of space to fend off collision attacks,
  even if it's not compliant with the spec [...]

That means this is unlikely to be a security issue in the first
place but looks more like a minor bug where some software is
gratutiosly violating a specification.  Sure, specifications should
not be set aside without a good reason, and certainly not accidentally,
and bugs ought to be fixed, but i fail to see any indication that
this bug might be more important than other run-of-the-mill bugs.

Please take this with a grain of salt: while i did occasionally
work on LibreSSL documentation in the past, my knowledge and
experience in matters of cryptography and PKI is very limited.
But i thought quick feedback might help to discourage people from
panicking.

Also, if you want to continue this discussion, i suggest moving
to misc@.  You didn't include a patch!  ;-)

Yours,
  Ingo



Re: caesar(6) to accept negative argument

2019-05-15 Thread Ingo Schwarze
Hi,

tleguern wrote on Wed, May 15, 2019 at 03:36:57PM +0100:

> This little patch makes caesar(6) useful at both encrypting and
> decrypting texts by allowing a negative rotation.

Committed, thanks.

> A similar patch was proposed by Dieter Rauschenberger in 2008 with
> little response

Well, it's a game, so people sometimes forget how critical it is
to publish errata for it as fastly as possible.  ;-)

Yours,
  Ingo


CVSROOT:/cvs
Module name:src
Changes by: schwa...@cvs.openbsd.org2019/05/15 09:59:24

Modified files:
games/caesar   : caesar.c 

Log message:
patch from  to support backward rotation,
hoping to save somebody's life from the Leather Goddesses of Phobos



Re: intro.8: .Xr rcctl 8 -- http://man.openbsd.org/intro.8

2019-05-25 Thread Ingo Schwarze
Hi Christian,

Christian Weisgerber wrote on Sun, May 26, 2019 at 01:19:01AM -:

> Not sure what do about "ELF(3)" in elf(3).


Index: elf.3
===
RCS file: /cvs/src/lib/libelf/elf.3,v
retrieving revision 1.2
diff -u -p -r1.2 elf.3
--- elf.3   19 Mar 2019 02:31:35 -  1.2
+++ elf.3   26 May 2019 01:46:30 -
@@ -615,8 +615,11 @@ descriptor itself.
 .Xr ar 5 ,
 .Xr elf 5
 .Sh HISTORY
-The original ELF(3) API was developed for Unix System V.
-The current implementation of the ELF(3) API appeared in
+The original
+.Nm
+API was developed for
+.At V .
+The current implementation of the API appeared in
 .Fx 7.0 .
 .Sh AUTHORS
 The ELF library was written by
Index: elf_flagdata.3
===
RCS file: /cvs/src/lib/libelf/elf_flagdata.3,v
retrieving revision 1.1
diff -u -p -r1.1 elf_flagdata.3
--- elf_flagdata.3  1 Feb 2019 05:27:37 -   1.1
+++ elf_flagdata.3  26 May 2019 01:46:30 -
@@ -34,7 +34,7 @@
 .Nm elf_flagphdr ,
 .Nm elf_flagscn ,
 .Nm elf_flagshdr
-.Nd manipulate flags associated with ELF(3) data structures
+.Nd manipulate flags associated with ELF data structures
 .Sh LIBRARY
 .Lb libelf
 .Sh SYNOPSIS
@@ -65,7 +65,9 @@ and
 .Ar scn
 denote the data structures whose flags need to be changed.
 These values should have been returned by prior calls to
-functions in the ELF(3) API set:
+functions in the
+.Xr elf 3
+API set:
 .Bl -bullet -compact
 .It
 Argument
@@ -175,7 +177,9 @@ function and the
 .Dv ELF_F_ARCHIVE
 and
 .Dv ELF_F_ARCHIVE_SYSV
-flags are an extension to the ELF(3) API.
+flags are an extension to the
+.Xr elf 3
+API.
 .Sh ERRORS
 These functions may fail with the following errors:
 .Bl -tag -width "[ELF_E_RESOURCE]"
Index: elf_open.3
===
RCS file: /cvs/src/lib/libelf/elf_open.3,v
retrieving revision 1.1
diff -u -p -r1.1 elf_open.3
--- elf_open.3  1 Feb 2019 05:27:37 -   1.1
+++ elf_open.3  26 May 2019 01:46:30 -
@@ -43,7 +43,9 @@ The functions
 .Fn elf_open
 and
 .Fn elf_openmemory
-are extensions to the ELF(3) API, for the internal use of the
+are extensions to the
+.Xr elf 3
+API, for the internal use of the
 Elftoolchain project.
 Portable applications should not use these functions.
 .Pp
@@ -71,7 +73,9 @@ specifies the size of the memory area in
 The function returns a pointer to a ELF descriptor if successful, or
 NULL if an error occurred.
 .Sh COMPATIBILITY
-These functions are non-standard extensions to the ELF(3) API set.
+These functions are non-standard extensions to the
+.Xr elf 3
+API set.
 .Pp
 The behavior of these functions differs from their counterparts
 .Xr elf_begin 3
Index: gelf.3
===
RCS file: /cvs/src/lib/libelf/gelf.3,v
retrieving revision 1.1
diff -u -p -r1.1 gelf.3
--- gelf.3  1 Feb 2019 05:27:37 -   1.1
+++ gelf.3  26 May 2019 01:46:31 -
@@ -192,7 +192,10 @@ Copy back an ELF symbol table entry.
 .Xr elf 3 ,
 .Xr elf 5
 .Sh HISTORY
-The GELF(3) API first appeared in System V Release 4.
+The
+.Nm
+API first appeared in
+.At V.4 .
 This implementation of the API first appeared in
 .Fx 7.0 .
 .Sh AUTHORS
Index: gelf_newehdr.3
===
RCS file: /cvs/src/lib/libelf/gelf_newehdr.3,v
retrieving revision 1.1
diff -u -p -r1.1 gelf_newehdr.3
--- gelf_newehdr.3  1 Feb 2019 05:27:38 -   1.1
+++ gelf_newehdr.3  26 May 2019 01:46:31 -
@@ -134,7 +134,9 @@ The
 function uses a type of
 .Ft "void *"
 for its returned value.
-This differs from some other implementations of the ELF(3) API, which use an
+This differs from some other implementations of the
+.Xr elf 3
+API, which use an
 .Ft "unsigned long"
 return type.
 .Sh ERRORS
Index: gelf_newphdr.3
===
RCS file: /cvs/src/lib/libelf/gelf_newphdr.3,v
retrieving revision 1.1
diff -u -p -r1.1 gelf_newphdr.3
--- gelf_newphdr.3  1 Feb 2019 05:27:38 -   1.1
+++ gelf_newphdr.3  26 May 2019 01:46:31 -
@@ -103,7 +103,9 @@ The
 function uses a type of
 .Ft "void *"
 for its returned value.
-This differs from some other implementations of the ELF(3) API, which use an
+This differs from some other implementations of the
+.Xr elf 3
+API, which use an
 .Ft "unsigned long"
 return type.
 .Sh ERRORS



Re: proot.1: Formatting fix

2019-06-08 Thread Ingo Schwarze
Hi Fabio,

Fabio Scotoni wrote on Sat, Jun 08, 2019 at 12:15:57PM +0200:

> This patch fixes a newline in proot.1 preventing LOCKDIR from
> being rendered correctly.

Thanks for reporting.  I committed the patch shown below.

> Note that this causes the input line to be very long, which
> seemed okay to me

Well, technically, mandoc and groff accept lines of arbitrary length,
and we now longer care about very old formatters that may have had
length limitations, but it still isn't good style.

> because dhcp-options.5 does the same (e.g. on line 184).

In that file, using .Xo or backslash line continuation would be
better, but it hardly warrants a commit.

Yours,
  Ingo


CVSROOT:/cvs
Module name:src
Changes by: schwa...@cvs.openbsd.org2019/06/08 08:48:53

Modified files:
share/man/man1 : proot.1 

Log message:
Missing macro, reported by Fabio Scotoni .
While here, also insert missing whitespace before punctuation.


Index: proot.1
===
RCS file: /cvs/src/share/man/man1/proot.1,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -r1.4 -r1.5
--- proot.1 29 May 2019 19:37:06 -  1.4
+++ proot.1 8 Jun 2019 14:48:53 -   1.5
@@ -225,8 +225,8 @@
 .It
 All the ports specific sub directories if they are defined,
 namely
-.Cm DISTDIR , WRKOBJDIR, LOGDIR, PACKAGE_REPOSITORY , PLIST_REPOSITORY ,
-LOCKDIR.
+.Cm DISTDIR , WRKOBJDIR , LOGDIR , PACKAGE_REPOSITORY , PLIST_REPOSITORY ,
+.Cm LOCKDIR .
 .It
 The
 .Cm PORTSDIR



Re: [diff] events.html - video links for BSDCan 2019

2019-06-18 Thread Ingo Schwarze
Hi Ross,

Ross L Richardson wrote on Tue, Jun 18, 2019 at 08:34:41PM +1000:

> The patch below just adds links for the videos currently available.

Committed, thanks.

> Notes:
> - Lines were already > 80 characters, so I haven't folded.

I took the liberty to fold the lines touched.  I think that the habit
of disregarding the 80 character rule of thumb in HTML code is not a
good habit - and extending beyond column 240 even less so.  Reading
diffs of HTML code is already harder than reading diffs of C code
in the first place, no need to make it even worse.

> - Links are to the context of the BSDCan 2019 playlist (which may not
>   be what is wanted).

Right, i deleted the list, index, and time attributes.  Youtube channels
tend to edit their playlists, and then such links get outdated.  So i
doubt mentioning the playlist warrants any additional maintenance effort.
Besides, YT typically suggests related content anyway, sometimes even
related playlists.

Yours,
  Ingo


> Index: events.html
> ===
> RCS file: /cvs/www/events.html,v
> retrieving revision 1.1173
> diff -u -p -r1.1173 events.html
> --- events.html   1 Jun 2019 23:12:47 -   1.1173
> +++ events.html   18 Jun 2019 10:25:47 -
> @@ -69,13 +69,13 @@ May 15-18, 2019, Ottawa, Canada.
>  Bob Beck -
>  https://github.com/bob-beck/libtls/blob/master/TUTORIAL.md;>libtls 
> for beginners conference tutorial
>  Theo Buehler - Design and verification of the TLS 1.3 handshake state 
> machine in LibreSSL (slides)
> -Florian Obser - https://man.openbsd.org/unwind.8;>unwind(8) 
> a privilege-separated, validating DNS recursive nameserver for every laptop 
> (slides)
> -Stefan Sperling - Building an accessible OpenBSD laptop (Enabling secure 
> and functional computing for a person with severe disabilities) ( href="papers/bsdcan2019-accessible-openbsd-laptop.pdf">slides)
> +Florian Obser - https://man.openbsd.org/unwind.8;>unwind(8) 
> a privilege-separated, validating DNS recursive nameserver for every laptop 
> (slides,  href="https://www.youtube.com/watch?v=88SoI49nO4olist=PLeF8ZihVdpFegPoAKppaDSoYmsBvpnSZvindex=8t=0s;>video)
> +Stefan Sperling - Building an accessible OpenBSD laptop (Enabling secure 
> and functional computing for a person with severe disabilities) ( href="papers/bsdcan2019-accessible-openbsd-laptop.pdf">slides,  href="https://www.youtube.com/watch?v=Ma_Y1hVmK8olist=PLeF8ZihVdpFegPoAKppaDSoYmsBvpnSZvindex=6t=0s;>video)
>  Antoine Jacoutot -  href="https://man.openbsd.org/syspatch.8;>syspatch(8), The Boring Healing 
> Potion (slides)
> -Alexander Bluhm - Measuring Performance on OpenBSD ( href="papers/bsdcan2019-perform-slides.pdf">slides)
> +Alexander Bluhm - Measuring Performance on OpenBSD ( href="papers/bsdcan2019-perform-slides.pdf">slides,  href="https://www.youtube.com/watch?v=s6rAXaHylFMlist=PLeF8ZihVdpFegPoAKppaDSoYmsBvpnSZvindex=15t=0s;>video)
>  Bob Beck
>  Unveil in OpenBSD
> -Jan Klemkow - Network booted OpenBSD Workstations ( href="papers/bsdcan2019_netboot.pdf">slides)
> +Jan Klemkow - Network booted OpenBSD Workstations ( href="papers/bsdcan2019_netboot.pdf">slides,  href="https://www.youtube.com/watch?v=kFqHXfWEB4olist=PLeF8ZihVdpFegPoAKppaDSoYmsBvpnSZvindex=19t=0s;>video)
>  
>  
>  



Re: sftp(1) manual diff

2019-06-18 Thread Ingo Schwarze
Hi,

Jason McIntyre wrote on Mon, Jun 17, 2019 at 04:58:26PM +0100:
> On Mon, Jun 17, 2019 at 05:31:16PM +0200, Tim van der Molen wrote:
>> Jason McIntyre (2019-06-17 15:02 +0200):
>>> On Mon, Jun 17, 2019 at 02:47:09PM +0200, Tim van der Molen wrote:

 sftp(1) has this:
reput [-Ppr] [local-path] remote-path
   Resume upload of [local-path].  Equivalent to put with the -a
   flag set.
 remote-path should be marked optional, not local-path. Probably a pasto
 from reget.

>>> ok by me, but i have another question - shouldn;t the usage for reput
>>> and reget show the -f flag too? i know it's unrelated to your diff, but
>>> two birds and all that.

>> Yes, you're right.
>> I see there's also a -R flag (equivalent to -r), so I added that one as
>> well (it's also listed in sftp's help command).

> ok, please wait for a bit of feedback. i suspect -R is deliberately
> undocumented (maybe to mirror cp).

I seriously doubt that.  See here:

   $ sftp localhost
  schwarze@localhost's password: 
  Connected to localhost.
  sftp> help
  Available commands:
  [...]
  get [-afPpRr] remote [local]   Download file
  reget [-fPpRr] remote [local]  Resume download file
  reput [-fPpRr] [local] remote  Resume upload file

Both the -Pp and the -Rr pairs are consistently documented in the
help() string in sftp.c, so the omission of -R in the manual page
looks like an oversight.  It makes little sense to me to consistently
document -P but to be inconsistent about -R.


>> And then I noticed the help command also has the original issue. While
>> there I might as well restore alphabetical order.
>> (It's not two birds, but a can of worms!)

> it's always a can of worms.

Right, taming the worms is hard work.
In reality, they won't just grab their mothers and fly away.


Anyway, OK for the diff below.

Feel free to reconsider the alphabetical order of "reget reput rename"
if you want to.  ;-)

Yours,
  Ingo


>> Index: sftp.1
>> ===
>> RCS file: /cvs/src/usr.bin/ssh/sftp.1,v
>> retrieving revision 1.125
>> diff -p -u -r1.125 sftp.1
>> --- sftp.1   22 Jan 2019 06:58:31 -  1.125
>> +++ sftp.1   17 Jun 2019 15:27:11 -
>> @@ -404,7 +404,7 @@ extension.
>>  Quit
>>  .Nm sftp .
>>  .It Xo Ic get
>> -.Op Fl afPpr
>> +.Op Fl afPpRr
>>  .Ar remote-path
>>  .Op Ar local-path
>>  .Xc
>> @@ -446,9 +446,11 @@ or
>>  flag is specified, then full file permissions and access times are
>>  copied too.
>>  .Pp
>> -If the
>> +If either the
>> +.Fl R
>> +or
>>  .Fl r
>> -flag is specified then directories will be copied recursively.
>> +flag is specified, then directories will be copied recursively.
>>  Note that
>>  .Nm
>>  does not follow symbolic links when performing recursive transfers.
>> @@ -545,7 +547,7 @@ Create remote directory specified by
>>  .It Ic progress
>>  Toggle display of progress meter.
>>  .It Xo Ic put
>> -.Op Fl afPpr
>> +.Op Fl afPpRr
>>  .Ar local-path
>>  .Op Ar remote-path
>>  .Xc
>> @@ -588,9 +590,11 @@ or
>>  flag is specified, then full file permissions and access times are
>>  copied too.
>>  .Pp
>> -If the
>> +If either the
>> +.Fl R
>> +or
>>  .Fl r
>> -flag is specified then directories will be copied recursively.
>> +flag is specified, then directories will be copied recursively.
>>  Note that
>>  .Nm
>>  does not follow symbolic links when performing recursive transfers.
>> @@ -600,7 +604,7 @@ Display remote working directory.
>>  Quit
>>  .Nm sftp .
>>  .It Xo Ic reget
>> -.Op Fl Ppr
>> +.Op Fl fPpRr
>>  .Ar remote-path
>>  .Op Ar local-path
>>  .Xc
>> @@ -612,12 +616,12 @@ with the
>>  .Fl a
>>  flag set.
>>  .It Xo Ic reput
>> -.Op Fl Ppr
>> -.Op Ar local-path
>> -.Ar remote-path
>> +.Op Fl fPpRr
>> +.Ar local-path
>> +.Op Ar remote-path
>>  .Xc
>>  Resume upload of
>> -.Op Ar local-path .
>> +.Ar local-path .
>>  Equivalent to
>>  .Ic put
>>  with the
>> Index: sftp.c
>> ===
>> RCS file: /cvs/src/usr.bin/ssh/sftp.c,v
>> retrieving revision 1.192
>> diff -p -u -r1.192 sftp.c
>> --- sftp.c   7 Jun 2019 03:47:12 -   1.192
>> +++ sftp.c   17 Jun 2019 15:27:11 -
>> @@ -263,8 +263,6 @@ help(void)
>>  "   filesystem containing 'path'\n"
>>  "exit   Quit sftp\n"
>>  "get [-afPpRr] remote [local]   Download file\n"
>> -"reget [-fPpRr] remote [local]  Resume download file\n"
>> -"reput [-fPpRr] [local] remote  Resume upload file\n"
>>  "help   Display this help text\n"
>>  "lcd path   Change local directory to 
>> 'path'\n"
>>  "lls [ls-options [path]]Display local directory 
>> listing\n"
>> @@ -278,6 +276,8 @@ help(void)
>>  "put [-afPpRr] local [remote]   Upload file\n"
>>  "pwd  

Re: [patch] improve strptime(3) %z timezone parsing

2019-05-10 Thread Ingo Schwarze
Hi,

> That one looks correct.  OK millert@

Committed, thanks for checking!


While here, i noticed ugly preprocessor macros.
Let's make our future life easier by unifdefing a bit.
When compiling with -g0, there is no object change.

Note that if TM_ZONE is not defined, wcsftime.c doesn't
currently even compile.  Talk about code in the tree that
is never tested.

OK?
  Ingo


Index: localtime.c
===
RCS file: /cvs/src/lib/libc/time/localtime.c,v
retrieving revision 1.59
diff -u -p -r1.59 localtime.c
--- localtime.c 19 Sep 2016 12:48:21 -  1.59
+++ localtime.c 10 May 2019 22:17:14 -
@@ -46,7 +46,7 @@
 ** in which Daylight Saving Time is never observed.
 ** 4.  They might reference tzname[0] after setting to a time zone
 ** in which Standard Time is never observed.
-** 5.  They might reference tm.TM_ZONE after calling offtime.
+** 5.  They might reference tm.tm_zone after calling offtime.
 ** What's best to do in the above cases is open to debate;
 ** for now, we just set things up so that in any of the five cases
 ** WILDABBR is used. Another possibility: initialize tzname[0] to the
@@ -214,14 +214,8 @@ DEF_WEAK(tzname);
 
 static struct tm   tm;
 
-#ifdef USG_COMPAT
 long   timezone = 0;
 intdaylight = 0;
-#endif /* defined USG_COMPAT */
-
-#ifdef ALTZONE
-time_t altzone = 0;
-#endif /* defined ALTZONE */
 
 static long
 detzcode(const char *codep)
@@ -255,13 +249,8 @@ settzname(void)
 
tzname[0] = wildabbr;
tzname[1] = wildabbr;
-#ifdef USG_COMPAT
daylight = 0;
timezone = 0;
-#endif /* defined USG_COMPAT */
-#ifdef ALTZONE
-   altzone = 0;
-#endif /* defined ALTZONE */
if (sp == NULL) {
tzname[0] = tzname[1] = (char *)gmt;
return;
@@ -273,16 +262,10 @@ settzname(void)
const struct ttinfo *ttisp = >ttis[sp->types[i]];
 
tzname[ttisp->tt_isdst] = >chars[ttisp->tt_abbrind];
-#ifdef USG_COMPAT
if (ttisp->tt_isdst)
daylight = 1;
if (!ttisp->tt_isdst)
timezone = -(ttisp->tt_gmtoff);
-#endif /* defined USG_COMPAT */
-#ifdef ALTZONE
-   if (ttisp->tt_isdst)
-   altzone = -(ttisp->tt_gmtoff);
-#endif /* defined ALTZONE */
}
/*
** Finally, scrub the abbreviations.
@@ -1274,9 +1257,7 @@ localsub(const time_t *timep, long offse
result = timesub(, ttisp->tt_gmtoff, sp, tmp);
tmp->tm_isdst = ttisp->tt_isdst;
tzname[tmp->tm_isdst] = >chars[ttisp->tt_abbrind];
-#ifdef TM_ZONE
-   tmp->TM_ZONE = >chars[ttisp->tt_abbrind];
-#endif /* defined TM_ZONE */
+   tmp->tm_zone = >chars[ttisp->tt_abbrind];
return result;
 }
 
@@ -1325,21 +1306,19 @@ gmtsub(const time_t *timep, long offset,
}
_THREAD_PRIVATE_MUTEX_UNLOCK(gmt);
result = timesub(timep, offset, gmtptr, tmp);
-#ifdef TM_ZONE
/*
** Could get fancy here and deliver something such as
** "UTC+" or "UTC-" if offset is non-zero,
** but this is no time for a treasure hunt.
*/
if (offset != 0)
-   tmp->TM_ZONE = wildabbr;
+   tmp->tm_zone = wildabbr;
else {
if (gmtptr == NULL)
-   tmp->TM_ZONE = (char *)gmt;
+   tmp->tm_zone = (char *)gmt;
else
-   tmp->TM_ZONE = gmtptr->chars;
+   tmp->tm_zone = gmtptr->chars;
}
-#endif /* defined TM_ZONE */
return result;
 }
 
@@ -1508,9 +1487,7 @@ timesub(const time_t *timep, long offset
idays -= ip[tmp->tm_mon];
tmp->tm_mday = (int) (idays + 1);
tmp->tm_isdst = 0;
-#ifdef TM_GMTOFF
-   tmp->TM_GMTOFF = offset;
-#endif /* defined TM_GMTOFF */
+   tmp->tm_gmtoff = offset;
return tmp;
 }
 
Index: private.h
===
RCS file: /cvs/src/lib/libc/time/private.h,v
retrieving revision 1.38
diff -u -p -r1.38 private.h
--- private.h   24 Oct 2015 18:13:18 -  1.38
+++ private.h   10 May 2019 22:17:14 -
@@ -9,12 +9,9 @@
 */
 
 /* OpenBSD defaults */
-#define TM_GMTOFF  tm_gmtoff
-#define TM_ZONEtm_zone
 #define PCTS   1
 #define ALL_STATE  1
 #define STD_INSPIRED   1
-#define USG_COMPAT 1
 
 /*
 ** This header is for use ONLY with the time conversion code.
Index: strftime.c
===
RCS file: /cvs/src/lib/libc/time/strftime.c,v
retrieving revision 1.30
diff -u -p -r1.30 strftime.c
--- strftime.c  21 Sep 2016 04:38:57 -  1.30
+++ strftime.c  10 May 2019 22:17:14 -
@@ -456,12 +456,9 @@ label:
   

Re: [patch] remove reference to HOSTALIASES in hostname(7)

2019-05-09 Thread Ingo Schwarze
Hi,

Theo de Raadt wrote on Mon, Apr 15, 2019 at 12:56:35PM -0600:
> Hiltjo Posthuma  wrote:

>> I noticed the man page hostname(7) still references the environment
>> variable HOSTALIASES.  This functionality seems to be removed in the
>> commit:
>> https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/asr/asr.c?rev=1.50=text/x-cvsweb-markup

> Your deletion is highly excessive.

I believe the following is the right amount of text to remove.
The description of the file format can go because it does not
appear to be used for anything else.  In particular, hosts(5)
contains its own file format description, which is different.

OK?
  Ingo


Index: hostname.7
===
RCS file: /cvs/src/share/man/man7/hostname.7,v
retrieving revision 1.10
diff -u -p -r1.10 hostname.7
--- hostname.7  29 May 2017 12:13:50 -  1.10
+++ hostname.7  9 May 2019 17:17:47 -
@@ -52,19 +52,6 @@ which must generally translate the name 
 Hostnames are resolved by the Internet name resolver in the following
 fashion.
 .Pp
-If the name consists of a single component, i.e., contains no dot,
-and if the environment variable
-.Ev HOSTALIASES
-is set to the name of a file,
-that file is searched for any string matching the input hostname.
-The file should consist of lines made up of two whitespace separated strings,
-the first of which is the hostname alias,
-and the second of which is the complete hostname
-to be substituted for that alias.
-If a case-insensitive match is found between the hostname to be resolved
-and the first field of a line in the file, the substituted name is looked
-up with no further processing.
-.Pp
 If the input name ends with a trailing dot,
 the trailing dot is removed,
 and the remaining name is looked up with no further processing.



Re: [patch] improve strptime(3) %z timezone parsing

2019-05-09 Thread Ingo Schwarze
Hi,

Theo de Raadt wrote on Sun, Mar 24, 2019 at 12:48:03PM -0600:
> Hiltjo Posthuma  wrote:
>> On Sun, Feb 24, 2019 at 01:11:39PM +0100, Hiltjo Posthuma wrote:

>>> 2. The military/nautical UTC offsets are [...]

broken by design.

>> As discussed previously for point 2 I think it is fine to either
>> return NULL and not handle military zones like most libc's do

> My position is we should delete support.

So here is a diff to delete support for the military time zones.
Just error out when encountering them.  That's the best thing we
can do because their meaning is ill-defined in the first place.

I'm not mixing anything else into this diff.  The other bugs should
be handled separately.

OK?
  Ingo


Index: strptime.3
===
RCS file: /cvs/src/lib/libc/time/strptime.3,v
retrieving revision 1.27
diff -u -p -r1.27 strptime.3
--- strptime.3  21 Feb 2019 19:10:32 -  1.27
+++ strptime.3  9 May 2019 18:29:19 -
@@ -242,15 +242,7 @@ time
 or
 .Ql Standard
 .Pq Dq S
-time;
-a single letter military timezone specified as:
-.Dq A
-through
-.Dq I
-and
-.Dq K
-through
-.Dq Y .
+time.
 .It Cm \&%Z
 timezone name or no characters when timezone information is unavailable.
 .It Cm \&%%
Index: strptime.c
===
RCS file: /cvs/src/lib/libc/time/strptime.c,v
retrieving revision 1.25
diff -u -p -r1.25 strptime.c
--- strptime.c  21 Feb 2019 19:10:32 -  1.25
+++ strptime.c  9 May 2019 18:29:19 -
@@ -520,25 +520,6 @@ literal:
bp = ep;
continue;
}
-
-   if ((*bp >= 'A' && *bp <= 'I') ||
-   (*bp >= 'L' && *bp <= 'Y')) {
-#ifdef TM_GMTOFF
-   /* Argh! No 'J'! */
-   if (*bp >= 'A' && *bp <= 'I')
-   tm->TM_GMTOFF =
-   ('A' - 1) - (int)*bp;
-   else if (*bp >= 'L' && *bp <= 'M')
-   tm->TM_GMTOFF = 'A' - (int)*bp;
-   else if (*bp >= 'N' && *bp <= 'Y')
-   tm->TM_GMTOFF = (int)*bp - 'M';
-#endif
-#ifdef TM_ZONE
-   tm->TM_ZONE = NULL; /* XXX */
-#endif
-   bp++;
-   continue;
-   }
return NULL;
}
offs = 0;



Re: [patch] improve strptime(3) %z timezone parsing

2019-05-10 Thread Ingo Schwarze
Hi Todd,

Todd C. Miller wrote on Fri, May 10, 2019 at 02:08:45PM -0600:
> On Fri, 10 May 2019 16:52:35 +0200, Ingo Schwarze wrote:

>> Here is a patch to fix the code.

> OK millert@ for that part.

Thanks, committed.

>> The change to %Z is exactly what Hiltjo sent.
>> The current code for %z is unnecessarily complicated.
>> Rather than fixing it, i simply rewrote it from scratch.
>> I like it when a bugfix results in -28 +11 LOC and better readability.

> I don't believe this handles the "[+-]hh" form.

Ouch.  No, it does not.  Thanks for spotting the regression.

The following patch preserves the parsing behaviour
and correctly stores the number of seconds into tm_gmtoff.

OK?
  Ingo


 $ ./z +1
NULL
 $ ./z +02
7200
 $ ./z +02x
7200 (rest "x")
 $ ./z +02: 
7200
 $ ./z +02:x
7200 (rest "x")
 $ ./z +02:1 
NULL
 $ ./z +02:01
7260
 $ ./z +02:013
7260 (rest "3")


Index: strptime.c
===
RCS file: /cvs/src/lib/libc/time/strptime.c,v
retrieving revision 1.27
diff -u -p -r1.27 strptime.c
--- strptime.c  10 May 2019 20:24:58 -  1.27
+++ strptime.c  10 May 2019 20:43:35 -
@@ -519,32 +519,17 @@ literal:
}
return NULL;
}
-   offs = 0;
-   for (i = 0; i < 4; ) {
-   if (isdigit(*bp)) {
-   offs = offs * 10 + (*bp++ - '0');
-   i++;
-   continue;
-   }
-   if (i == 2 && *bp == ':') {
-   bp++;
-   continue;
-   }
-   break;
-   }
-   switch (i) {
-   case 2:
-   offs *= 100;
-   break;
-   case 4:
-   i = offs % 100;
-   if (i >= 60)
-   return NULL;
-   /* Convert minutes into decimal */
-   offs = (offs / 100) * 100 + (i * 50) / 30;
-   break;
-   default:
+   if (!isdigit(bp[0]) || !isdigit(bp[1]))
return NULL;
+   offs = ((bp[0]-'0') * 10 + (bp[1]-'0')) * SECSPERHOUR;
+   bp += 2;
+   if (*bp == ':')
+   bp++;
+   if (isdigit(*bp)) {
+   offs += (*bp++ - '0') * 10 * SECSPERMIN;
+   if (!isdigit(*bp))
+   return NULL;
+   offs += (*bp++ - '0') * SECSPERMIN;
}
if (neg)
offs = -offs;



less(1) UTF-8 cleanup: cmd_putstr()

2019-05-09 Thread Ingo Schwarze
Hi,

let's start tackling the last file which is using stuff from charset.c:
the file cmdbuf.c.  The first dirty function is cmd_putstr().
Fortunately, it doesn't actually need to handle non-ASCII characters.

The function cmd_putstr() is called only from command.c.
Almost all callers pass literal ASCII strings.
The only exception is start_mca(), which passes on its second argument.
Again, almost all callers of start_mca() pass literal ASCII strings.
The only exception is mca_opt_char(), which passes the return value
of opt_prompt(), defined in option.c.

The argument of opt_prompt() is the static variable curropt,
which is only set to non-NULL from findopt_name() or findopt() in opttbl.c.
Both of these functions only ever return literal ASCII strings contained
in the static struct loption option[].

To summarize, cmd_putstr() is only ever called with ASCII strings,
so it can be radically simplified, deleting two LWCHAR variables,
one call to step_char(), is_composing_char(), is_combining_char(),
and is_wide_char() each.

OK?
  Ingo


Index: cmdbuf.c
===
RCS file: /cvs/src/usr.bin/less/cmdbuf.c,v
retrieving revision 1.16
diff -u -p -r1.16 cmdbuf.c
--- cmdbuf.c17 Sep 2016 15:06:41 -  1.16
+++ cmdbuf.c9 May 2019 12:22:51 -
@@ -117,29 +117,16 @@ clear_cmd(void)
 }
 
 /*
- * Display a string, usually as a prompt for input into the command buffer.
+ * Display an ASCII string, usually as a prompt for input,
+ * into the command buffer.
  */
 void
 cmd_putstr(char *s)
 {
-   LWCHAR prev_ch = 0;
-   LWCHAR ch;
-   char *endline = s + strlen(s);
while (*s != '\0') {
-   char *ns = s;
-   ch = step_char(, +1, endline);
-   while (s < ns)
-   putchr(*s++);
-   if (!utf_mode) {
-   cmd_col++;
-   prompt_col++;
-   } else if (!is_composing_char(ch) &&
-   !is_combining_char(prev_ch, ch)) {
-   int width = is_wide_char(ch) ? 2 : 1;
-   cmd_col += width;
-   prompt_col += width;
-   }
-   prev_ch = ch;
+   putchr(*s++);
+   cmd_col++;
+   prompt_col++;
}
 }
 



Re: grep ".one\|.two" doesn't work on OpenBSD. Is it expected?

2019-05-21 Thread Ingo Schwarze
Hi,

Todd C. Miller wrote on Mon, May 20, 2019 at 01:22:21PM -0600:
> On Mon, 20 May 2019 20:01:12 +0200, Juan Francisco Cantero Hurtado wrote:

>> The grep command works with GNU, NetBSD, FreeBSD and BusyBox. It fails
>> on OpenBSD and Solaris 11. I'm suggesting upstream to change the command
>> to "grep -e ".datapack" -e ".histpack"" but I would like to know if this
>> is a bug or we just don't support the | in the grep patterns.

> That's GNU regexp format, not POSIX.  The standard way to do this
> is with an extended regular expression using egrep.  E.g.
> 
> cat test.txt | egrep ".datapack|.histpack"
> 
> though you should escape the '.' if you want it to match literally.

I just checked whether the meaning of "\|" is properly documented,
and it is:

  re_format(7):
 POSIX leaves some aspects of RE syntax and semantics open;
 '**' marks decisions on these aspects that may not be fully
 portable to other POSIX implementations.
 [...]
 An atom is [...]
 a '\' followed by one of the characters '^.[$()|*+?{\' (matching
 that character taken as an ordinary character, as if the '\'
 had not been present**), [...]

So, there is nothing to fix in the manual, and besides, our choice
does not violate POSIX:

  
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03

  9.3.2 BRE Ordinary Characters

  An ordinary character is a BRE that matches itself: any character
  in the supported character set, except for the BRE special
  characters listed in BRE Special Characters.

  The interpretation of an ordinary character preceded by an unescaped
   ( '\\' ) is undefined, except for:  [...]

Since '|' is an ordinary character, "\|" causes undefined behaviour.

Yours,
  Ingo



Re: cp(1) add -l hard link -s symlink options

2019-05-23 Thread Ingo Schwarze
Hi Tracey,

Tracey Emery wrote on Thu, May 23, 2019 at 02:35:10PM -0600:

> Attached is a proposed diff for cp(1). It adds the -l (hard link)
> and -s (symlink) options.

I don't like that.  That's exactly what can be done with ln(1) in a
standard way.  There is no value in making every tool do everything -
quite to that contrary, that only causes gratuitous complication and
confusion.

> These options are available in GNU cp, FreeBSD cp,
> and the -l option is at least in NetBSD and Dragonfly.
> 
> I needed the -l option to use the system cp for rsnapshots, instead of
> their native_cl_al function. Hopefully, this will speed up my backups.

If the rsnapshots uses such options, i think you should send patches
to the rsnapshots project instead.  Tell them to use POSIX features
instead of relying on pointless GNUisms.

Yours,
  Ingo



Re: update to PF pfctl(8) and pf.conf(5) manpages

2019-05-08 Thread Ingo Schwarze
Hi Alexandr,

Alexandr Nedvedicky wrote on Wed, May 08, 2019 at 04:55:57PM +0200:

> below is third iteration of pf.conf.5 manpage.

OK schwarze@

A few final nits inline.

Thank you again for doing this work.
  Ingo


> diff --git a/share/man/man5/pf.conf.5 b/share/man/man5/pf.conf.5
[...]
> @@ -1237,6 +1244,22 @@ Various limits can be combined on a single line:
>  .Bd -literal -offset indent
>  set limit { states 2, frags 2000, src-nodes 2000 }
>  .Ed
> +.Pp
> +.Xr pf 4
> +has the following defaults:
> +.Bl -column table-entries PFR_KENTRY_HIWAT_SMALL platform_dependent
> +.It states Ta Dv PFSTATE_HIWAT Ta Pq 10
> +.It tables Ta Dv PFR_KTABLE_HIWAT Ta Pq 1000
> +.It table-entries Ta Dv PFR_KENTRY_HIWAT Ta Pq 20
> +.It table-entries Ta Dv PFR_KENTRY_HIWAT_SMALL Ta Pq 10
> +.It frags Ta Dv NMBCLUSTERS/32 Ta Pq platform dependent

  Ta Dv NMBCLUSTERS Ns /32 Ta

would be even better.

> +.El
> +.Pp
> +.Cm NMBCLUSTERS

Please make this line

  .Dv NMBCLUSTERS

> +defines the total number of packets which can exist in-system at any one 
> time.
> +Refer to
> +.In machine/param.h
> +to find out a value for system sepcific values.

s/sepcific/specific/

The following would sound better; is it accurate?

  Refer to
  .In machine/param.h
  for the platform-specific value.

> @@ -1393,12 +1425,13 @@ set syncookies adaptive (start 25%, end 12%)
>  .It Ic set Cm timeout Ar variable value
>  .Bl -tag -width "src.track" -compact
>  .It Cm frag
> -Seconds before an unassembled fragment is expired.
> +Seconds before an unassembled fragment is expired (60 by default).
>  .It Cm interval
> -Interval between purging expired states and fragments.
> +Interval between purging expired states and fragments (10 seconds by 
> default).
>  .It Cm src.track
>  Length of time to retain a source tracking entry after the last state
> -expires.
> +expires (0 by default, which means there is no global limit.
> +The value is defined by rule which creates state.).

I think articles are needed:

  The value is defined by the rule which creates the state).

> @@ -1493,6 +1526,10 @@ set limit states 10
>  .Pp
>  With 9000 state table entries, the timeout values are scaled to 50%
>  (tcp.first 60, tcp.established 43200).
> +.Pp
> +.Dq pfctl -F Reset
> +restores default values for those options: debug, all limit options,
> +loginterface, reassemble, skip, syncookies, all timeouts.
>  .El
>  .Sh QUEUEING
>  Packets can be assigned to queues for the purpose of bandwidth

I think this would sound better:

  restores default values for the following options: debug, all limit options,
  loginterface, reassemble, skip, syncookies, and all timeouts.

Also, the new lines should go after the .El, not before it, like this:

.Pp
With 9000 state table entries, the timeout values are scaled to 50%
(tcp.first 60, tcp.established 43200).
.El
.Pp
.Dq pfctl -F Reset
restores default values for the following options: debug, all limit options,
loginterface, reassemble, skip, syncookies, and all timeouts.
.Sh QUEUEING


Either way, this seems mature enough for getting it in.
IMHO, no need to resend it again.

Yours,
  Ingo



Re: [patch] improve strptime(3) %z timezone parsing

2019-05-10 Thread Ingo Schwarze
Hi Ted,

Ted Unangst wrote on Thu, May 09, 2019 at 04:16:40PM -0400:
> Ingo Schwarze wrote:

>> I'm not mixing anything else into this diff.  The other bugs should
>> be handled separately.
 
> Works for me. (with additional comment removal)

Thanks for checking, committed.

Now let's get to the more serious part.
Hiltjo observed that %Z and %z produce wrong results.

First of all, neither POSIX nor XPG define tm_gmtoff nor %Z nor %z:

  https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/time.h.html
  https://pubs.opengroup.org/onlinepubs/9699919799/functions/strptime.html

So i think the best way to find out what tm_gmtoff should be is to
understand how programs in our own tree use it.

Here is a (hopefully) complete list of the users in OpenBSD base.
The following files expect it to contain seconds:
 - lib/libc/time/strftime.c
 - lib/libc/time/wcsftime.c
 - usr.sbin/smtpd/to.c
 - usr.sbin/snmpd/mib.c
 - usr.sbin/cron/cron.c
 - usr.bin/ftp/util.c
 - usr.bin/cvs/entries.c
 - usr.bin/rcs/rcstime.c
 - gnu/usr.bin/perl/time64.c

I failed to find any users that do *not* expect seconds.
So my conclusion is that the documentation is right and
what the code in strptime.c does is wrong.

Here is a patch to fix the code.

OK?


The change to %Z is exactly what Hiltjo sent.
The current code for %z is unnecessarily complicated.
Rather than fixing it, i simply rewrote it from scratch.
I like it when a bugfix results in -28 +11 LOC and better readability.

Yours,
  Ingo


Index: strptime.c
===
RCS file: /cvs/src/lib/libc/time/strptime.c,v
retrieving revision 1.26
diff -u -p -r1.26 strptime.c
--- strptime.c  10 May 2019 12:49:16 -  1.26
+++ strptime.c  10 May 2019 14:40:49 -
@@ -497,7 +497,7 @@ literal:
ep = _find_string(bp, , nast, NULL, 4);
if (ep != NULL) {
 #ifdef TM_GMTOFF
-   tm->TM_GMTOFF = -5 - i;
+   tm->TM_GMTOFF = (-5 - i) * SECSPERHOUR;
 #endif
 #ifdef TM_ZONE
tm->TM_ZONE = (char *)nast[i];
@@ -509,7 +509,7 @@ literal:
if (ep != NULL) {
tm->tm_isdst = 1;
 #ifdef TM_GMTOFF
-   tm->TM_GMTOFF = -4 - i;
+   tm->TM_GMTOFF = (-4 - i) * SECSPERHOUR;
 #endif
 #ifdef TM_ZONE
tm->TM_ZONE = (char *)nadt[i];
@@ -519,33 +519,16 @@ literal:
}
return NULL;
}
-   offs = 0;
-   for (i = 0; i < 4; ) {
-   if (isdigit(*bp)) {
-   offs = offs * 10 + (*bp++ - '0');
-   i++;
-   continue;
-   }
-   if (i == 2 && *bp == ':') {
-   bp++;
-   continue;
-   }
-   break;
-   }
-   switch (i) {
-   case 2:
-   offs *= 100;
-   break;
-   case 4:
-   i = offs % 100;
-   if (i >= 60)
-   return NULL;
-   /* Convert minutes into decimal */
-   offs = (offs / 100) * 100 + (i * 50) / 30;
-   break;
-   default:
+   if (!isdigit(bp[0]) || !isdigit(bp[1]))
return NULL;
-   }
+   offs = ((bp[0]-'0') * 10 + (bp[1]-'0')) * SECSPERHOUR;
+   bp += 2;
+   if (*bp == ':')
+   bp++;
+   if (!isdigit(bp[0]) || !isdigit(bp[1]))
+   return NULL;
+   offs += ((bp[0]-'0') * 10 + (bp[1]-'0')) * SECSPERMIN;
+   bp += 2;
if (neg)
offs = -offs;
tm->tm_isdst = 0;   /* XXX */



Re: mdoc.7: fix Dd example

2019-05-21 Thread Ingo Schwarze
Hi Klemens,

Jason McIntyre wrote on Sun, May 19, 2019 at 08:58:35PM +0100:
> On Sun, May 19, 2019 at 09:52:39PM +0200, Klemens Nanni wrote:
>> On Sun, May 19, 2019 at 07:31:19PM +0100, Jason McIntyre wrote:

>>> no. just use $Mdocdate$ and it gets expanded on commit. just like
>>> when you add $OpenBSD$: you don;t fill that in, right?

>> OK, will do so.  No, I'll just wite `$OpenBSD$'.

>>> use $Mdocdate$ and let expansion happen, or just use the
>>> old format of "month day, year".
>>> is that clear, or did the docs mislead you somehow?

>> Now it is, thanks.  I had no idea of old vs. new, so it confused me
>> to see warnings in both variations.

Thanks for the report.  It was indeed a bad idea that mandoc(1)
complained about style when your style was actually perfectly fine,
so i just committed the patch below.

>> To be clear: what is the "new" format?

> originally we wrote the date every time we bumped the page, in the
> format:
> 
>   month day, year
> 
> now you just add an Mdocdate id before *first* commit, and thereafter
> there's no need to touch it. it updates every commit. like any other
> rcsid.

>> I think I also misunderstood the `$Mdocdate$ | month day, year' syntax.
>> This means either `$Mdocdate$' *or* `May 19, 2019`, not both together
>> like I did.

> correct.

I also checked the mdoc(7) manual page and came to the conclusion
that the description of .Dd is already very good.  MANUAL STRUCTURE
shows the canonical form for a new page.  MACRO OVERVIEW correctly
shows all forms that authors are supposed to type in.  MACRO REFERENCE
describes the details correctly and provides a perfect set of
examples.

So it seems to me the bad message was the only problem here.

Yours,
  Ingo


Log Message:
---
Do not print the style message "missing date" when the date is given
as "$Mdocdate$" without an actual date.  That is the canonical way to
write a new manual page and not bad style at all.
Misleading message reported by kn@ on tech@.

Modified Files:
--
mandoc:
mandoc.c

Revision Data
-
Index: mandoc.c
===
RCS file: /home/cvs/mandoc/mandoc/mandoc.c,v
retrieving revision 1.114
retrieving revision 1.115
diff -Lmandoc.c -Lmandoc.c -u -p -r1.114 -r1.115
--- mandoc.c
+++ mandoc.c
@@ -541,10 +541,10 @@ mandoc_normdate(struct roff_man *man, ch
 
/* No date specified: use today's date. */
 
-   if (in == NULL || *in == '\0' || strcmp(in, "$" "Mdocdate$") == 0) {
+   if (in == NULL || *in == '\0')
mandoc_msg(MANDOCERR_DATE_MISSING, ln, pos, NULL);
+   if (in == NULL || *in == '\0' || strcmp(in, "$" "Mdocdate$") == 0)
return time2a(time(NULL));
-   }
 
/* Valid mdoc(7) date format. */
 



Re: print regress results

2019-05-15 Thread Ingo Schwarze
Hi Alexander,

Alexander Bluhm wrote on Wed, May 15, 2019 at 12:01:00AM -0400:

> Regress prints FAILED in the middle of the make output, this is
> hard to watch.

I agree this is a nuisance.  I have often wondered whether the
result was "PASS" or "FAIL" after doing longer regression runs
in the past.

> tb@ asked me to print a PASSED at the end.

That would indeed be an improvement.

> As the make processes cannot hold state over several targets or
> directories, I create a regress log.  It is placed in the top
> directory where you invoke make regress.  All failures are logged
> there.  if there are none, PASSED is printed as final output.

That concept feels somewhat messy and fragile.  The /^FAIL/ regular
expression might misfire - at least test writers must be aware that
tests must not frint FAIL at the beginning of output lines, so maybe
you should document this?

In principle, it would be cleaner to use the exit status of recursive
make invocations.  Then again, there are multiple places where make
is invoked recursively, so i'm not sure that is feasible.  If it isn't,
or if a cleaner way would cause excessive complication, i'm not
totally opposed to grepping log files, even though i don't particularly
like it.


Apart from the general approach, i'm not convinced the details are
quite right just yet.  For starters, it feels even more messy that
the REGRESS_LOG variable invades bsd.subdir.mk - not sure whether
that can somehow be avoided.

Here are some examples of what i consider undesirable side effects:

   $ cd /usr/src/bin/
   $ make -n obj
  rm -f -- /usr/src/bin/regress.log
  [...]

I'm not convinced that "make obj" should attempt to change anything
in the source tree, apart from creating "obj" symlinks.  In particular,
having "make obj" delete anything doesn't feel quite right.

   $ make -n
  rm -f -- /usr/src/bin/regress.log
  [...]

While i see the point of deleting regress.log at the beginning
of a regression run, it doesn't feel quite right at the beginning
of a make run that has nothing to do with regression testing.

Also, it doesn't feel quite right that the regress.log file is
deleted from the *src* directory.  Shouldn't it be deleted from
the *obj* directory instead?

I mean, like this:

   $ cd /usr/src/regress/usr.bin/mandoc/mdoc/Ad/ 
   $ make -n regress 
  rm -f -- /usr/src/regress/usr.bin/mandoc/mdoc/Ad/obj/regress.log 
  [...]

Then, what about this:

   $ make -dl regress
  mandoc  -Ios=OpenBSD -Tascii [...]
  diff -au [...]
  [...]
  PASSED
   $ make -dl clean
  rm -f  noarg.mandoc_ascii font.mandoc_ascii
  rm -f  noarg.in_man noarg.mandoc_man font.in_man font.mandoc_man
  rm -f  noarg.mandoc_lint
  rm -f noarg.mandoc_markdown font.mandoc_markdown
   $ ls obj/
  regress.log
   $ make cleandir
   $ ls obj/
  regress.log

I would expect that both "make clean" and "make cleandir" remove
the file regress.log, but they don't appear to.

Evem if i remove "obj" and run "make regress" without running "make
obj" before it, the file regress.log survives "make clean" and "make
cleandir", this time not even below obj, but in the source tree
itself.  (Et ceterum censeo support for running without "make obj"
esse delendam to simplify stuff, but that's a different matter.)

   $ cd /usr/src/regress/usr.bin/mandoc/tbl/
   $ make -dl regress
  rm -f -- /usr/src/regress/usr.bin/mandoc/tbl/regress.log
  for entry in opt layout mod data macro; do [...]
  ===> opt
  diff -au [...]
  [...]
  PASSED
  ===> layout
  diff -au [...]
  [...]
  PASSED
  [...]
  ===> macro
  diff -au [...]
  [...]
  PASSED
   $ vi opt/box.out_ascii   # bad guy manipulating the output
   $ make -dl regress
  rm -f -- /usr/src/regress/usr.bin/mandoc/tbl/regress.log
  for entry in opt layout mod data macro; do [...]
  ===> opt
  diff -au [...]
  *** Error 1 in opt (../../Makefile.inc:82 'box.diff_ascii')
  FAILED
  *** Error 1 in target 'regress' (ignored)
  diff -au [...]
  [...]
  FAIL usr.bin/mandoc/tbl/opt box
  ===> layout
  diff -au [...]
  [...]
  FAIL usr.bin/mandoc/tbl/opt box
  [...]
  ===> macro
  diff -au [...]
  [...]
  FAIL usr.bin/mandoc/tbl/opt box

So in a recursive run, the ^FAIL lines are re-printed for each
subsequent directory.  Isn't the goal to print the summary once,
at the end?

Also, the file regress.log is created in the source tree in this
case, even though i ran "make obj" before.  Not sure how to prevent
that because usr.bin/mandoc/tbl/ neither has nor needs an obj/
symlink because in contains only subdirs, no tests of its own.
All the same, it doesn't feel quite right.

These are not strong objections, you are maintaining this subsystem,
but i have some doubts whether this particular patch is mature
enough for commit.

Yours,
  Ingo



Re: remove regress maxtime

2019-05-14 Thread Ingo Schwarze
Hi Alexander,

Alexander Bluhm wrote on Tue, May 14, 2019 at 02:16:19AM -0400:

> Does anyone use the REGRESS_MAXTIME feature?

Not me.

> I would like to remove it.
> 
> - The timeout based on CPU seconds is pretty useless.
>   Most hanging tests sleep and do not spin.
> - A timeout cannot be distinguished from failure.
> - I have never triggered it.
> - Only 3 tests set it.
> - It comes in my way for a new fail/pass reporting feature.

I like the deletion.

Getting rid of features that work poorly, are fragile, are rarely used
and get in the way of maintenance and development is good.
Simplicity is good, and having one robust way of doing things rather
than some rarely-tested options is good.

Please delete the "BUGS AND LIMITATIONS" section in bsd.regress.mk(5),
too, which had a wrong name, anyway.

> Also do _SKIP_FAIL cleanup.

Sure, why not.

Yours,
  Ingo


> Index: regress/lib/libpthread/Makefile.inc
> ===
> RCS file: /data/mirror/openbsd/cvs/src/regress/lib/libpthread/Makefile.inc,v
> retrieving revision 1.10
> diff -u -p -r1.10 Makefile.inc
> --- regress/lib/libpthread/Makefile.inc   19 Aug 2012 18:55:16 -  
> 1.10
> +++ regress/lib/libpthread/Makefile.inc   14 May 2019 05:56:28 -
> @@ -10,5 +10,3 @@ CFLAGS+=-Wall # -Werror
>  #DEBUG=  -ggdb
>  CFLAGS+= -DSRCDIR='"${.CURDIR}"'
>  CFLAGS+= -I${.CURDIR}/../include
> -
> -REGRESS_MAXTIME?=30
> Index: regress/lib/libpthread/pthread_specific/Makefile
> ===
> RCS file: 
> /data/mirror/openbsd/cvs/src/regress/lib/libpthread/pthread_specific/Makefile,v
> retrieving revision 1.2
> diff -u -p -r1.2 Makefile
> --- regress/lib/libpthread/pthread_specific/Makefile  22 Dec 2013 11:08:31 
> -  1.2
> +++ regress/lib/libpthread/pthread_specific/Makefile  14 May 2019 05:56:28 
> -
> @@ -2,6 +2,4 @@
> 
>  PROG=pthread_specific
> 
> -REGRESS_MAXTIME=60
> -
>  .include 
> Index: regress/lib/libpthread/stdarg/Makefile
> ===
> RCS file: 
> /data/mirror/openbsd/cvs/src/regress/lib/libpthread/stdarg/Makefile,v
> retrieving revision 1.5
> diff -u -p -r1.5 Makefile
> --- regress/lib/libpthread/stdarg/Makefile26 Dec 2013 16:22:55 -  
> 1.5
> +++ regress/lib/libpthread/stdarg/Makefile14 May 2019 05:56:28 -
> @@ -4,6 +4,4 @@ PROG= stdarg
> 
>  CFLAGS+= -I${.CURDIR}/../include
> 
> -REGRESS_MAXTIME=10
> -
>  .include 
> Index: share/man/man5/bsd.regress.mk.5
> ===
> RCS file: /data/mirror/openbsd/cvs/src/share/man/man5/bsd.regress.mk.5,v
> retrieving revision 1.18
> diff -u -p -r1.18 bsd.regress.mk.5
> --- share/man/man5/bsd.regress.mk.5   2 Dec 2018 11:46:31 -   1.18
> +++ share/man/man5/bsd.regress.mk.5   14 May 2019 05:33:37 -
> @@ -80,9 +80,6 @@ Points to the fully-qualified path of a
>  results are appended.
>  Defaults to
>  .Pa /dev/null .
> -.It Ev REGRESS_MAXTIME
> -Maximum limit of CPU seconds to spend on the regression test.
> -Exceeding this time will result in a failure being logged.
>  .It Ev REGRESS_ROOT_TARGETS
>  Targets for which root access is required to run the test.
>  The
> Index: share/mk/bsd.regress.mk
> ===
> RCS file: /data/mirror/openbsd/cvs/src/share/mk/bsd.regress.mk,v
> retrieving revision 1.17
> diff -u -p -r1.17 bsd.regress.mk
> --- share/mk/bsd.regress.mk   3 Dec 2018 22:30:04 -   1.17
> +++ share/mk/bsd.regress.mk   14 May 2019 05:43:43 -
> @@ -39,14 +39,14 @@ REGRESS_SKIP_TARGETS=run-regress-${PROG}
>  .  endif
>  .endif
> 
> -.if defined(REGRESS_SLOW_TARGETS) && ${REGRESS_SKIP_SLOW} != no
> +.if defined(REGRESS_SLOW_TARGETS) && ${REGRESS_SKIP_SLOW:L} != no
>  REGRESS_SKIP_TARGETS+=${REGRESS_SLOW_TARGETS}
>  .endif
> 
> -.if ${REGRESS_FAIL_EARLY} != no
> -_SKIP_FAIL=
> +.if ${REGRESS_FAIL_EARLY:L} != no
> +_REGRESS_IGNORE_FAIL=
>  .else
> -_SKIP_FAIL=-
> +_REGRESS_IGNORE_FAIL=-
>  .endif
> 
>  .if defined(REGRESS_ROOT_TARGETS)
> @@ -102,26 +102,13 @@ regress: .SILENT
>   @echo -n "SKIP " ${_REGRESS_OUT}
>   @echo SKIPPED
>  .  else
> -# XXX - we need a better method to see if a test fails due to timeout or just
> -#   normal failure.
> -.   if !defined(REGRESS_MAXTIME)
> - ${_SKIP_FAIL}if cd ${.CURDIR} && ${MAKE} ${RT}; then \
> + ${_REGRESS_IGNORE_FAIL} if ${MAKE} -C ${.CURDIR} ${RT}; then \
>   echo -n "SUCCESS " ${_REGRESS_OUT} ; \
>   else \
>   echo -n "FAIL " ${_REGRESS_OUT} ; \
>   echo FAILED ; \
>   false; \
>   fi
> -.   else
> - ${_SKIP_FAIL}if cd ${.CURDIR} && \
> - (ulimit -t ${REGRESS_MAXTIME} ; ${MAKE} ${RT}); then \
> - echo -n "SUCCESS " ${_REGRESS_OUT} ; \
> - else \
> - echo -n "FAIL (possible timeout) " 

Re: Don't use '.Sx' to refer to external sections

2019-05-14 Thread Ingo Schwarze
Hi Stephen,

Stephen Gregoratto wrote on Tue, May 14, 2019 at 01:08:00PM +1000:

> It was mentioned previously that the use of .Sx to refer to a section in
> a different manual page is incorrect. I grepped through the src tree to
> see if there were any cases of this and I found a couple.

Thanks, i committed the relevant parts of the patch.

Try not to mix unrelated, needless changes into patches,
though; i did not commit the parts cited below.

Yours,
  Ingo


> diff --git a/share/man/man4/ip6.4 b/share/man/man4/ip6.4
> index 8acb7a762..a40b78f3a 100644
> --- a/share/man/man4/ip6.4
> +++ b/share/man/man4/ip6.4
> @@ -94,7 +94,8 @@ struct ip6_hdr {
>  All fields are in network-byte order.
>  Any options specified (see
>  .Sx Options
> -below) must also be specified in network-byte order.
> +below)
> +must also be specified in network-byte order.
>  .Pp
>  .Va ip6_flow
>  specifies the flow ID.

That change provides no benefit.

> diff --git a/share/man/man9/syscall.9 b/share/man/man9/syscall.9
> index 6a20b052e..1eb78609f 100644
> --- a/share/man/man9/syscall.9
> +++ b/share/man/man9/syscall.9
> @@ -175,9 +175,9 @@ else
>  .Pp
>  The
>  .Dq SYSCALL_DEBUG
> -parts of the code are explained in the section
> +parts of the code are explained in the
>  .Sx Debugging
> -later in the document.
> +section below.
>  For the
>  .Dq KTRACE
>  portions of the code refer to the

I committed that one separately.

> diff --git a/usr.sbin/vmctl/vmctl.8 b/usr.sbin/vmctl/vmctl.8
> index 144d774c8..0a7195a3c 100644
> --- a/usr.sbin/vmctl/vmctl.8
> +++ b/usr.sbin/vmctl/vmctl.8
> @@ -208,8 +208,8 @@ configure a gateway address on the VM host side,
>  and run a simple DHCP/BOOTP server for the VM.
>  See
>  .Sx LOCAL INTERFACES
> -below for more information on how addresses are calculated and assigned when
> -using the
> +below for more information on how addresses are calculated and assigned
> +when using the
>  .Fl L
>  option.
>  .It Fl m Ar size

That change provides no benefit.



clean up LC_COLLATE and LC_CTYPE in sort(1)

2019-05-14 Thread Ingo Schwarze
Hi,

after my LC_NUMERIC cleanup for sort(1) went in (thanks to tb@ for
the review), i'd like to adress the rest of locale dependency.

Large amounts of extremely ugly code in sort(1) - many hundreds of
lines - deal with LC_COLLATE, which we don't support now and have
no intention to support in the future.

The code is very repetitive and currently written to handle three cases:

 1. byte_sort == true && sort_mb_cur_max == 1

That is the only mode currently supported on OpenBSD.
It means everything uses the POSIX locale and ASCII.

 2. byte_sort == false && sort_mb_cur_max == 1

That will never be supported on OpenBSD.
It handles 8-bit single byte character encodings which are
incompatible with UTF-8, for example ISO-LATIN-1.

 3. byte_sort == false && sort_mb_cur_max > 1

Even though i doubt we will ever do it, that could theoretically
happen on OpenBSD in the remote future, if we ever choose to
implement collation support for UTF-8 locales.

Handling case 3 would be a massive undertaking - not just a matter
of improving Unicode support, but also forcing us to maintain many
different UTF-8 locales for many different languages, which means
extremely messy stuff invading the C library.  During the Belgrade
EuroBSDCon a few years ago, i talked to Baptiste Daroussin who had
just implemented LC_COLLATE in FreeBSD libc and who was utterly
scared by the complexity.  Knowing ourselves, we would be scared
even more once we got there.  So it will definitely not happen
quickly.  Then again, ruling that out for good is maybe not a
decision to make in this particular patch.

Consequently, the byte_sort variable can be deleted immediately,
killing case 2 for good, but i'm keeping the sort_mb_cur_max variable
as a global constant for now, even though more than half of the
code it controls is currently dead code.

Since none of our single-byte character and string functions are
locale dependent, we can also zap LC_CTYPE while here.

After committing this patch, i shall re-indent bwscoll() properly
in a separate commit, but i'm not including that in the patch sent
out here because it would make the patch unreadable.

OK?
  Ingo


Index: bwstring.c
===
RCS file: /cvs/src/usr.bin/sort/bwstring.c,v
retrieving revision 1.7
diff -u -p -r1.7 bwstring.c
--- bwstring.c  1 Apr 2015 22:38:08 -   1.7
+++ bwstring.c  14 May 2019 15:19:54 -
@@ -40,9 +40,6 @@
 #include "bwstring.h"
 #include "sort.h"
 
-bool byte_sort;
-size_t sort_mb_cur_max = 1;
-
 static wchar_t **wmonths;
 static char **cmonths;
 
@@ -686,22 +683,20 @@ bwscoll(const struct bwstring *bws1, con
 
if (len1 <= offset)
return (len2 <= offset) ? 0 : -1;
-   else {
+
if (len2 <= offset)
return 1;
-   else {
+
len1 -= offset;
len2 -= offset;
 
if (sort_mb_cur_max == 1) {
const unsigned char *s1, *s2;
+   int res;
 
s1 = bws1->data.cstr + offset;
s2 = bws2->data.cstr + offset;
 
-   if (byte_sort) {
-   int res = 0;
-
if (len1 > len2) {
res = memcmp(s1, s2, len2);
if (!res)
@@ -714,66 +709,6 @@ bwscoll(const struct bwstring *bws1, con
res = memcmp(s1, s2, len1);
 
return res;
-
-   } else {
-   int res = 0;
-   size_t i, maxlen;
-
-   i = 0;
-   maxlen = len1;
-
-   if (maxlen > len2)
-   maxlen = len2;
-
-   while (i < maxlen) {
-   /* goto next non-zero part: */
-   while ((i < maxlen) &&
-   !s1[i] && !s2[i])
-   ++i;
-
-   if (i >= maxlen)
-   break;
-
-   if (s1[i] == 0) {
-   if (s2[i] == 0)
-   /* NOTREACHED */
-   err(2, "bwscoll 
error 01");
-   else
-   

Re: update to PF pfctl(8) and pf.conf(5) manpages

2019-04-30 Thread Ingo Schwarze
Hi Alexandr,

here are a few additional minor remarks...

Alexandr Nedvedicky wrote on Mon, Apr 29, 2019 at 08:59:55PM +0200:

> diff --git a/sbin/pfctl/pfctl.8 b/sbin/pfctl/pfctl.8
> index b7e941991ba..5e2c57f6bc2 100644
> --- a/sbin/pfctl/pfctl.8
> +++ b/sbin/pfctl/pfctl.8
> @@ -198,7 +198,11 @@ Flush the tables.
>  .It Fl F Cm osfp
>  Flush the passive operating system fingerprints.
>  .It Fl F Cm Reset
> -Reset limits, timeouts and options back to default settings.
> +Reset limits, timeouts and other options back to default settings.
> +See
> +.Xr pf.conf 5
> +.Sx 'OPTIONS'
> +for details.

I know that jmc@ suggested this, but unfortunately, it's incorrect.
You cannot use .Sx to refer to a section in a *different* manual page.
What the above code creates is a link to a section in the pfctl(8)
manual page itself that would have to start with

  .Sh 'OPTIONS'

and that doesn't exist, so it results in a dead link.

Besides, kn@ already mentioned that combining .Sx with quoting is
not a good idea.

I would simply write:

  See the OPTIONS section in
  .Xr pf.conf 5
  for details.

> diff --git a/share/man/man5/pf.conf.5 b/share/man/man5/pf.conf.5
> index 247ceef40a5..b043a6ed725 100644
> --- a/share/man/man5/pf.conf.5
> +++ b/share/man/man5/pf.conf.5
[...]
> @@ -1235,6 +1245,25 @@ Various limits can be combined on a single line:
>  .Bd -literal -offset indent
>  set limit { states 2, frags 2000, src-nodes 2000 }
>  .Ed
> +.Pp
> +.Xr pf 4
> +has the following defaults
> +.Bl -column table-entries PFR_KENTRY_HIWAT_SMALL platform_dependent
> +.It states Ta Dv PFSTATE_HIWAT Ta Pq 10
> +.It tables Ta Dv PFR_KTABLE_HIWAT Ta Pq 1000
> +.It table-entries Ta Dv PFR_KENTRY_HIWAT Ta Pq 20
> +.It table-entries Ta Dv PFR_KENTRY_HIWAT_SMALL Ta Pq 10
> +.It frags Ta Dv NMBCLUSTERS/32 Ta Pq platform dependent
> +.El
> +.Pp
> +.Cm NMBCLUSTERS
> +defines the total number of packets which can exist in-system at any one 
> time.
> +Refer to
> +.Sy sys/arch/`uname -p`/param.h

I'm still suggesting

  .In machine/param.h

It's easier for users to refer to installed header files than to
search the kernel source tree.

[...]
> @@ -1290,6 +1329,10 @@ instead of being dropped;
>  the reassembled packet will have the
>  .Dq dont-fragment
>  bit cleared.
> +The default value is yes.

  The default value is
  .Cm yes .

[...]
> @@ -1388,15 +1435,20 @@ The thresholds for entering and leaving syncookie 
> mode can be specified using:
>  set syncookies adaptive (start 25%, end 12%)
>  .Ed
>  .El
> +.Pp
> +.Xr pfctl 8
> +.Fl F Cm Reset
> +restores customized setting back to default.
>  .It Ic set Cm timeout Ar variable value
>  .Bl -tag -width "src.track" -compact
>  .It Cm frag
> -Seconds before an unassembled fragment is expired.
> +Seconds before an unassembled fragment is expired (60 by default).
>  .It Cm interval
> -Interval between purging expired states and fragments.
> +Interval between purging expired states and fragments (10 by default).

Maybe "(10 seconds by default)"?
Just in this one case, the others seem fine.

[...]
> @@ -1408,13 +1460,13 @@ Tuning these values may improve the performance of the
>  firewall at the risk of dropping valid idle connections.
>  .Pp
>  .Bl -tag -width Ds -compact
> -.It Cm tcp.closed
> +.It Cm tcp.closed No (90 seconds by default)

  .It Cm tcp.closed Pq 90 seconds by default

would seem simpler and more natural to me and is shorter.
Not a big deal though, your usage of .No isn't incorrect.

Yours,
  Ingo



Re: [PATCH] [www] innovations.html - add unwind(8) to the list

2019-04-30 Thread Ingo Schwarze
Hi Raf,

Raf Czlonka wrote on Tue, Apr 30, 2019 at 12:18:17PM +0100:

> unwind(8) looks like a good candidate for inclusion into
> innovations.html[0].
> 
> While there, add "released with ..." to the preceding entry.
> 
> [0] https://www.openbsd.org/innovations.html

Thanks, committed.
  Ingo


> Index: innovations.html
> ===
> RCS file: /cvs/www/innovations.html,v
> retrieving revision 1.72
> diff -u -p -r1.72 innovations.html
> --- innovations.html  11 Apr 2019 01:14:07 -  1.72
> +++ innovations.html  30 Apr 2019 11:16:32 -
> @@ -804,7 +804,12 @@
>  
>  https://man.openbsd.org/rad.8;>rad(8):
>  Written and maintained by Florian Obser.
> -Imported July 10, 2018.
> +Imported July 10, 2018; released with OpenBSD 6.4.
> +
> +
> +https://man.openbsd.org/unwind.8;>unwind(8):
> +Written and maintained by Florian Obser.
> +Imported January 23, 2019; released with OpenBSD 6.5.
>  
>  
>  



Re: update to PF pfctl(8) and pf.conf(5) manpages

2019-04-30 Thread Ingo Schwarze
Hi,

Jason McIntyre wrote on Mon, Apr 29, 2019 at 09:53:03PM +0100:

> ah, so singular is correct. but it needs an article of some sort. how
> about:
> 
>   .Xr pfctl 8
>   .Fl F Cm Reset
>   restores this value to its default.
> 
> to be honest, i don;t like it when we Xr like this, or mark up so much.
> i would go for
> 
>   .Dq pfctl -F Reset
>   restores this value to its default.

I'm OK with both forms.

Note that there is value in having "Cm Reset" in at least one place
in pf.conf.5, such that the following commands bring up pf.conf(5):

   $ man -k Cm=Reset
   $ man -k any~Reset

> or .Li instead of .Dq maybe

Please don't.  We just deprecated .Li (to simplify the language,
such that authors need to learn fewer macros) and mdoc(7) now says:

   Li word ...
 Request a typewriter (literal) font.  Deprecated because on terminal
 output devices, this is usually indistinguishable from normal text.
 For literal displays, use Ql (in-line), Dl (single line), or Bd
 -literal (multi-line) instead.

Typewriter font isn't critical in such a case, so either .Ql or .Dq
is fine (or the full markup with Xr Fl Cm).

 +.Xr pfctl 8
 +.Fl F Cm Reset
 +restores customized settings back to default.
 +.sp

>>> what's the .sp for? if you want vertical blank space, use .Pp

>> I think it was suggested by 'mandoc -T lint',

I suspect you have seen the message
  "blank line in fill mode, using .sp"
with an earlier version of your patch.  That message does *not*
intend to ask you to insert an .sp request (you should never use
.sp in a manual page, it is low level roff) - it merely explains
how a blank line (which shouldn't be there) is treated by mandoc.
I just improved the description of the message in the mandoc(1)
manual page to avoid the misunderstanding you succumbed to:

  Warnings related to plain text
blank line in fill mode, using .sp
  (mdoc) The meaning of blank input lines is only well-defined
  in non-fill mode: In fill mode, line breaks of text input
  lines are not supposed to be significant.  However, for
  compatibility with groff, blank lines in fill mode are formatted
  like sp requests.  To request a paragraph break, use Pp instead
  of a blank line.

> ok, .sp, which i always forget about, is basically roff for .Pp. just
> use .Pp if you need the vertical spacing.

Correct.

Right before .It in a non-compat list, it's redundant, though.

  (tcp.first 60, tcp.established 43200).
 +.Pp
 +.Xr pfctl 8
 +.Fl F Cm Reset
 +restores customized any timeout setting back to default.

>>> this reads weird!

>> I've fixed that to:
>>  restores any customized timeout setting back to default.

> maybe
>   restores this setting to its default value.

The sentence applies to the whole, long "set timeout variable value"
list item, so plural (settings) is needed.  Maybe:

   restores all
   .Cm timeout
   settings to their respective defaults.

Yours,
  Ingo



less(1) UTF-8 cleanup: pshift()

2019-05-07 Thread Ingo Schwarze
Hi,

thanks for checking the pappend() and filename.c patches,
those are now committed.

Here is basic cleanup of the last major function in line.c, pshift().
Several minor issues still remain in the file, but those are for later.

This gets rid of two LWCHAR variables, one call to utf_len(),
get_wchar(), is_composing_char(), is_combining_char(), and
control_char() each and two calls to is_wide_char().

Note the change from "continue" to "break" in the "width == 2"
block.  It is an improvement because the double-width character did
not move off-screen completely, its second half is still there,
represented as a single-width space.  So whatever follows it does
absolutely not move off-screen, not even if what follows is zero
width, so it must not be inspected by the loop.

Since the variable names are far from intuitive, improve and add
comments.  Also, the comment above pshift() is simply wrong.
The function does not discard a specific number of characters.


Tested by running

  LC_CTYPE=en_US.UTF-8 less -# 1
  LC_CTYPE=en_US.UTF-8 less -# 1 -R
  LC_CTYPE=C less -# 1

on my set of test files containing a wide variety of combinations
of valid and invalid UTF-8 and control sequences, then scrolling
horizontally with the arrow keys.

OK?
  Ingo


Index: line.c
===
RCS file: /cvs/src/usr.bin/less/line.c,v
retrieving revision 1.29
diff -u -p -r1.29 line.c
--- line.c  7 May 2019 14:16:16 -   1.29
+++ line.c  7 May 2019 20:52:26 -
@@ -32,11 +32,11 @@ int ntabstops = 1;  /* Number of tabstop
 int tabdefault = 8;/* Default repeated tabstops */
 off_t highest_hilite;  /* Pos of last hilite in file found so far */
 
-static int curr;   /* Index into linebuf */
-static int column; /* Printable length, accounting for backspaces, etc. */
+static int curr;   /* Total number of bytes in linebuf */
+static int column; /* Display columns needed to show linebuf */
 static int overstrike; /* Next char should overstrike previous char */
 static int is_null_line;   /* There is no current line */
-static int lmargin;/* Left margin */
+static int lmargin;/* Index in linebuf of start of content */
 static char pendc;
 static off_t pendpos;
 static char *end_ansi_chars;
@@ -202,20 +202,21 @@ plinenum(off_t pos)
 
 /*
  * Shift the input line left.
- * This means discarding N printable chars at the start of the buffer.
+ * Starting at lmargin, some bytes are discarded from the linebuf,
+ * until the number of display columns needed to show these bytes
+ * would exceed the argument.
  */
 static void
 pshift(int shift)
 {
-   LWCHAR prev_ch = 0;
-   unsigned char c;
-   int shifted = 0;
-   int to;
-   int from;
-   int len;
-   int width;
-   int prev_attr;
-   int next_attr;
+   int shifted = 0;  /* Number of display columns already discarded. */
+   int from; /* Index in linebuf of the current character. */
+   int to;   /* Index in linebuf to move this character to. */
+   int len;  /* Number of bytes in this character. */
+   int width = 0;/* Display columns needed for this character. */
+   int prev_attr;/* Attributes of the preceding character. */
+   int next_attr;/* Attributes of the following character. */
+   unsigned char c;  /* First byte of current character. */
 
if (shift > column - lmargin)
shift = column - lmargin;
@@ -241,44 +242,41 @@ pshift(int shift)
}
continue;
}
-
-   width = 0;
-
if (!IS_ASCII_OCTET(c) && utf_mode) {
-   /* Assumes well-formedness validation already done.  */
-   LWCHAR ch;
-
-   len = utf_len(c);
-   if (from + len > curr)
-   break;
-   ch = get_wchar(linebuf + from);
-   if (!is_composing_char(ch) &&
-   !is_combining_char(prev_ch, ch))
-   width = is_wide_char(ch) ? 2 : 1;
-   prev_ch = ch;
+   wchar_t ch;
+   /*
+* Before this point, UTF-8 validity was already
+* checked, but for additional safety, treat
+* invalid bytes as single-width characters
+* if they ever make it here.  Similarly, treat
+* non-printable characters as width 1.
+*/
+   len = mbtowc(, linebuf + from, curr - from);
+   if (len == -1)
+   len = width = 1;
+   else if ((width = wcwidth(ch)) == -1)
+   width 

clean up LC_NUMERIC in sort(1)

2019-05-06 Thread Ingo Schwarze
Hi,

it was noticed years ago that our implementation of sort(1) is very
messy.  It contains huge amounts of dead code -- many hundreds of
lines -- and many wrapper functions around standard C library
functions that are in very bad taste and mostly pointless.  However,
nobody ever came round to cleaning it up.

As a first simple step, this patch deletes LC_NUMERIC support.
LC_NUMERIC is an absolutely terrible idea that we should never
support anywhere, neither now nor in the future.

The benefit is getting rid of

 * Four global variables (extern, not even file static).
 * Two static variables in sort.c.
 * One pointless wrapper function, conv_mbtowc().
 * Substantial parts of the code in set_locale().
 * One obscure environment variable.
 * Some so-called "debug" output that is always the same.

In order to not do too much in a single diff, this diff leaves
LC_NUMERIC and LC_CTYPE alone even though both are NOOPs on OpenBSD.
That's work for another day.

However, let's already drop LC_MESSAGES and LC_TIME from the
manual page together with LC_NUMERIC: they are of course NOOPs
and don't even seem to have dead tentacles in the code.

The test suite is still fine.

OK?
  Ingo


Index: coll.c
===
RCS file: /cvs/src/usr.bin/sort/coll.c,v
retrieving revision 1.11
diff -u -p -r1.11 coll.c
--- coll.c  11 Dec 2015 21:41:51 -  1.11
+++ coll.c  6 May 2019 11:59:44 -
@@ -46,12 +46,6 @@
 struct key_specs *keys;
 size_t keys_num = 0;
 
-wint_t symbol_decimal_point = L'.';
-/* there is no default thousands separator in collate rules: */
-wint_t symbol_thousands_sep = 0;
-wint_t symbol_negative_sign = L'-';
-wint_t symbol_positive_sign = L'+';
-
 static int wstrcoll(struct key_value *kv1, struct key_value *kv2, size_t 
offset);
 static int gnumcoll(struct key_value*, struct key_value *, size_t offset);
 static int monthcoll(struct key_value*, struct key_value *, size_t offset);
@@ -701,7 +695,7 @@ read_number(struct bwstring *s0, int *si
while (iswblank(bws_get_iter_value(s)))
s = bws_iterator_inc(s, 1);
 
-   if (bws_get_iter_value(s) == (wchar_t)symbol_negative_sign) {
+   if (bws_get_iter_value(s) == L'-') {
*sign = -1;
s = bws_iterator_inc(s, 1);
}
@@ -716,16 +710,13 @@ read_number(struct bwstring *s0, int *si
smain[*main_len] = bws_get_iter_value(s);
s = bws_iterator_inc(s, 1);
*main_len += 1;
-   } else if (symbol_thousands_sep &&
-   (bws_get_iter_value(s) == (wchar_t)symbol_thousands_sep))
-   s = bws_iterator_inc(s, 1);
-   else
+   } else
break;
}
 
smain[*main_len] = 0;
 
-   if (bws_get_iter_value(s) == (wchar_t)symbol_decimal_point) {
+   if (bws_get_iter_value(s) == L'.') {
s = bws_iterator_inc(s, 1);
while (iswdigit(bws_get_iter_value(s)) &&
*frac_len < MAX_NUM_SIZE) {
Index: coll.h
===
RCS file: /cvs/src/usr.bin/sort/coll.h,v
retrieving revision 1.1
diff -u -p -r1.1 coll.h
--- coll.h  17 Mar 2015 17:45:13 -  1.1
+++ coll.h  6 May 2019 11:59:44 -
@@ -123,14 +123,6 @@ typedef int (*listcoll_t)(struct sort_li
 extern struct key_specs *keys;
 extern size_t keys_num;
 
-/*
- * Main localised symbols. These must be wint_t as they may hold WEOF.
- */
-extern wint_t symbol_decimal_point;
-extern wint_t symbol_thousands_sep;
-extern wint_t symbol_negative_sign;
-extern wint_t symbol_positive_sign;
-
 /* funcs */
 
 cmpcoll_t get_sort_func(struct sort_mods *sm);
Index: sort.1
===
RCS file: /cvs/src/usr.bin/sort/sort.1,v
retrieving revision 1.58
diff -u -p -r1.58 sort.1
--- sort.1  24 Feb 2019 09:57:43 -  1.58
+++ sort.1  6 May 2019 11:59:44 -
@@ -177,10 +177,6 @@ Unknown strings are considered smaller t
 .It Fl n , Fl Fl numeric-sort , Fl Fl sort Ns = Ns Cm numeric
 An initial numeric string, consisting of optional blank space, optional
 minus sign, and zero or more digits (including decimal point)
-.\" with
-.\" optional radix character and thousands
-.\" separator
-.\" (as defined in the current locale),
 is sorted by arithmetic value.
 Leading blank characters are ignored.
 .It Fl R , Fl Fl random-sort , Fl Fl sort Ns = Ns Cm random
@@ -498,18 +494,6 @@ which has no
 equivalent.
 .Sh ENVIRONMENT
 .Bl -tag -width Fl
-.It Ev GNUSORT_NUMERIC_COMPATIBILITY
-If defined
-.Fl t
-will not override the locale numeric symbols, that is, thousand
-separators and decimal separators.
-By default, if we specify
-.Fl t
-with the same symbol as the thousand separator or decimal point,
-the symbol will be treated as the field separator.
-Older behavior was less definite: 

Re: perldoc: fix man output & formatting

2019-05-02 Thread Ingo Schwarze
Hi Andrew,

Andrew Daugherity wrote on Thu, May 02, 2019 at 10:53:37AM -0500:

> Also, their ToMan patch has a previously-included hunk
> for MANWIDTH=tty,

All that does is suppress a warning message "non-numeric MANWIDTH"
when a user has MANWIDTH=tty in their environment.  No idea why any
user would have that.  I see no reason to merge that part of the
patch to OpenBSD.

> but it doesn't seem to do anything for me?)

No, it doesn't do anything on OpenBSD because _get_columns() never
gets called when mandoc(1) is being used.

Then again, there is no real reason why _get_columns() isn't used
with mandoc: mandoc does support setting the output width.
Not via "-rLL=" though, that is a groffism.

If people want perldoc(1) to respect the MANWIDTH environment
variable, we could commit the following patch (and feed it upstream).

I don't really care either way.

Regarding the patch below, note that i'm deliberately not doing any
validation or truncation.  People who trust environment variables
enough to use them but not enough to use them without validation
seem silly to me.  On the one hand, the existing code in ToMan.pm
already does validation on MANWIDTH which does not need to be
repeated.  On the other hand, even if it wouldn't, why not leave
the validation to mandoc(1), which already validates the -O width=
argument?

In mandoc(1)/man(1) itself, i decided against supporting the MANWIDTH
variable.  In general, i hate environment variables.  They stealthily
change the behaviour of programs, making debugging harder and causing
needless headaches about security boundaries when leaking from one
process into its children.  In man(1) and mandoc(1), supporting
MANWIDTH would be pointless because you can already use the -O width=
option if you want to, which is clearer and less sneaky.

Yours,
  Ingo


Index: ToMan.pm
===
RCS file: /cvs/src/gnu/usr.bin/perl/cpan/Pod-Perldoc/lib/Pod/Perldoc/ToMan.pm,v
retrieving revision 1.8
diff -u -p -r1.8 ToMan.pm
--- ToMan.pm13 Feb 2019 21:15:14 -  1.8
+++ ToMan.pm3 May 2019 02:49:03 -
@@ -227,6 +227,10 @@ sub _collect_nroff_switches {
push @render_switches, '-rLL=' . (int $c) . 'n' if $cols > 80;
}
 
+   if( $self->_is_mandoc ) {
+   push @render_switches, '-Owidth=' . $self->_get_columns;
+   }
+
# I hear persistent reports that adding a -c switch to $render
# solves many people's problems.  But I also hear that some mans
# don't have a -c switch, so that unconditionally adding it here



Re: perldoc: fix man output & formatting

2019-05-02 Thread Ingo Schwarze
Hi Todd & Andrew,

Andrew Fresh wrote on Thu, May 02, 2019 at 09:53:29AM -0700:
> On Thu, May 02, 2019 at 10:21:15AM -0600, Todd C. Miller wrote:
>> On Thu, 02 May 2019 10:53:37 -0500, Andrew Daugherity wrote:

>>> I reported this to FreeBSD ports a couple months ago [2], and they
>>> provided a fix [3] which repairs the -oMan output, and makes that the
>>> default.  Their fix applies cleanly to the OpenBSD tree and works, but
>>> I have no idea why mandoc requires resetting $?.  (Also, their ToMan
>>> patch has a previously-included hunk for MANWIDTH=tty, but it doesn't
>>> seem to do anything for me?)

>> There problem is a missing waitpid() call, so $? is not actually
>> set.  Something like the following should be better, though I haven't
>> tested it yet.

> This seems to work fine for me, OK afresh1@

OK schwarze@, too.

> I submitted this upstream as well, so hopefully shouldn't have to keep
> the patch for too long.
> https://github.com/mrallen1/Pod-Perldoc/pull/39

Thanks.

> Still looking at defaulting to -oMan, but probably a good idea.

I agree in principle that -oMan ought to become the default.

Yours,
  Ingo


>> Index: gnu/usr.bin/perl/cpan/Pod-Perldoc/lib/Pod/Perldoc/ToMan.pm
>> ===
>> RCS file: 
>> /cvs/src/gnu/usr.bin/perl/cpan/Pod-Perldoc/lib/Pod/Perldoc/ToMan.pm,v
>> retrieving revision 1.8
>> diff -u -p -u -r1.8 ToMan.pm
>> --- gnu/usr.bin/perl/cpan/Pod-Perldoc/lib/Pod/Perldoc/ToMan.pm   13 Feb 
>> 2019 21:15:14 -  1.8
>> +++ gnu/usr.bin/perl/cpan/Pod-Perldoc/lib/Pod/Perldoc/ToMan.pm   2 May 
>> 2019 16:16:05 -
>> @@ -352,6 +352,9 @@ sub _filter_through_nroff {
>>  close $writer;
>>  $self->debug( "Done writing\n" );
>>  
>> +# wait for it to exit
>> +waitpid( $pid, 0 );
>> +
>>  # read any leftovers
>>  $done .= do { local $/; <$reader> };
>>  $self->debug( sprintf "Done reading. Output is %d bytes\n",
>> 


> I wish life had an UNDO function.

I didn't know you were *that* cruel.
All professional Chess and Go players are going to starve
because no game will ever get finished.



Re: kqueue.2: formatting fixes and minor HISTORY expansion

2019-05-02 Thread Ingo Schwarze
Hi Fabio,

Fabio Scotoni wrote on Thu, May 02, 2019 at 03:33:42PM +0200:

> I've taken a stab at improving kqueue.2 formatting.
> Most of the changes are markup fixes.

All your formatting decisions are good.

> I used ".Dv NULL" over plain "null" in accordance with
> lib/libc/stdlib/malloc.3 rev. 1.113.

You are right.
There isn't much wrong with using other manual pages as examples
of good formatting practices (the risk of picking bad ones is
relatively small in OpenBSD).

The authoritative sources are, in this case:

   $ LESS=+t man -O tag=Dv mdoc

   https://mandoc.bsd.lv/mdoc/appendix/markup.html
  (search for "null pointer")

> I also added a note to the HISTORY section that kqueue()/kevent() have
> been available in OpenBSD since 2.9;
> the wording matches growfs(8).

Yes, that is the standard wording.

> I'm not sure how to handle the undocumented EPERM that is returned if a
> pledge(2) does not include "proc" when an attempt is made to attach to a
> process with EVFILT_PROC.
> It does feel somewhat non-obvious, but is noted in neither pledge(2) nor
> kqueue(2).

Seems like a good question to me, but i'll not mix that into a mere
formatting cleanup commit.

Committed with minor tweaks, thanks.
  Ingo



Re: perldoc: fix man output & formatting

2019-05-05 Thread Ingo Schwarze
Hi Andrew,

Andrew Fresh wrote on Fri, May 03, 2019 at 06:24:16PM -0700:

> I committed this after moving the waitpid down a few lines, after the
> last read from the filehandle.

Oops.  Sorry for missing that, and thanks for committing it,
and for catching that additional issue.

> I will say that `perldoc -oterm unicook` looks a lot better with the
> right fonts and LC_CTYPE=en_US.UTF-8 than the output from -oman, but it
> looks a lot worse with LC_CTYPE=C so probably -oman is a better default.

Specifically,

  LC_CTYPE=en_US.UTF-8 LESS=-R perldoc -oterm  # produces UTF-8 (fine)
  LC_CTYPE=en_US.UTF-8 perldoc -oman   # produces UTF-8 (fine)
   # with the patch below

  LC_CTYPE=C LESS=-R perldoc -oterm unicook# produces UTF-8 (bad)
  LC_CTYPE=C perldoc -oman unicook # produces ASCII (good)

So -oterm is definitely a bad default, and fixing it may be a waste
of time.  I see no point in maintaining alternative output modes that
provide little to no benefit.

> Perhaps I can figure out how to get ToMan to do the right thing in the
> right locale someday.

The first chunk in the patch below seems to be all that is needed.

IIRC, we already do a similar thing in pod2man(1).

Of course, both chunks can be OK'ed / objected to independently,
they are completely independent of each other.  I'm merely leaving
both in place such that they don't get forgotten.

Yours,
  Ingo


Index: ToMan.pm
===
RCS file: /cvs/src/gnu/usr.bin/perl/cpan/Pod-Perldoc/lib/Pod/Perldoc/ToMan.pm,v
retrieving revision 1.9
diff -u -p -r1.9 ToMan.pm
--- ToMan.pm4 May 2019 01:14:34 -   1.9
+++ ToMan.pm5 May 2019 14:39:01 -
@@ -144,7 +144,9 @@ sub _get_podman_switches {
 #
 # See RT #77465
 #
-#push @switches, 'utf8' => 1;
+# Then again, do *not* comment it out on OpenBSD:
+# mandoc handles UTF-8 input just fine.
+push @switches, 'utf8' => 1;
 
$self->debug( "Pod::Man switches are [@switches]\n" );
 
@@ -225,6 +227,10 @@ sub _collect_nroff_switches {
my $c = $cols * 39 / 40;
$cols = $c > $cols - 2 ? $c : $cols -2;
push @render_switches, '-rLL=' . (int $c) . 'n' if $cols > 80;
+   }
+
+   if( $self->_is_mandoc ) {
+   push @render_switches, '-Owidth=' . $self->_get_columns;
}
 
# I hear persistent reports that adding a -c switch to $render



Re: perldoc: fix man output & formatting

2019-05-05 Thread Ingo Schwarze
Hi Andrew,

Andrew Fresh wrote on Sun, May 05, 2019 at 02:44:58PM -0700:
> On Sun, May 05, 2019 at 04:53:05PM +0200, Ingo Schwarze wrote:

>> The first chunk in the patch below seems to be all that is needed.
>> IIRC, we already do a similar thing in pod2man(1).

> This is true, we do similar in pod2man and so the same fix here seems
> correct.
> 
> When trying to push this upstream I will probably set it to 1 only if
> $self->_is_mandoc, and then someone else can figure out adding it if
> they're stuck on groff.

Right, let someone else figure out which ancient versions of groff
support UTF-8 in which ways...  :-)

Solaris 11 has 1.22 (2013), Solaris 9 has 1.21 (2009), and rumour
has it that MacOS X might still have 1.19 (2004/05)...

> (that is assuming that there's not some crappy mandoc that's popular
> on some strange system that perl wants to support,

I very much doubt that, unless some system uses mandoc that i
never heard about.  Look at https://mandoc.bsd.lv/ports.html :
The oldest mandoc still in use anywhere is 1.14.3, on Alpine and
on Ubuntu-oldstable Aardvark.  Oh wait, Debian-stable Stretch
still has 1.13.3, but even that should be just fine: according to
https://mandoc.bsd.lv/devhistory.html , direct UTF-8 input to mandoc
works since 1.13.2 (Dec. 2014).  Well, DragonFly still has 1.13.1,
but they are totally outdated anyway and while they do ship an
ancient mandoc, they don't actually use it for anything.

> but I will suggest it anyway.

Sounds like a safe bet.

> It does appear that the -Owidth patch does what Pod::Perldoc wants it
> to.  So, although I find it harder to read when it spans a wide
> terminal,

True, but using MANWIDTH and falling back to stty(1) is an upstream
user interface decision which we probably shouldn't interfere with.  
It has nothing to do with the rendering backend in use.

If people dislike the upstream default, they can use a narrower
window or "export MANWIDTH=80" or "stty columns 80".

> both of these are:
> OK afresh1@

Both committed, thanks for checking.
  Ingo


> > Index: ToMan.pm
> > ===
> > RCS file: 
> > /cvs/src/gnu/usr.bin/perl/cpan/Pod-Perldoc/lib/Pod/Perldoc/ToMan.pm,v
> > retrieving revision 1.9
> > diff -u -p -r1.9 ToMan.pm
> > --- ToMan.pm4 May 2019 01:14:34 -   1.9
> > +++ ToMan.pm5 May 2019 14:39:01 -
> > @@ -144,7 +144,9 @@ sub _get_podman_switches {
> >  #
> >  # See RT #77465
> >  #
> > -#push @switches, 'utf8' => 1;
> > +# Then again, do *not* comment it out on OpenBSD:
> > +# mandoc handles UTF-8 input just fine.
> > +push @switches, 'utf8' => 1;
> >  
> > $self->debug( "Pod::Man switches are [@switches]\n" );
> >  
> > @@ -225,6 +227,10 @@ sub _collect_nroff_switches {
> > my $c = $cols * 39 / 40;
> > $cols = $c > $cols - 2 ? $c : $cols -2;
> > push @render_switches, '-rLL=' . (int $c) . 'n' if $cols > 80;
> > +   }
> > +
> > +   if( $self->_is_mandoc ) {
> > +   push @render_switches, '-Owidth=' . $self->_get_columns;
> > }
> >  
> > # I hear persistent reports that adding a -c switch to $render



less(1) UTF-8 cleanup: cvt.c

2019-05-08 Thread Ingo Schwarze
Hi Todd,

Todd C. Miller wrote on Wed, May 08, 2019 at 06:36:30AM -0600:
> On Tue, 07 May 2019 23:32:20 +0200, Ingo Schwarze wrote:
 
>> Here is basic cleanup of the last major function in line.c, pshift().

> Looks good.  OK millert@

Thanks for checking, i will commit the line.c patch later today.

Here is a patch cleaning up all major issues in the file cvt.c.
They are all in the function cvt_text(), which is used for
setting up patterns and for searching.  One minor issue remains,
usage of the not-so-nice function is_ansi_middle(), but this doesn't
seem the right time yet to worry about that detail.

This patch deletes the abominable function put_wchar() which accepts
many invalid codepoints and happily produces invalid results,
replacing its only call with the standard function wctomb(3).  Note
that uncharacteristically, no error handling is needed after the
call to wctomb(3) because the wchar_t passed into it just came out
of either mbtowc(3) with a positive return value or out of towlower(3),
so we already know it is valid and not L'\0'.

The patch also gets rid of one automatic LWCHAR variable and one
call to the bad function step_char().

Note that memory allocation before calls to cvt_text() - in file
pattern.c, function compile_pattern() and in file search.c, function
search_range() - looks suspicious.  In theory, the UTF-8 representation
of an uppercase character might contain fewer bytes than the UTF-8
representation of the lowercase character towlower() converts it
to, in which case cvt_text() would write beyond the end of the dst
buffer.  I don't know whether uppercase characters with this property
actually exist in Unicode or might appear in the future - but no
doubt the Unicode character table is dark and full of terrors.  Then
again, that's a different (potential) issue which i won't try to
fix with the present patch.

I tested that searching for strings containing non-ASCII UTF-8
characters still works with -I, with -i, and without either, in all
cases with the correct case-related behaviour.

OK?
  Ingo


Index: charset.c
===
RCS file: /cvs/src/usr.bin/less/charset.c,v
retrieving revision 1.22
diff -u -p -r1.22 charset.c
--- charset.c   7 May 2019 14:26:38 -   1.22
+++ charset.c   8 May 2019 14:03:01 -
@@ -329,48 +329,6 @@ get_wchar(const char *p)
 }
 
 /*
- * Store a character into a UTF-8 string.
- */
-void
-put_wchar(char **pp, LWCHAR ch)
-{
-   if (!utf_mode || ch < 0x80) {
-   /* 0xxx */
-   *(*pp)++ = (char)ch;
-   } else if (ch < 0x800) {
-   /* 110x 10xx */
-   *(*pp)++ = (char)(0xC0 | ((ch >> 6) & 0x1F));
-   *(*pp)++ = (char)(0x80 | (ch & 0x3F));
-   } else if (ch < 0x1) {
-   /* 1110 10xx 10xx */
-   *(*pp)++ = (char)(0xE0 | ((ch >> 12) & 0x0F));
-   *(*pp)++ = (char)(0x80 | ((ch >> 6) & 0x3F));
-   *(*pp)++ = (char)(0x80 | (ch & 0x3F));
-   } else if (ch < 0x20) {
-   /* 0xxx 10xx 10xx 10xx */
-   *(*pp)++ = (char)(0xF0 | ((ch >> 18) & 0x07));
-   *(*pp)++ = (char)(0x80 | ((ch >> 12) & 0x3F));
-   *(*pp)++ = (char)(0x80 | ((ch >> 6) & 0x3F));
-   *(*pp)++ = (char)(0x80 | (ch & 0x3F));
-   } else if (ch < 0x400) {
-   /* 10xx 10xx 10xx 10xx 10xx */
-   *(*pp)++ = (char)(0xF0 | ((ch >> 24) & 0x03));
-   *(*pp)++ = (char)(0x80 | ((ch >> 18) & 0x3F));
-   *(*pp)++ = (char)(0x80 | ((ch >> 12) & 0x3F));
-   *(*pp)++ = (char)(0x80 | ((ch >> 6) & 0x3F));
-   *(*pp)++ = (char)(0x80 | (ch & 0x3F));
-   } else {
-   /* 110x 10xx 10xx 10xx 10xx 10xx */
-   *(*pp)++ = (char)(0xF0 | ((ch >> 30) & 0x01));
-   *(*pp)++ = (char)(0x80 | ((ch >> 24) & 0x3F));
-   *(*pp)++ = (char)(0x80 | ((ch >> 18) & 0x3F));
-   *(*pp)++ = (char)(0x80 | ((ch >> 12) & 0x3F));
-   *(*pp)++ = (char)(0x80 | ((ch >> 6) & 0x3F));
-   *(*pp)++ = (char)(0x80 | (ch & 0x3F));
-   }
-}
-
-/*
  * Step forward or backward one character in a string.
  */
 LWCHAR
Index: cvt.c
===
RCS file: /cvs/src/usr.bin/less/cvt.c,v
retrieving revision 1.9
diff -u -p -r1.9 cvt.c
--- cvt.c   26 Feb 2019 11:01:54 -  1.9
+++ cvt.c   8 May 2019 14:03:01 -
@@ -60,7 +60,8 @@ cvt_text(char *odst, char *osrc, int *ch
char *edst = odst;
char *src;
char *src_end;
-   LWCHAR ch;
+   wchar_t ch;
+   int len;
 
if (

Re: rtable_walk(9)

2019-07-11 Thread Ingo Schwarze
Hi Martin,

Martin Pieuchot wrote on Thu, Jul 11, 2019 at 05:18:41PM -0300:

> Index: rtable_walk.9
> ===
> RCS file: rtable_walk.9
> diff -N rtable_walk.9
> --- /dev/null 1 Jan 1970 00:00:00 -
> +++ rtable_walk.9 11 Jul 2019 20:16:23 -
> @@ -0,0 +1,68 @@
> +.\" $OpenBSD$
> +.\"
> +.\" Copyright (c) 2019 Martin Pieuchot 
> +.\"
> +.\" Permission to use, copy, modify, and distribute this software for any
> +.\" purpose with or without fee is hereby granted, provided that the above
> +.\" copyright notice and this permission notice appear in all copies.
> +.\"
> +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> +.\"
> +.Dd $Mdocdate$
> +.Dt RTABLE_WALK 9
> +.Os
> +.Sh NAME
> +.Nm rtable_walk
> +.Nd iterate over a routing table
> +.Sh SYNOPSIS
> +.In net/rtable.h
> +.Ft int
> +.Fn rtable_walk "unsigned int rtableid" "sa_family_t af" \
> +"struct rtentry **prt" "int (*func)(struct rtentry *, void *, unsigned int)" 
> \
> +"void *arg"

While this isn't incorrect, i suggest the more readable

.Ft int
.Fo rtable_walk
.Fa "unsigned int rtableid"
.Fa "sa_family_t af"
.Fa "struct rtentry **prt"
.Fa "int (*func)(struct rtentry *, void *, unsigned int)"
.Fa "void *arg"
.Fc

for functions with long arguments or with more than one or two arguments.

> +.Sh DESCRIPTION
> +The
> +.Fn rtable_walk
> +function iterates over the routing table
> +.Fa rtableid
> +and applies
> +.Fa func
> +to all entries of address family
> +.Fa af .
> +.Pp
> +The iteration is interrupted as soon as
> +.Fa func
> +returns a non-zero value.
> +If
> +.Fa prt
> +is non-null

Please consider the more usual form:

is not
.Dv NULL

These are not objections but merely suggestions.

Yours,
  Ingo

> when the iteration is interrupted, it is set to the current
> +routing entry.
> +In that case
> +.Fn rtfree
> +must be called on the routing entry pointed by
> +.Fa prt .
> +.Sh CONTEXT
> +.Fn rtable_walk
> +can be called during autoconf or from process context.
> +.Sh RETURN VALUES
> +.Fn rtable_walk
> +returns any non-zero value returned by
> +.Fa func .
> +It may also fail with:
> +.Pp
> +.Bl -tag -width Er -compact
> +.It Bq Er EAFNOSUPPORT
> +A routing table with ID of
> +.Fa rtableid
> +and address family of
> +.Fa af
> +doesn't exist.
> +.El
> +.Sh SEE ALSO
> +.Xr rtfree 9



Re: sysupgrade: select sets to install

2019-07-10 Thread Ingo Schwarze
Hi Theo,

Theo de Raadt wrote on Tue, Jul 09, 2019 at 09:23:25AM -0600:
> Klemens Nanni  wrote:

>> I think sysupgrade should, if at all, use the same semantics as the
>> installer.  That is, something like `sysugprade -S '-* b*'" to upgrade
>> nothing but kernels and base.
>> 
>> Such options offer great potential for users to shoot themselves in the
>> foot by doing partial upgrades;  I am not really sold on the idea, yet.

> From time to time I consider merging all the sets into baseXX.tgz.

That sounds reasonable to me.  Having separate sets was probably
useful in the 1980ies, but nowadays, it provides little benefit in
a general-purpose operating system, and getting rid of it would
reduce maintenance effort and recurring confusion when people shoot
themselves in the foot by not installings parts of the operating
system they actually want to use, then asking questions why their
system isn't working as expected.

By the way, on amd64, merging in game65 would make base65 1.3% larger,
man65 3.5%, and even comp65 only 35%.

It seems similar to avoiding flavours in ports if those flavours
provide little benefit: KISS.

Then again, merging the sets causes some work and churn and certainly
isn't an urgent task, but eventually and at a convenient time, i
expect that it should and will happen.

Yours,
  Ingo



Re: ed(1) man page doesn't mention use of single / and ?

2019-07-12 Thread Ingo Schwarze
Hi Jason,

Jason McIntyre wrote on Sun, Jul 07, 2019 at 08:56:43PM +0100:

> this is getting silly - we're mixing talking about how /re/ and ?re?
> work as addresses and as regular expressions in other places. dropping
> the trailing [/?] does not apply to regular expressions everywhere.

Actually, it does, though you are right in so far as the three
cases (addresses, global searches, and replacements) differ in
subtle ways.

> even if i have this wrong (to be fair, not that unlikely), we should
> just concentrate on one thing at a time.

We have now taken care of the case of addresses.

The case of replacements is already described correctly:

> regarding how dropping two delimiters works in an 's' command, although
> i'm not sure really of the reason, you can do:
>   1,$s/re
> which effectively drops /replacement/

Yes, this is described as follows:

   If one or two of the last delimiters is omitted, then the last line
   affected is printed as though the print suffix p were specified.

Admittedly, the grammar is awkward ("two of things is omitted"),
and it is not said explicitly that omitting both delimiters implies
deleting the re without a replacement - but one could argue that
is implicitly to be expected because omitting the middle delimiter
implies that the replacement can only be empty.  If you think this
should be improved, we can polish it later.


So what remains to be decsribed is the case of global searches (gGvV).

Here is what POSIX says about those:

  If the closing delimiter of an RE or of a replacement string (for
  example, '/' ) in a g, G, s, v, or V command would be the last
  character before a , that delimiter can be omitted, in
  which case the addressed line shall be written.

Our implementation appears to comply.

While documenting this, also fix a few glitches in the vicinity:

 * The wording "the current address is set to the last line affected
   by command-list" can easily be misunderstood.  The temporal sense
   of "last" is intended, but it can easily be read as the last (i.e.
   highest numbered) line in the file.  So let's better talk about
   the "last command executed", not about the "last line".
   You can easily see that this is correct with a command like

 g/.../+p\
 -p
 .

   which, for each matching line, first prints the following line,
   then the matching line itself, and leaves the matching line
   (printed by the last command) as the current line, not the
   following line, even though the following line was also "affected"
   and comes later in the file.

 * Avoid the confusing wording "a newline alone in command-list".
   The newline is not *in* the command list, but it comes after
   the command list, terminating it.

 * For G, explicitly say that an empty command list has a different
   effect compared to g: for g, it does the same as p, whereas for G,
   it does nothing at all.

There is nothing to do for v and V, which mostly point to g and G only.

OK for the patch below?

Yours,
  Ingo


Index: ed.1
===
RCS file: /cvs/src/bin/ed/ed.1,v
retrieving revision 1.72
diff -u -p -r1.72 ed.1
--- ed.112 Jul 2019 19:28:48 -  1.72
+++ ed.112 Jul 2019 22:17:47 -
@@ -377,7 +377,7 @@ The current address is set to the line c
 command-list is executed.
 At the end of the
 .Ic g
-command, the current address is set to the last line affected by command-list.
+command, the current address remains as set by the last command executed.
 If no lines were matched,
 the current line number remains unchanged.
 .Pp
@@ -392,22 +392,26 @@ Any commands are allowed, except for
 .Ic v ,
 and
 .Ic V .
-A newline alone in command-list is equivalent to a
+An empty
+.Ar command-list
+is equivalent to a
 .Ic p
 command.
+In this case, the trailing slash can be omitted.
 .Sm off
 .It (1,$) Ic G No / Ar re No /
 .Sm on
 Interactively edits the addressed lines matching a regular expression
 .Ar re .
+The trailing slash after
+.Ar re
+can be omitted.
 For each matching line, the line is printed, the current address is set,
 and the user is prompted to enter a
 .Ar command-list .
 At the end of the
 .Ic G
-command, the current address is set to the last line affected by
-.Pq the last
-command-list.
+command, the current address remains as set by the last command executed.
 If no lines were matched,
 the current line number remains unchanged.
 .Pp
@@ -415,11 +419,10 @@ The format of
 .Ar command-list
 is the same as that of the
 .Ic g
-command.
-A newline alone acts as a null command list.
+command, but an empty command list does nothing.
 A single
 .Sq &
-repeats the last non-null command list.
+repeats the last non-empty command list.
 .It Ic H
 Toggles the printing of error explanations.
 By default, explanations are not printed.



Re: ldomctl: improve usage

2019-06-30 Thread Ingo Schwarze
Hi Klemens,

Klemens Nanni wrote on Sun, Jun 30, 2019 at 01:04:17PM +0200:
> On Sat, Jun 22, 2019 at 11:20:10AM -0600, Theo de Raadt wrote:

>> The problem is when I'm on screens that don't have scroll-back, those 9
>> lines have scrolled other information off the top, and then I've had to
>> repeat the operations, or if not possible, been more frustrated.

> Should we change those programs and refer to manual page or show a
> synopsis one-liner instead?

Yes, i think programs emitting excessive amounts of text (when called
with a typo, and even more so when doing that always on startup,
like gdb(1)) should be fixed to shut the hell up.  This is Unix.

> I get the scroll-back argument, but I dare say it's a weak one given
> that this is something users can fix most of the time, e.g. by using
> tmux.

I find it offensive being told to use tmux(1) merely because some
programs like to chew my ears off.  I get it that many developers
find tmux(1) useful for many purposes, but the availability of
add-on tools allowing workarounds is no excuse for programs being
annoying in my face.

Also, with tmux(1), i might get scrollback.  But having the immediate
command history remain on screen is still much better than being
able to scroll around and having to look for the relevant lines
buried among long, irrelevant output.

Yes, long usage is mostly irrelvant: a typo doesn't mean i need
to re-read a long list of commands provided.

> I'd really like to have a valuable usage in ldomctl and haven't
> come up with a better and still consistent way of doing so.

There is consensus two parts of documentation are useful:

 * A short usage message shown after typos,
   typically one or two lines.
 * The manual page, containing complete and concise
   reference documentation.

Personally, i wouldn't object to exceptionally complicated programs
(like openrsync(1) or ldomctl(1)) providing command summaries
(typically one line per command) *when the user explicitly asks for
it* with an option like -h or --help.

But i do see Theo's argument that likely we don't really need
three different levels of detail, adding additional maintenance
effort and danger of getting outdated.

> ssh-keygen is another good example of long usage that is most probably
> not going to change back to a simpler one;  I just stumbled over it due
> to wrong usage and was therefore reminded of this mail thread.
> 
> What do other (ldomctl) users say?

This isn't really a question for ldomctl users at this point.

Usage should be very short as a matter consistency of the operating
system; in that sense, i do consider ssh-keygen(1) an offender.

*If* we decide that it would be valuable for selected programs to
provide an intermediate -h or --help, only then would it become
a question for ldomctl users whether ldomctl(8) is one of them.

But so far, i don't see consensus that we want to allow adding
such -h or --help options to base programs.  I'm mildly in favour
if in favour at all, and i have heard strong opposition, with
the reasonable argument that it adds maintenance effort for
little gain because it's not really that hard to find the
information quickly in a real manual page.

Yours,
  Ingo



Re: sleep(1): simplify argument parsing

2019-06-30 Thread Ingo Schwarze
Hi Scott,

Scott Cheloha wrote on Sun, Jun 30, 2019 at 06:49:28AM -0500:

> This is cleaner, shorter.
> 
>  - Remove the intermediate variables and just build the timespec
>directly.
> 
>  - Use for-loops to consolidate initialization/incrementation of cp
>into one spot in each loop.
> 
>  - Use one loop to parse fractions of a second: less duplicate code
>keeps the fraction path shorter.  Expand the comment to detail why
>this is fine.  We then explain the magic multiplier at the point
>of use instead of making you read the next loop to find out what
>happened in the prior loop.

I like that.
It is easier to understand and survived my code review and testing.

> ok?

OK schwarze@
  Ingo


> Index: sleep.c
> ===
> RCS file: /cvs/src/bin/sleep/sleep.c,v
> retrieving revision 1.27
> diff -u -p -r1.27 sleep.c
> --- sleep.c   10 Jan 2019 16:41:10 -  1.27
> +++ sleep.c   30 Jun 2019 11:22:58 -
> @@ -49,9 +49,8 @@ int
>  main(int argc, char *argv[])
>  {
>   int ch;
> - time_t secs = 0, t;
> + time_t t;
>   char *cp;
> - long nsecs = 0;
>   struct timespec rqtp;
>   int i;
>  
> @@ -71,40 +70,32 @@ main(int argc, char *argv[])
>   if (argc != 1)
>   usage();
>  
> - cp = *argv;
> - while ((*cp != '\0') && (*cp != '.')) {
> + timespecclear();
> +
> + /* Handle whole seconds. */
> + for (cp = *argv; *cp != '\0' && *cp != '.'; cp++) {
>   if (!isdigit((unsigned char)*cp))
>   errx(1, "seconds is invalid: %s", *argv);
> - t = (secs * 10) + (*cp++ - '0');
> - if (t / 10 != secs) /* oflow */
> + t = (rqtp.tv_sec * 10) + (*cp - '0');
> + if (t / 10 != rqtp.tv_sec)  /* overflow */
>   errx(1, "seconds is too large: %s", *argv);
> - secs = t;
> + rqtp.tv_sec = t;
>   }
>  
> - /* Handle fractions of a second */
> + /*
> +  * Handle fractions of a second.  The multiplier divides to zero
> +  * after nine digits so anything more precise than a nanosecond is
> +  * validated but not used.
> +  */
>   if (*cp == '.') {
> - cp++;
> - for (i = 1; i > 0; i /= 10) {
> - if (*cp == '\0')
> - break;
> + i = 1;
> + for (cp++; *cp != '\0'; cp++) {
>   if (!isdigit((unsigned char)*cp))
>   errx(1, "seconds is invalid: %s", *argv);
> - nsecs += (*cp++ - '0') * i;
> - }
> -
> - /*
> -  * We parse all the way down to nanoseconds
> -  * in the above for loop. Be pedantic about
> -  * checking the rest of the argument.
> -  */
> - while (*cp != '\0') {
> - if (!isdigit((unsigned char)*cp++))
> - errx(1, "seconds is invalid: %s", *argv);
> + rqtp.tv_nsec += (*cp - '0') * i;
> + i /= 10;
>   }
>   }
> -
> - rqtp.tv_sec = secs;
> - rqtp.tv_nsec = nsecs;
>  
>   if (timespecisset()) {
>   if (nanosleep(, NULL) == -1)



Re: new man page: rcsfile.5

2019-04-23 Thread Ingo Schwarze
Hi Fabio,

Fabio Scotoni wrote on Tue, Apr 23, 2019 at 01:45:22PM +0200:

> While familiarizing myself with OpenRCS,
> I noticed that there's no rcsfile(5).
> This may be handy for people whose RCS file got corrupted for whatever
> reasons.
> Other *NIXes that ship SCCS tend to have a sccsfile(4),
> so I figured it could be worthwhile to have something of that nature for
> OpenRCS.

I like this.

Especially when a file format is human-readable, non-trivial, and
easy to describe, describing it is worthwhile.

I think the source should go to /usr/src/usr.bin/rcs/rcs.5
and it should be installed to /usr/share/man/man5/rcs.5.

By the way, i recently had to hand-edit RCS files, for merging the
histories from multiple different historical CVS repositories.
This manual page is useful for such work.

> Feedback would be appreciated,

Some minor remarks inline.  They are not critical, though; polishing
of manual pages can also be done in-tree.

> as would be proofreading by someone who's
> done more work with RCS files than I have.

That wouldn't be me.  Either way, an additional OK to import would be
appreciated.

Yours,
  Ingo



At the top of the file, a Copyright notice and license is missing.
Please conisder using /usr/share/misc/license.template with the
comment markers changed to .\" .

> .Dd $Mdocdate$
> .Dt RCSFILE 5
> .Os
> .Sh NAME
> .Nm rcsfile
> .Nd format of an RCS file
> .Sh DESCRIPTION
> An RCS file is a text file.
> It consists of multiple sections, each separated by two empty lines.
> RCS files should not be edited by hand;
> the RCS programs should be used instead.

the
.Xr rcs 1
programs should be used instead.

> .Pp
> The RCS file consists of four sections:
> .Bl -tag -width "deltatexts" -compact
> .It admin
> Administrative data about the RCS file itself.
> .It deltas
> List of deltas, i.e. metadata about changes.
> .It desc
> A description of what the file.
> .It deltatexts
> List of the changes made by each delta.
> .El

Please avoid inventing names that don't actually appear in the file.
I suggest using:

The RCS file consists of four sections:
.Bl -enum -with 2n -compact
.It
Administrative data about the RCS file itself.
.It
List of deltas, i.e. metadata about changes.
.It
A description of what the file.
.It
List of the changes made by each delta.
.El

In that case, the structure of the rest of the text would remain
more or less unchanged.  Alternatively, you could shorten the
structure as follows:

programs should be used instead.
.Ss Administrative data
The first section contains administrative data about the RCS file itself.
...
.Ss List of deltas

And so on.

> .Pp
> The RCS file begins with admin data.
> Each entry in this section consists of either one or multiple values.
> Entries that take one value are specified on one line:
> The keyword and its corresponding value are separated by
> a tab character.
> Entries that take multiple values begin with a keyword on one line;
> the values are specified on subsequent lines, one per line and
> prefixed with a tab character.
> If the list is empty, the semicolon immediately follows the keyword.
> Entries are terminated with a semicolon.
> The entries are:
> .Bl -tag -width Ds
> .It Cm head Ar rev

I'd prefer "Ic head" (and similarly in the following) because "head"
is a stand-alone keyword, not an argument to something else.

> Highest revision number.
> .It Cm branch Ar rev
> Revision number that specifies the default branch (optional).
> .It Cm access
> List of users that are allowed to check in new revisions with
> .Xr ci 1 .
> This keyword is not followed by a tab;
> instead, the list of users is specified with one username per line.
> The list is terminated by a single semicolon.
> .It Cm symbols
> List of symbolic names used as aliases for specific revisions.
> Each entry consists of a symbolic name followed by a colon and
> the revision.
> .It Cm locks
> List of locked revisions.
> Each entry consists of a username followed by a colon and
> the locked revision.
> .It Cm strict
> Indicates that locks are strict;
> if missing, locks are not strict.
> This entry takes no value and immediately follows the list of locks
> without a newline.
> .It Cm comment @ Ns Ar leader Ns Cm @
> Contains the comment leader surrounded by at signs.
> .It Cm expand @ Ns Ar mode Ns Cm @
> Indicates the keyword substitution mode.
> See
> .Xr rcs 1
> for the possible keyword substitution modes.
> If this entry is not given in the admin section,
> .Ar kv
> must be assumed.
> .El
> .Pp
> The RCS file continues with a list of deltas.
> Deltas are separated by empty lines.
> The list begins with the head and ends with the initial revision.
> Each entry begins with the revision number, followed by a newline.
> After that, the structure follows the admin section.
> .Bl -tag -width Ds
> .It Cm date Ar .mm.dd.HH.MM.SS
> Indicates the date and time the revision was checked in.

Indicates the date and UTC time the ...

> The date consists of the year, the 

Re: new man page: rcsfile.5

2019-04-23 Thread Ingo Schwarze
Hi Fabio,

i applied a very small number of minor tweaks and plan to commit
the patch below tomorrow based on OK jmc@ deraadt@ millert@,
unless i hear objections.


Fabio Scotoni wrote on Tue, Apr 23, 2019 at 05:19:48PM +0200:

> I've adjusted the updated man page (inline) .Dt and .Nm to match this.
> 
> However, I would advise you not to.
> rcs(5) may be surprising for anyone coming from the other
> implementations of single-file version control systems:
> - GNU RCS ships rcsfile(5), and
> - all versions of SCCS ship sccsfile(4) (even Schily and BitKeeper).

Ouch.  I didn't intend to suggest changing the filename.
Sorry, that was a mistake on my part.
I only meant to suggest directories where the file should live.

I assume those who responded positively are OK with
  /usr/src/usr.bin/rcs/rcsfile.5
  /usr/share/man/man5/rcsfile.5
as well.

I think you are right that it is better to stick to the filename
used traditionally.

>> At the top of the file, a Copyright notice and license is missing.
>> Please conisder using /usr/share/misc/license.template with the
>> comment markers changed to .\" .

> I intentionally skipped on adding the copyright notice and license
> because I felt it would be presumptuous -- I wanted to keep open
> the option of a copyright assignment to an OpenBSD contributor open
> in case the OpenBSD project wished for it, just in case.

If you weren't a lawyer, i would feel tempted to suspect that you
misunderstood how Copyright works in general.  As things stand, i
may be misunderstanding what you intend to say here.  Either way,
i suspect you misunderstand how ownership of intellectual property
works in OpenBSD in particular.

By international law (Berne Convention), you cannot give away Copyright.
Copyright always resides with the natural person who is the original
creator of a Work.  It is inalienable.  The "Copyright" line is merely
a statement of fact.

Is it correct that you wrote all the text in the file?
If so, your name needs to be there.

According to the Berne Convention, even though you cannot give away
the moral rights that constitute the core of Copyright, you can give
away the economic rights arising from a Work.  Some publishers of
software (for example the FSF) ask for that.  But in OpenBSD, we
strongly encourage original authors to retain all economic rights
arising from their Works, and simply grant a standardized license
themselves, such that OpenBSD can distribute the Works.  The OpenBSD
project doesn't own any rights and isn't even a legal entity.

> Added in the updated man page inline,
> including the $OpenBSD$ keyword at the top.

Thank you for granting the license on your Work.

> Your suggestions and/or subsequent in-tree changes may warrant
> a copyright notice of your own.

Technically, if anybody changes a file in a way adding a non-trivial
amount of text - even one or two sentences will usally be sufficient,
arguably even a few words if they are important and difficult to
come up with -, they own Copyright on the changes and can make it
known that they do by adding a "Copyright" line of their own to the
file header.  That isn't required though, the Copyright exists
whether or not it is stated.

To keep file headers concise and because tracking all Copyright-worthy
changes in file headers is tedious, we tend to only add such additional
lines for massive contributions, or if an authors cares unusually
strongly about their contributions to a particular file.

Yours,
  Ingo


Index: Makefile
===
RCS file: /cvs/src/usr.bin/rcs/Makefile,v
retrieving revision 1.40
diff -u -p -r1.40 Makefile
--- Makefile15 Oct 2010 08:44:12 -  1.40
+++ Makefile23 Apr 2019 17:04:55 -
@@ -1,7 +1,9 @@
 #  $OpenBSD: Makefile,v 1.40 2010/10/15 08:44:12 tobias Exp $
 
 PROG=  rcs
-MAN=   ci.1 co.1 ident.1 merge.1 rcs.1 rcsclean.1 rcsdiff.1 rcsmerge.1 rlog.1
+MAN=   ci.1 co.1 ident.1 merge.1 \
+   rcs.1 rcsclean.1 rcsdiff.1 rcsmerge.1 rlog.1 \
+   rcsfile.5
 
 SRCS=  ci.c co.c ident.c merge.c rcsclean.c rcsdiff.c rcsmerge.c rcsparse.c \
rcsprog.c rlog.c rcsutil.c buf.c date.y diff.c diff3.c rcs.c rcsnum.c \
Index: rcsfile.5
===
RCS file: rcsfile.5
diff -N rcsfile.5
--- /dev/null   1 Jan 1970 00:00:00 -
+++ rcsfile.5   23 Apr 2019 17:04:55 -
@@ -0,0 +1,154 @@
+.\" $OpenBSD$
+.\"
+.\" Copyright (c) 2019 Fabio Scotoni 
+.\"
+.\" Permission to use, copy, modify, and distribute this software for any
+.\" purpose with or without fee is hereby granted, provided that the above
+.\" copyright notice and this permission notice appear in all copies.
+.\"
+.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+.\" ANY SPECIAL, DIRECT, INDIRECT, OR 

Re: new man page: rcsfile.5

2019-04-23 Thread Ingo Schwarze
Hi Fabio,

Fabio Scotoni wrote on Tue, Apr 23, 2019 at 07:46:31PM +0200:
> On 4/23/19 7:15 PM, Ingo Schwarze wrote:

>> Is it correct that you wrote all the text in the file?
>> If so, your name needs to be there.

> It is indeed correct that I wrote all the text in the file (except, by
> definition, your suggestions).

Thanks for confirming.  (Misunderstandings about authorship would be bad.)

>> Index: rcsfile.5
[...]
>> +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
>> +.\"

> Nitpick: The other man pages in the usr.bin/rcs/ tree do not have a
> trailing .\" -- except for rcs(1).
> You may wish to make this consistent this one way or another,
> likely in a separate commit.

I put the .\" here because i guess that's the more usual way in
OpenBSD, but it's not important enough for changing it in existing
pages.

[...]
>> +.Bl -tag -width Ds
>> +.It Ic date Ar  . Ar mm . Ar dd . Ar HH . Ar MM . Ar SS
 
> Are you sure you want to do this without Pf/Ns?
> You may have inadvertently introduced spaces here and

Oops, indeed, thanks for catching that.
Fixed in my tree:

.It Ic date Ar  . Ns Ar mm . Ns Ar dd . Ns Ar HH . Ns Ar MM . Ns Ar SS

> Maybe this should be [YY]YY because two-digit years do exist.

I think the

  Oo Ar YY Oc Ns Ar YY .

which that would require would be even more complicated than what
is there now.  The  can be considered a placeholder, and the
text already makes it clear that the actual value can be shorter.

Yours,
  Ingo



Re: new man page: rcsfile.5

2019-04-24 Thread Ingo Schwarze
Hi Fabio,

Fabio Scotoni wrote on Wed, Apr 24, 2019 at 06:14:07AM +0200:
> On 4/23/19 9:30 PM, Ingo Schwarze wrote:
>> Fabio Scotoni wrote:
>>> Ingo Schwarze wrote:

>>>> +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
>>>> +.\"

>>> Nitpick: The other man pages in the usr.bin/rcs/ tree do not have a
>>> trailing .\" -- except for rcs(1).
>>> You may wish to make this consistent this one way or another,
>>> likely in a separate commit.

>> I put the .\" here because i guess that's the more usual way in
>> OpenBSD, but it's not important enough for changing it in existing
>> pages.

> Duly noted.
> I wonder if it might be worth noting that somewhere, though I'm not sure
> where.

It is neither possible to document such minor details (which i would
not even call a style convention) nor desirable to attempt to,
lest we drown in unimportant rules.

I do document some recommendations that are below the threshold of
warranting inclusion into manual pages on

  https://mandoc.bsd.lv/mdoc/

but this particular detail is way below the threshold for inclusion
even in that extended documentation.  If people do it differently,
it doesn't really matter.

> With these changes, rcsfile.5 looks okay to me now.

This for writing and checking the page.
I just committed it.

Yours,
  Ingo



Re: [diff] events.html - add 2 more BSDCan 2019 videos

2019-06-29 Thread Ingo Schwarze
Hi Ross,

Ross L Richardson wrote on Sat, Jun 22, 2019 at 02:17:26PM +1000:

> Probably in acceptable form :-)

Committed, thanks.
  Ingo

> Index: events.html
> ===
> RCS file: /cvs/www/events.html,v
> retrieving revision 1.1174
> diff -u -p -r1.1174 events.html
> --- events.html   18 Jun 2019 15:01:52 -  1.1174
> +++ events.html   22 Jun 2019 04:09:17 -
> @@ -68,7 +68,9 @@ May 15-18, 2019, Ottawa, Canada.
>  (http://bhyvecon.org/;>bhyvecon Ottawa 2019)
>  Bob Beck -
>  https://github.com/bob-beck/libtls/blob/master/TUTORIAL.md;>libtls 
> for beginners conference tutorial
> -Theo Buehler - Design and verification of the TLS 1.3 handshake state 
> machine in LibreSSL (slides)
> +Theo Buehler - Design and verification of the TLS 1.3 handshake state
> +  machine in LibreSSL (slides,
> +  https://www.youtube.com/watch?v=MCVIBwGOwNY;>video)
>  Florian Obser - https://man.openbsd.org/unwind.8;>unwind(8)
>a privilege-separated, validating DNS recursive nameserver for every laptop
>(slides,
> @@ -82,7 +84,8 @@ May 15-18, 2019, Ottawa, Canada.
>(slides,
>https://www.youtube.com/watch?v=s6rAXaHylFM;>video)
>  Bob Beck
> -Unveil in OpenBSD
> +  Unveil in OpenBSD
> +  (https://www.youtube.com/watch?v=gvmGfpMgny4;>video)
>  Jan Klemkow - Network booted OpenBSD Workstations
>(slides,
>https://www.youtube.com/watch?v=kFqHXfWEB4o;>video)



Re: [diff] www/libressl/papers.html - add video link

2019-06-29 Thread Ingo Schwarze
Hi Ross,

Ross L Richardson wrote on Sat, Jun 22, 2019 at 02:24:46PM +1000:

> The following corresponds with the events.html patch...

Almost...

Committed with the correct YT ID MCVIBwGOwNY instead of gvmGfpMgny4
which is beck@'s unveil(2) talk.

Thanks anyway,
  Ingo


> Index: papers.html
> ===
> RCS file: /cvs/www/libressl/papers.html,v
> retrieving revision 1.16
> diff -u -p -r1.16 papers.html
> --- papers.html   13 Jun 2019 07:34:36 -  1.16
> +++ papers.html   22 Jun 2019 04:20:59 -
> @@ -19,7 +19,7 @@ Presentations and Papers
>  
>  https://www.openbsd.org/papers/bsdcan2019-tls13.pdf;>Design and 
> verification of the TLS 1.3 handshake state machine in LibreSSL
>  by Theo Buehler
> -
> + ( href="https://www.youtube.com/watch?v=gvmGfpMgny4;>video)
>  
>  
>  Presentation: FSec 2015



Re: add pkg-config files for readline, editline, ncurses

2019-08-02 Thread Ingo Schwarze
Hi Stephen,

Stephen Gregoratto wrote on Fri, Jul 26, 2019 at 01:23:21AM +1000:

> mrsh[1], a cross-platform shell, can use readline in interactive mode.
> It's configure script detects the presence of readline using
> pkg-config(1). Thus, this patch adds a pkg-config file for our readline.

It is not obvious to me whether or not that is desirable.  It depends
on which version of readline mrsh requires.  If mrsh builds and is
usable with our base system readline (which is quite old), what you
propose may help.  On the other hand, if it requires a newer readline
or doesn't work well with the base readline, adding a library dependency
on ports readline might be better.

> I just copied over the generate_pkgconfig.sh script/make rules found in
> other libraries and edited to fit.
> 
> I also added pkg-config files for libedit and libcurses,

I think that ought to be a completely separate change because a program
either uses base libedit or base readline or ports readline but never
more than one of these.  I know relatively little about pkg-config(1);
the reason i'm speaking up here is that nobody else spoke up so far and
that i occasionally worked on libedit in the past.  Now, as far as i
understand (take this with a grain of salt), the effect of adding a
pkg-config file for libedit would possibly be that additional ports
might pick up our base libedit that now think that no libedit is
available and consequently compile without it.  That might improve
some ports - and/or break others.  So i suspect that adding libedit.pc
to base would probably require a bulk build before commit to make sure
nothing breaks.  Also - even though it seems less likely that anything
in the base system looks at *.pc files - it would probably require a
complete make build & release of base and xenocara.

For adding readline.pc, the same caveats apply, only even more so.
I suspect that the risk of breaking ports by adding readline.pc to
base is larger than for editline because probaly more ports use readline
than editline and those that do are probably more likely to use
pkg-config(1) and those that do both are probably more likely to be
unhappy with our base readline, whereas ports looking for editline
probably have a better chance to be happy with our base editline.

As far as i understand, adding editline.pc does not necessarily
require also adding ncurses.pc, nor does it require Requires:
lines.  So far, we don't have a single non-empty Requires: line
in base even though libssl depends on libcrypto.  Again, the effect
of adding ncurses.pc to base might be that some ports suddenly
pick up our base curses, for good or ill.  If there is a benefit
to that, i suspect it might make sense to test that change
independently from libedit.pc.  I don't think we want a bulk build
with three combined changes and lots of fallout, because then we
would be forced to figure out which fallout was caused by which
part of the changes, which sounds like gratuitious additional pain.

> since libedit requires linking with libcurses.  I started getting
> build errors before I realised that there are a couple editline
> libraries floating around, each differing slightly with each other:

*If* we want editline.pc, then i expect we probably just want one,
for our own editline.  Additional ones would only make sense to me
in the context of a port of one of the other editline implementations.
But i don't really see the point of such a port.  If some port does
not work with our editline but does work with a different editline,
we should probably either fix our editline or patch the port to
work with our editline, depending on what exactly to port is doing.

> Some Linux systems (e.g. Arch) generate pkg-config files for the
> different "types" of ncurses:
> 
>   /usr/lib/pkgconfig/ncurses++.pc -> ncurses++w.pc
>   /usr/lib/pkgconfig/ncurses++w.pc
>   /usr/lib/pkgconfig/ncurses.pc -> ncursesw.pc
>   /usr/lib/pkgconfig/ncursesw.pc
>   /usr/lib/pkgconfig/tic.pc -> ncursesw.pc
>   /usr/lib/pkgconfig/tinfo.pc -> ncursesw.pc
> 
> Not sure if this is something I should do here.

I doubt it.  In any case, the base system should contain one *curses*.pc
at most, i think, because we only have one curses in base.  Any other
*curses*.pc would be a matter of ports - and like for editline, i doubt
that having a port of a different version of curses makes sense.

That said, even though you are changing the base system, i suspect
discussing this kind of patches may be more fruitful on ports@
because that's where all the ports developers are that will be
affected by such changes.

I suggest that you decide which of the three changes you care about
most, then send a patch to ports@ with only that one change,
explaining what exactly it is supposed to improve (and how you
tested that it indeed does), and what exactly you already tested
in addition to that (for example, building which ports that use
pkg-config(1) and the library in question), asking for feedback.
The feedback you 

less(1): utility function to step one multibyte character to the left

2019-08-30 Thread Ingo Schwarze
Hi,

the file less/line.c currently contains three copies of code to
step one multibyte character to the left in a char * buffer, and
cleaning up the file less/cmdbuf.c will require similar functionality
in the future.  So let's introduce a new function for that purpose.
Keep it as similar as possible to the ANSI C function mbtowc(3).
Then again, there is no need to handle the case of the input char *
being NULL; for resetting, people should use mbtowc(3) itself, not
the new function.

The logic in less/cmdbuf.c is so contorted that cleaning that up
will still be challenging, but this at least provides one piece to
help with the puzzle.

Behaviour may change in LC_CTYPE=C mode if linebuf[curr-1] contains
an explicit '\0' byte.  I think in that case, jumping out of the
backc() function, just like it is already done for UTF-8 mode, i.e.
not trying to back up across the NUL byte, is more robust behaviour.
Either way, linebuf[] is not supposed to contain NUL bytes except
in -r mode, and -r mode is almost unusable in the first place.
No other functional change is intended.

Minus 16 lines of code, by the way...

OK?
  Ingo


Index: charset.c
===
RCS file: /cvs/src/usr.bin/less/charset.c,v
retrieving revision 1.24
diff -u -p -r1.24 charset.c
--- charset.c   15 May 2019 19:36:20 -  1.24
+++ charset.c   30 Aug 2019 13:02:01 -
@@ -146,6 +146,27 @@ init_charset(void)
 }
 
 /*
+ * Like mbtowc(3), except that it converts the multibyte character
+ * preceding ps rather than the one starting at ps.
+ */
+int
+mbtowc_left(wchar_t *pwc, const char *ps, size_t psz)
+{
+   size_t sz = 0;
+   int len;
+
+   do {
+   if (++sz > psz)
+   return -1;
+   } while (utf_mode && IS_UTF8_TRAIL(ps[-sz]));
+   if ((len = mbtowc(pwc, ps - sz, sz)) == -1) {
+   (void)mbtowc(NULL, NULL, 0);
+   return -1;
+   }
+   return len == sz || (len == 0 && sz == 1) ? len : -1;
+}
+
+/*
  * Is a given character a "control" character?
  */
 static int
Index: funcs.h
===
RCS file: /cvs/src/usr.bin/less/funcs.h,v
retrieving revision 1.23
diff -u -p -r1.23 funcs.h
--- funcs.h 15 May 2019 19:36:20 -  1.23
+++ funcs.h 30 Aug 2019 13:02:02 -
@@ -55,6 +55,7 @@ void ch_set_eof(void);
 void ch_init(int, int);
 void ch_close(void);
 int ch_getflags(void);
+int mbtowc_left(wchar_t *, const char *, size_t);
 void init_charset(void);
 char *prchar(LWCHAR);
 char *prutfchar(LWCHAR);
Index: line.c
===
RCS file: /cvs/src/usr.bin/less/line.c,v
retrieving revision 1.31
diff -u -p -r1.31 line.c
--- line.c  15 May 2019 19:06:01 -  1.31
+++ line.c  30 Aug 2019 13:02:02 -
@@ -437,44 +437,20 @@ backc(void)
wchar_t  ch, prev_ch;
int  i, len, width;
 
-   i = curr - 1;
-   if (utf_mode) {
-   while (i >= lmargin && IS_UTF8_TRAIL(linebuf[i]))
-   i--;
-   }
-   if (i < lmargin)
+   if ((len = mbtowc_left(, linebuf + curr, curr)) <= 0)
return (0);
-   if (utf_mode) {
-   len = mbtowc(, linebuf + i, curr - i);
-   if (len == -1 || i + len < curr) {
-   (void)mbtowc(NULL, NULL, MB_CUR_MAX);
-   return (0);
-   }
-   } else
-   ch = linebuf[i];
+   curr -= len;
 
/* This assumes that there is no '\b' in linebuf.  */
-   while (curr > lmargin && column > lmargin &&
-   (!(attr[curr - 1] & (AT_ANSI|AT_BINARY {
-   curr = i--;
-   if (utf_mode) {
-   while (i >= lmargin && IS_UTF8_TRAIL(linebuf[i]))
-   i--;
-   }
-   if (i < lmargin)
+   while (curr >= lmargin && column > lmargin &&
+   !(attr[curr] & (AT_ANSI|AT_BINARY))) {
+   if ((len = mbtowc_left(_ch, linebuf + curr, curr)) <= 0)
prev_ch = L'\0';
-   else if (utf_mode) {
-   len = mbtowc(_ch, linebuf + i, curr - i);
-   if (len == -1 || i + len < curr) {
-   (void)mbtowc(NULL, NULL, MB_CUR_MAX);
-   prev_ch = L'\0';
-   }
-   } else
-   prev_ch = linebuf[i];
width = pwidth(ch, attr[curr], prev_ch);
column -= width;
if (width > 0)
return (1);
+   curr -= len;
if (prev_ch == L'\0')
return (0);
ch = prev_ch;
@@ -554,21 +530,8 @@ store_char(LWCHAR ch, char a, char *rep,
}
if (w == -1) {
wchar_t prev_ch;
-
-   if 

Re: delete ligature support for Arabic "la" from the less(1) command line

2019-09-01 Thread Ingo Schwarze
Hello Mohammadreza,

Mohammadreza Abdollahzadeh wrote on Sun, Sep 01, 2019 at 09:40:16AM +0430:

> Persian is my native language and I think that the major problem that
> all RTL (Right-To-Left) languages like Persian and Arabic currentlly suffer
> from is the lack of BiDi (Bidirectionality) support in console and terminal
> environment like xterm(1). KDE konsole(1) support bidi and that's why it
> show ligatures correctly.
> I think any attempt to fix such problems must first start with adding bidi
> support to xterm and other terminal environment.

Thank you for your feedback!

If i understand correctly, xterm(1) does indeed have that problem.
I prepared a test file that contains, in this order,

 - some Latin characters
 - the Arabic word "la" ("no"), i.e. first LAM, then ALEF
 - some more Latin characters
 - the Arabic word "al" ("the"), i.e. first ALEF, then LAM
 - some final Latin characters

And indeed, xterm(1) does not respect the writing direction of the
individual words.  When cat(1)'ing the file to stdout, both xterm(1)
and konsole(1) show all the words from left to right, but *inside*
each word, konsole(1) uses the correct writing direction: right to
left for Arabic and left to right for Latin.  For example, in the
Arabic word "al", konsole(1) correctly shows the ALEF right of the
LAM, whereas xterm(1) wrongly shows the ALEF left of the LAM.

I'm not entirely sure this has much to do with ligatures, though.
What matters for building ligatures is only the logical ordering,
the ordering in *time* so to speak, i.e. what comes before and what
comes after.  LAM before ALEF has to become the ligature glyph "al",
whereas ALEF before LAM remains two glyphs.  Technically, the
question of ordering in space, whether glyphs are painted onto the
screen right to left or left to right, only comes into play after
characters have already been combined into glyphs.

Actually, now that you bring up the topic, i see another situation
where less(1) causes an issue.  Let's use konsole(1) and not xterm(1)
such that we get the correct writing direction, and let's put the
word "al" onto the screen.  No ligature here, so that part of the
topic is suspended for a moment.  Now let's slowly scroll right in
one-column steps.  All is fine as long as the word "al" is completely
visible on screen.  But when the final letter LAM of "al" is in the
last (leftmost) column of the screen and you scroll right one more
column, something weird happens, even in konsole(1).  You would
expect the final letter LAM to scroll off screen first and the initial
letter ALEF to remain on the screen for a little longer.  Instead,
less(1) incorrectly thinks the *initial* letter of the word scrolls
off screen first, and it tells xterm(1) to display the ALEF in the
leftmost column of the screen while the LAM just went off-screen.
That looks weird because there is no word in that text beginning
with ALEF.

This means that being able to properly view Arabic or Farsi text
with the default OpenBSD terminal emulator and parser would require

 1. bidi support in xterm(1)
to render Farsi words with the correct writing direction
 2. ligature support in xterm(1)
to correctly connect letters
 3. bidi support in less(1)
to correctly scroll parts of words on and off screen, horizontally
 4. ligature support in less(1)
for correct columnation

As far as i understand, you are saying that the extremely fragmentary
support for item 4 which we happen to have right now is not really
useful without items 1-3, and even when using konsole(1), which does
have items 1 and 2, implementing item 3 before item 4 would make
sense because item 3 is more importrant.

So my understanding is that you are not objecting to the patch because
the fragmentary support for item 4 is practically useless in isolation.


The following is not related to this patch, but i think it makes
sense to mention it here: regarding the future, i think items 1 and
3 are much easier to support than items 2 and 4 because bidi support,
if i understand correctly, only needs one bit of information per
character because it only needs to know whether the character is
part of a right to left or left to right script, so the complexity
on the libc level, where we want complexity least of all places,
is comparable to other boolean character properties like those
listed in the iswalnum(3) manual page.  Realistically, though,
bidi support would still be a large project, and i don't think it
makes sense to tackle it any time soon.

Ligature support feels much worse than bidi support because the
mapping required is not merely character -> boolean but (character +
character) -> character, which is more complicated than even the
(character + character) -> -1/0/+1 mapping required for collation
support - and we decided that we don't want collation support in
libc because it would cause excessive complexity.  Admittedly,
collations are strongly locale-dependent, while i'm not sure ligatures
are 

Re: libedit: Does not run input command in vi mode

2019-09-03 Thread Ingo Schwarze
Hello Masato-san,

Masato Asou wrote on Tue, Sep 03, 2019 at 02:39:11PM +0900:

> Does not run input command by vi editor with vi mode.
> 
> I do the following:
> 
> 1. set vi mode.
>$ echo "bind -v" > ~/.editrc
> 
> 2. launch /usr/bin/ftp command.
>$ ftp
> 
> 3. launch vi editor with ESC + v.
>ftp> ESC + v
> 
> 4. input "help" in vi editor.
>i + help + ESC + :wq
> 
> 5. then 'help' command does not run.
> 
> I fix this problem with following patch. This fix is come from NetBSD
> lib/libedit/vi.c 1.46 and 1.47.
>
> ok?

Indeed, limiting the length of the command created with vi(1) to
the length of the string the command line contained when vi(1) was
started is wrong.

Both code inspection and testing confirm that this is a bug
and that your patch is the correct fix.

IK schwarze@; please commit.

Thank you,
  Ingo


> Index: vi.c
> ===
> RCS file: /cvs/src/lib/libedit/vi.c,v
> retrieving revision 1.27
> diff -u -p -r1.27 vi.c
> --- vi.c  3 Sep 2019 02:28:25 -   1.27
> +++ vi.c  3 Sep 2019 05:34:31 -
> @@ -1058,12 +1058,12 @@ vi_histedit(EditLine *el, wint_t c __att
>   while (waitpid(pid, , 0) != pid)
>   continue;
>   lseek(fd, (off_t)0, SEEK_SET);
> - st = read(fd, cp, TMP_BUFSIZ);
> + st = read(fd, cp, TMP_BUFSIZ - 1);
>   if (st > 0) {
> - len = (size_t)(el->el_line.lastchar -
> - el->el_line.buffer);
> + cp[st] = '\0';
> + len = (size_t)(el->el_line.limit - el->el_line.buffer);
>   len = mbstowcs(el->el_line.buffer, cp, len);
> - if (len > 0 && el->el_line.buffer[len -1] == '\n')
> + if (len > 0 && el->el_line.buffer[len - 1] == '\n')
>   --len;
>   }
>   else



Re: typo in man page of sysupgrade option -s

2019-08-23 Thread Ingo Schwarze
Hi Florian,

Florian Obser wrote on Fri, Aug 23, 2019 at 02:43:22PM +0200:

> This is not a typo as such, it describes the default. It is a bit
> awkward since it uses the same text for -r and -s. I think we can just
> remove it from -s:

The following seems more straightforward.

I don't feel strongly about it, though; you patch is OK with me, too.

Yours,
  Ingo


Index: sysupgrade.8
===
RCS file: /cvs/src/usr.sbin/sysupgrade/sysupgrade.8,v
retrieving revision 1.8
diff -u -r1.8 sysupgrade.8
--- sysupgrade.89 May 2019 21:09:37 -   1.8
+++ sysupgrade.823 Aug 2019 12:54:38 -
@@ -62,16 +62,10 @@
 but do not reboot.
 .It Fl r
 Upgrade to the next release.
-The default is to find out if the system is running a release or a snapshot.
-In case of release
-.Nm
-downloads the next release.
+This is the default if the system is currently running a release.
 .It Fl s
 Upgrade to a snapshot.
-The default is to find out if the system is running a release or a snapshot.
-In case of release
-.Nm
-downloads the next release.
+This is the default if the system is currently running a snapshot.
 .El
 .Sh FILES
 .Bl -tag -width "/home/_sysupgrade" -compact



less(1): replace the last step_char(-1)

2019-09-02 Thread Ingo Schwarze
Hi,

The command line handling code in less/cmdbuf.c is very complicated.
>From the top level function cmd_char(), the stack can go down nine levels
before finally reaching the bottom level function cmd_step_common().
One of the functions traversed during that descent is the recursive
function cmd_repaint(), and the call tree has many different branches.
UTF-8 handling occurs both in the top level function and in the
bottom level function, and also at some places in between.

So when cleaning up the UTF-8 handling in here, i'm trying to proceed
very slowly and in minimal steps.  Given that i have little hope of
finding a system for testing each and every possible code path, my
plan is to make sure that individual changes do not change behaviour
locally (or only in intended ways).

So, lets get to the first, small step.  Right now, only one place
remains where the function step_char() is called with dir = -1,
i.e. moving the cursor leftward: from cmd_step_left() near the
bottom of the call stack.  That function is used when moving the
cursor left by one character (with or without erase), by one word,
or all the way back to the prompt.  It is also used to delete
leftover characters after repainting the line when the new text is
shorter than the old text.  And it is also used when scrolling
forward.  Given the complexity of the code, there may be even more
situations.

The obvious replacement for step_char(-1) is the new function
mbtowc_left() that i recently introduced.  Using that allows dropping
the "dir" argument from step_char().  Of course, the plan still is
to ultimately get rid fo step_char() altogether.

Let's make sure the new behaviour makes sense:

 * When at the beginning of the buffer, do not move, and use L'\0'
   as a replacement for the character that isn't there.
   Same behaviour as before.
 * When there is no valid UTF-8 character to the left (return value -1),
   back up one byte and use L'\0'.
   This _is_ a change of behaviour.  Current behaviour is to back up
   across all UTF-8 continuation bytes (no matter how many) until
   finding a byte that is not a continuation byte, then take the
   length given in that byte at face value (even if invalid) and
   construct a codepoint using that number of bytes (even if that
   implies ignoring additional garbage bytes in between, and even
   it it means extending the character that is supposedly to the
   left of the cursor to bytes that are to the right of the current
   cursor position).
   I believe making multibyte steps only for valid characters,
   and stepping byte-by-byte across invalid bytes, is safer than 
   the reckless jumping done right now.
   Then again, the practical difference may be small (if any)
   because i'm not quite sure how to get invalid bytes into the
   command buffer in the first place.
 * When there is a NUL byte to the left (return value 0),
   back up by this one byte, and use it as L'\0'.
   Same behaviour as before.

Tested by typing UTF-8 into the command buffer, moving the cursor
left and right across it (with and without deleting), inserting
UTF-8 before it, and scrolling the UTF-8 off the screen both to the
right and to the left.

OK?
  Ingo


Index: charset.c
===
RCS file: /cvs/src/usr.bin/less/charset.c,v
retrieving revision 1.26
diff -u -p -r1.26 charset.c
--- charset.c   2 Sep 2019 14:07:45 -   1.26
+++ charset.c   2 Sep 2019 15:00:50 -
@@ -353,7 +353,7 @@ get_wchar(const char *p)
  * Step forward or backward one character in a string.
  */
 LWCHAR
-step_char(char **pp, int dir, char *limit)
+step_char(char **pp, char *limit)
 {
LWCHAR ch;
int len;
@@ -361,11 +361,8 @@ step_char(char **pp, int dir, char *limi
 
if (!utf_mode) {
/* It's easy if chars are one byte. */
-   if (dir > 0)
-   ch = (LWCHAR) ((p < limit) ? *p++ : 0);
-   else
-   ch = (LWCHAR) ((p > limit) ? *--p : 0);
-   } else if (dir > 0) {
+   ch = (LWCHAR) ((p < limit) ? *p++ : 0);
+   } else {
len = utf_len(*p);
if (p + len > limit) {
ch = 0;
@@ -374,13 +371,6 @@ step_char(char **pp, int dir, char *limi
ch = get_wchar(p);
p += len;
}
-   } else {
-   while (p > limit && IS_UTF8_TRAIL(p[-1]))
-   p--;
-   if (p > limit)
-   ch = get_wchar(--p);
-   else
-   ch = 0;
}
*pp = p;
return (ch);
Index: cmdbuf.c
===
RCS file: /cvs/src/usr.bin/less/cmdbuf.c,v
retrieving revision 1.20
diff -u -p -r1.20 cmdbuf.c
--- cmdbuf.c2 Sep 2019 14:07:45 -   1.20
+++ cmdbuf.c2 Sep 2019 15:00:51 -
@@ -141,7 +141,7 @@ len_cmdbuf(void)
   

delete ligature support for Arabic "la" from the less(1) command line

2019-08-31 Thread Ingo Schwarze
Hi,

i have to admit that i am neither able to speak nor to write nor
to understand the Arabic language nor the Arabic script, but here
is my current, probably incomplete understanding of what our less(1)
program is trying to do with Arabic ligatures.

If somebody is reading this who is able to read and write Arabic
or an Indian language heavily using ligatures, feedback is highly
welcome.

Arabic is a cursive script, which means that when writing Arabic,
characters do not map 1:1 to glyphs.  Instead, there are rules about
how adjacent characters attach to each other, forming ligatures.

As an extremely simple example, consider the Arabic adverb "la",
which means the same as the English adverb "no".  It consists of
the two letters U+0644 LAM and U+0627 ALEF, the LAM appearing before
(i.e. to the right of) the ALEF.  However, you do not write both
letters separately.  Instead, the ALEF leans forward (to the left)
and attaches to the LAM, forming the glyph U+FEFB, ARABIC LIGATURE
LAM WITH ALEF ISOLATED FORM.  When displayed in a fixed width font,
that ligature only occupies a single display column just like any
other Arabic or Latin glyph.  The LAM WITH ALEF glyph is not a
double-width glyph like Japanese or Chinese characters typically
are.

So, when this happens, you have four bytes of UTF-8 forming two
Unicode characters, and *together*, these two characters occupy
only one single display column.

Note that in the default configuration, our xterm(1) is not able
to display Arabic characters at all.  But even when you run
  xterm -fa arabic
or
  xterm -fa fixed
which uses FreeType support instead of the default X toolkit font
support, such that xterm(1) does become able to display single
Arabic characters, it still displays the word "la" incorrectly,
failing to generate the required ligature and instead displaying
the two characters LAM and ALEF separately.

So i installed konsole-18.12.0p1 for testing (which pulls in
ridiculous amounts of dependencies, dozens of them, but oh well,
i guess support for advanced Unicode features isn't trivial).
The konsole(1) program does display the word "la" correctly, as a
ligature.

Now, running less(1) inside konsole(1), i found that columnation
is already subtly broken.  As long as the "la" ligature is visible
on screen, all is fine.  Now scroll to the right until the "la"
appears in the first screen column.  Then scroll one more column
to the right by pressing "1 RIGHTARROW".  Now you see *half* the
ligature, i.e. an isolated ALEF, in the first column of the screen,
even though the Arabic word does not contain an isolated ALEF.
Besides, we just attempted to scroll the "la" off screen, so the
ALEF now appears in the column one to the right of where the "la"
should actually be, and all the rest of the line is shifted one
column to the right, too, so columnation is now off by one.
Scrolling back left, columnation recovers to correct display.

I strongly suspect i broke that during my previous UTF-8 cleanup
work on less(1).

However, LAM WITH ALEF is literally the only ligature that less(1)
supports, together with three variations (with MADDA above, with
HAMZA above, and with HAMZA below).  But there are hundreds of
ligatures in Arabic, see

  https://www.unicode.org/charts/PDF/UFB50.pdf
  https://www.unicode.org/charts/PDF/UFE70.pdf

I have no idea how many of those work in konsole(1) - but i'm sure
none of those, except the four LAM WITH ALEF discussed here, work
with less(1), so i think support for LAM WITH ALEF provided no value
in the first place.  The way it is implemented, with an ad-hoc table
inside less(1) of character combinations that form ligatures, is
just wrong and not sustainable by any stretch of the imagination,
i think.

On top of that, how characters combine in Arabic is strongly context
dependent; even the syllable "la" forms a different ligature depending
on whether it is isolated or at the end of a longer word, and none
of the context dependencies are implemented in less(1) anyway.

And finally, people say the situation in many Indian languages is
even more dire than in Arabic, so what our less(1) tries to do is
almost certainly completely useless for those languages, even if
we would expand the ad-hoc table.

So, i propose to delete support for combining characters into
ligatures from our less(1): at this point, it is only used for
typing at the less prompt anyway (and not for the file displayed),
only for Arabic, and only for the single ligature "la".  If we ever
want better ligature support in the future, i think we would have
to make a fresh start anyway - and i think there are many other
things to do before that.

Note that this only removes support for combining characters into
ligatures that can also stand on their own; support for purely
combining accents like U+300 COMBINING GRAVE ACCENT and U+3099
COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK remains intact.

OK?
  Ingo


Index: charset.c

Re: remove chroot(2) from spamd(8)

2019-07-31 Thread Ingo Schwarze
Hi,

Theo de Raadt wrote on Wed, Jul 31, 2019 at 09:48:57AM -0600:
> Ricardo Mestre  wrote:

>> By now we are already confident that pledge(2) "just works(tm)"
>> and that it can be used to effectively remove filesystem access.
>> 
>> That being said, in spamd(8) when I pledge(2)d it the main priv process got
>> "stdio inet" which means there's no fs access at all so calling
>> chroot(2)/chdir(2) here doesn't make sense anymore. Just remove them.

> I'm fine with removing chroot+chdir where unveil or pledge provide the
> same (or better) isolation.
> 
> But let's do it program by program, carefully considering the case.
> 
> I need to mention this change makes such programs unsafe to operate
> in portable environments which lack pledge/unveil.  An extra safety
> belt is being removed we don't need, but other people need that
> safety belt a lot.  That doesn't change my view we should make
> the programs simpler without chroot; the impact is for downstreams
> to handle.

Maybe put a comment above the pledge() call?
Something like:

/*
 * When porting this program to a platform lacking pledge(2),
 * don't forget to at least properly chroot(2) the child instead.
 */

The spamd(8) utility does look like a plausible candidate for porting,
but i'm not sure how familiar the porters are with OpenBSD ways,
so the removal of chroot(2) might easily cause misunderstandings.
Consider e.g. projects like

  http://greyd.org/
  https://aur.tuna.tsinghua.edu.cn/packages/greyd/

which appear to have very little connection to OpenBSD.

Yours,
   Ingo



Re: remove chroot(2) from spamd(8)

2019-07-31 Thread Ingo Schwarze
Hi,

Ricardo Mestre wrote on Wed, Jul 31, 2019 at 07:41:08PM +0100:
> On 11:22 Wed 31 Jul , Theo de Raadt wrote:
>> Ingo Schwarze  wrote:

>>> /*
>>>  * When porting this program to a platform lacking pledge(2),
>>>  * don't forget to at least properly chroot(2) the child instead.
>>>  */

>> But I really don't see the benefit of such lines of text.
>> Outsiders won't read the text, and it wastes our screen real estate.

> Regarding downstream I'm "not too worried" since if you follow us either
> you are already using OpenBSD or you should apply best policies already,

Fair enough.

I'm hardly the right person to provide an OK here, but i have
no objection.

Yours,
  Ingo



Re: printf(1) man page has a small omission

2019-08-01 Thread Ingo Schwarze
Hi Andras,

please do not cross-post on OpenBSD lists, choose whatever list fits
best.  I trimmed bugs@ for this followup.

On Mon, Jun 3, 2019 at 2:12 PM Andras Farkas wrote:

> https://man.openbsd.org/man1/printf.1
> The section on the b format (%b) neglects to mention that for that
> format, it's \0num rather than \num
> Because of the way OpenBSD's printf is made, both \0num and \num work
> for %b, but, \0num is more correct and portable when using printf(1)'s
> %b format.
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html

I agree that deserves describing, just in case some system ever
implements the standard and only the standard.


Andras Farkas wrote on Mon, Jul 29, 2019 at 11:54:18PM -0400:

> I have a diff attached which fixes the man page.

I don't quite agree with your patch.  In practice, both \0num
and \num work; i inspected the code of FreeBSD and NetBSD which
both appear to support \num, too, even though they don't document
it, and i tested on Linux, and on Solaris 9, 10, and 11, and both
forms work everywhere:

   $ printf '%b\n' '\0176x'
  ~x
   $ printf '%b\n' '\176x'
  ~x

So here is an alternative patch.

OK?
  Ingo


Index: printf.1
===
RCS file: /cvs/src/usr.bin/printf/printf.1,v
retrieving revision 1.32
diff -u -r1.32 printf.1
--- printf.12 Jun 2019 06:16:37 -   1.32
+++ printf.11 Aug 2019 21:53:11 -
@@ -319,6 +319,15 @@
 Characters from the string
 .Ar argument
 are printed with backslash-escape sequences expanded.
+In the
+.Ar argument ,
+ASCII characters can be octally encoded either as
+.Cm \e0 Ns Ar num
+or as
+.Cm \e Ns Ar num
+like in the
+.Ar format
+string.
 If the
 .Ar argument
 contains the special escape sequence
@@ -373,7 +382,17 @@
 .Ev LC_ALL Ns =C
 were set.
 .Pp
-The escape sequences \ee and \e' are extensions to that specification.
+The escape sequences
+.Cm \ee
+and
+.Cm \e' ,
+as well as omitting the leading digit
+.Cm 0
+from
+.Cm \e0 Ns Ar num
+octal escape sequences in
+.Cm %b
+arguments, are extensions to that specification.
 .Sh HISTORY
 The
 .Nm



Re: Remove some sigvec Xr's

2019-07-28 Thread Ingo Schwarze
Hi Philip,

Philip Guenther wrote on Thu, Jul 25, 2019 at 07:21:48PM -0900:

> Hmm: sh(1) and ksh(1) have *nothing* from sections 2 or 3 in their SEE
> ALSO.  That doesn't seem like a wrong choice,

Indeed.  Jason generally discourages linking from section 1 to
sections 2 and 3, arguing that neither interactive use of a program
nor shell programming ought to require worrying about C interfaces.
There are exceptions, for example to avoid duplicating lists, e.g.
pointing from sysctl(8) to sysctl(2) or from kill(1) to sigaction(2).

> albeit inconsistent with csh(1).

My feeling is csh(1) is the odd one out here.  The amount of
cross references from csh(1) to section 2 looks excessive.

For the umask builtin command, i kind of see the point of pointing
to section 2 - but to chmod(2) where the bits are explained, umask(2)
being less relevant, and it would also seem more useful close to the
explanation of the umask builtin rather than below SEE ALSO.

> Part of me feels like _if_ they're going to mention umask(2),
> setrlimit(2), and sigaction(2), then they should mention chdir(2),
> as the other classic "must be in the shell" syscall.  

The question to ask is: is it likely that somebody who decided to
look up csh(1) will also benefit from reading chdir(2), in the
same context?  I don't quite see yet why that might help.

I'd rather move into the opposite direction, see below.

By the way, i suspect that the reason why the .Xr to ksh(1) is
missing is that ksh(1) and sh(1) historically used to be the
same manual page, until Jason rewrote a clean version of sh(1)
from scratch a few years ago.

OK?
  Ingo


Index: csh.1
===
RCS file: /cvs/src/bin/csh/csh.1,v
retrieving revision 1.82
diff -u -r1.82 csh.1
--- csh.1   26 Jul 2019 12:08:18 -  1.82
+++ csh.1   28 Jul 2019 23:07:13 -
@@ -2219,6 +2219,9 @@
 the mask are 002 giving all access to the group and read and execute
 access to others or 022 giving all access except write access for
 users in the group or others.
+The
+.Xr chmod 2
+manual page provides the complete list of bits that can be set.
 .Pp
 .It Ic unalias Ar pattern
 All aliases whose names match the specified pattern are discarded.
@@ -2702,16 +2705,8 @@
 .Dq ~name
 .El
 .Sh SEE ALSO
+.Xr ksh 1 ,
 .Xr sh 1 ,
-.Xr access 2 ,
-.Xr execve 2 ,
-.Xr fork 2 ,
-.Xr pipe 2 ,
-.Xr setrlimit 2 ,
-.Xr sigaction 2 ,
-.Xr umask 2 ,
-.Xr wait 2 ,
-.Xr killpg 3 ,
 .Xr tty 4 ,
 .Xr environ 7 ,
 .Xr script 7



Re: SIGSEGV in libedit

2019-08-05 Thread Ingo Schwarze
Hello Yasuoka-san,

YASUOKA Masahiko wrote on Mon, Aug 05, 2019 at 04:20:41PM +0900:

> This is the diff which fixes the problem by replacing malloc by calloc.
> 
> Initialize the line buffer by zero when allocation.  This fixes the
> problem a crash happens after the window size change.

Actually, only the two "b[i] = calloc(..." changes are needed.
The two "b = calloc(..." are pointless because b[] is already
explicitly and completely initialized four lines below.

Then again, even though i doubt that we still regard NetBSD as
"upstream" for libedit, we do still compare code occasionally, so
avoiding gratuitious differences still makes sense.

In this case, the "b = calloc(..." is harmless precisely because the
initializarion has absolutely no effect, being completely overwritten
four lines below.  Performance is irrelevant here because this function
is called quite rarely and certainly not in a loop.

So the diff is OK schwarze@ as is.

Thanks for tracking this down and finding the best fix.

Yours,
  Ingo


> Index: lib/libedit/terminal.c
> ===
> RCS file: /cvs/src/lib/libedit/terminal.c,v
> retrieving revision 1.18
> diff -u -p -r1.18 terminal.c
> --- lib/libedit/terminal.c12 Apr 2017 18:24:37 -  1.18
> +++ lib/libedit/terminal.c5 Aug 2019 07:13:08 -
> @@ -413,11 +413,11 @@ terminal_alloc_display(EditLine *el)
>   wchar_t **b;
>   coord_t *c = >el_terminal.t_size;
>  
> - b = reallocarray(NULL, c->v + 1, sizeof(*b));
> + b = calloc(c->v + 1, sizeof(*b));
>   if (b == NULL)
>   goto done;
>   for (i = 0; i < c->v; i++) {
> - b[i] = reallocarray(NULL, c->h + 1, sizeof(**b));
> + b[i] = calloc(c->h + 1, sizeof(**b));
>   if (b[i] == NULL) {
>   while (--i >= 0)
>   free(b[i]);
> @@ -428,11 +428,11 @@ terminal_alloc_display(EditLine *el)
>   b[c->v] = NULL;
>   el->el_display = b;
>  
> - b = reallocarray(NULL, c->v + 1, sizeof(*b));
> + b = calloc(c->v + 1, sizeof(*b));
>   if (b == NULL)
>   goto done;
>   for (i = 0; i < c->v; i++) {
> - b[i] = reallocarray(NULL, c->h + 1, sizeof(**b));
> + b[i] = calloc(c->h + 1, sizeof(**b));
>   if (b[i] == NULL) {
>   while (--i >= 0)
>   free(b[i]);



Re: SIGSEGV in libedit

2019-08-01 Thread Ingo Schwarze
Hello Yasuoka-san,

YASUOKA Masahiko wrote on Thu, Aug 01, 2019 at 08:42:35PM +0900:

> I noticed the upstream NetBSD recently replaced almost all malloc(3)s
> by calloc(3) in libedit.
> 
> https://github.com/NetBSD/src/commit/b91b3c48e0edb116bd797586430cb426b575d717
> 
> This also fixes the problem.  I'll create a diff which does the same
> thing.

Might i suggest that you first send a diff changing only the one place
required to fix the bug asou@ reported?  That can easily be reviewed,
and i expect we can get it in quickly, and it allows a useful commit
message explaining why exactly it is needed.


Regarding all the other places in the library:  it is true that in
many situations, zeroing buffers when allocating them provides
additional safety.  But there may be exceptions; in some cases,
it may hide or rarely even cause bugs.

In any case, i hate it when people go "i have no idea whether it
is needed and what it does, but let's just change this globally
anyway".  Even if it's about something as seemingly unconspicious
as zeroing buffers.  The NetBSD commit message provides no indication
that the changed code was inspected for consequences of the change,
and as far as i know Christos (admittedly, i only met him two or
three times and didn't talk all *that* much to him) i guess it might
indeed be his style to commit changes without actually inspecting
the code.

So i think for committing to OpenBSD, each function changed needs
to be inspected and it needs to be confirmed that zeroing each
buffer in question does not cause new problems or hide other
bugs.  That may be somewhat painful work, libedit code is not very
audit friendly, but i don't think cutting corners is a good idea.

In general, code quality in libedit is somewhat below OpenBSD quality
standards, so the time spent auditing it is certainly not wasted
but spent at a place needing it.  Also, the code quality implies
that zeroing some additional buffers likely improves security at
least at some of the places - if it is checked properly.

Yours,
  Ingo



Re: less -S does not "truncate" lines

2019-07-20 Thread Ingo Schwarze
Hi George,

George Brown wrote on Sat, Jul 20, 2019 at 01:29:37PM +0100:

> When viewing a file with "less -S" that has lines longer than $COLUMNS
> said lines are simply not wrapped. The contents of said lines is still
> available, one simply needs to scroll horizontally. I would have
> expected the contents displayed within less to be as if I had run
> something like "cut -c 1-$COLUMNS file".
> 
> I came across this as I was wondering why "less -FS" was still
> paginating on inputs with long lines that would have otherwise fit on
> the screen.
> 
> I have observed this behaviour on the following platforms.
> 
> * OpenBSD 6.5
> * OpenBSD 6.5 - current (snapshot 2019-07-19)
> * FreeBSD 12
> * Ubunutu 19.04
> * macOS 10.14.5
> 
> Given the consistent behaviour I would think it is my expectation that
> is wrong rather than the observed result.
> 
> It could be argued this is a "visual truncation" from the current view.
> However in my mind the word "truncation" has a very specific meaning of
> dropping a remainder such that it is no longer available at all. As such
> I would suggest an alteration to the description in the man page.
> 
> For reference the GNU less man page is the only one that doesn't use the
> word "truncate", instead describing it as "chopped rather than folded".

Both of the words "truncate" and "chop" seem more likely to be
understood as permanant deletion rather than temporary hiding, so
both are best avoided.  I committed the following patch which is
shorter and simpler, and which also describes the option in a
positive rather than a negative way.

Yours,
  Ingo


CVSROOT:/cvs
Module name:src
Changes by: schwa...@cvs.openbsd.org2019/07/20 07:21:10

Modified files:
usr.bin/less   : less.1 

Log message:
Correct misleading description of -S;
problem reported by George Brown <321 dot george at gmail dot com> on tech@.

Index: less.1
===
RCS file: /cvs/src/usr.bin/less/less.1,v
retrieving revision 1.54
retrieving revision 1.55
diff -u -r1.54 -r1.55
--- less.1  26 May 2019 01:16:09 -  1.54
+++ less.1  20 Jul 2019 13:21:09 -  1.55
@@ -499,10 +499,8 @@
 Thus, various display problems may result,
 such as long lines being split in the wrong place.
 .It Fl S | -chop-long-lines
-Causes lines longer than the screen width to be
-chopped (truncated) rather than wrapped.
-That is, the portion of a long line that does not fit in
-the screen width is not shown.
+Display only the beginning of lines that exceed the screen width.
+The remainder can be shown by scrolling horizontally.
 The default is to wrap long lines; that is, display the remainder
 on the next line.
 .It Fl s | -squeeze-blank-lines



Re: [patch] Re-add 'proc' to vi(1) when running in secure mode.

2019-07-21 Thread Ingo Schwarze
Hi,

Bryan Steele wrote on Fri, Jul 19, 2019 at 06:14:56PM -0400:
> On Sat, Jul 20, 2019 at 12:03:03AM +0200, Jesper Wallin wrote:
>> On Fri, Jul 19, 2019 at 05:14:03PM -0400, Bryan Steele wrote:

>>> I suspect that in secure/-S mode, the :pre[serve] should either be
>>> disabled, or modified to stop calling sendmail. The mail it is sending
>>> is purely advisory, and should be easy to disable. See common/recover.c.

>> Oh, you're right.  A bit ironic that I didn't notice the exec violation
>> due to the fork being permitted now.  Thanks for pointing this out!
>> Scrap my old patch, here's a better proposal:

> ok brynet@

No, that patch is not OK either.

It breaks :pre in -S mode because rcv_mailfile() is a blantant misnomer
that actually has two purposes:
1. Create the mail file to indicate that there is something recoverable
2. and then actually send the mail.

While step 2 must be skipped, step 1 for creating the recover.* file
is still needed, or "vi -r" will later complain "vi: No files to recover"
even though a vi.* file exists in /tmp/vi.recover/.  Yes, i freely
admit that the design is horribly contorted.

So here is a better patch avoiding that problem.
It is also safer because it is easier to see that fork(2)
can no longer be reached.  Otherwise, you would need to understand
that even though rcv_init() calls rcv_mailfile() and rcv_mailfile()
calls rcv_email(), fork(2) cannot be reached in that way because
isync is 0 in rcv_mailfile().

Yours,
  Ingo

P.S.
See below the patch for my analysis of the code, which you may or may
not find helpful while verifying the patch.


Index: common/recover.c
===
RCS file: /cvs/src/usr.bin/vi/common/recover.c,v
retrieving revision 1.29
diff -u -p -r1.29 recover.c
--- common/recover.c10 Nov 2017 18:25:48 -  1.29
+++ common/recover.c21 Jul 2019 15:45:59 -
@@ -821,6 +821,15 @@ rcv_email(SCR *sp, int fd)
struct stat sb;
pid_t pid;
 
+   /*
+* In secure mode, our pledge(2) includes neither "proc"
+* nor "exec".  So simply skip sending the mail.
+* Later vi -r still works because rcv_mailfile()
+* already did all the necessary setup.
+*/
+   if (O_ISSET(sp, O_SECURE))
+   return;
+
if (_PATH_SENDMAIL[0] != '/' || stat(_PATH_SENDMAIL, ) == -1)
msgq_str(sp, M_SYSERR,
_PATH_SENDMAIL, "not sending email: %s");



There are three call paths to rcv_mailfile() and rcv_email():

 * rcv_init() -> rcv_mailfile()
   purpose: set up .rcv_fd and .rcv_mpath
 for locking purposes and in case of later dying from a signal
   calls to rcv_init() are always safe because isync == 0,
 i.e. rcv_email() is never called 

 * rcv_sync() -> rcv_email() directly
   purpose: emergency saving while dying from SIGTERM;
 RCV_EMAIL is not set in any other situation
   so the direct call to rcv_email() must be neutered
   note: RCV_SNAPSHOT is NOT set in this case

 * rcv_sync() -> rcv_mailfile() -> rcv_email()
   purpose: manual "pre" command
 RCV_SNAPSHOT is not set in any other situation
   note: RCV_EMAIL is NOT set in this case
   here, the rcv_email() inside rcv_mailfile() must be neutered



Re: make msgsnd(2) more posix

2019-07-18 Thread Ingo Schwarze
Hi Moritz,

Moritz Buhl wrote on Thu, Jul 18, 2019 at 01:38:49PM +0200:
> bluhm@ wrote:

>> Moritz, can you create a man page ERRORS diff?

> Is the attached diff ok?

Yes, committed with some additional tweaks.

Thank you,
  Ingo


CVSROOT:/cvs
Module name:src
Changes by: schwa...@cvs.openbsd.org2019/07/18 06:57:12

Modified files:
lib/libc/sys   : msgsnd.2 

Log message:
State that mtype < 1 causes EINVAL as required by POSIX
and as implemented by OpenBSD since sysv_msg.c rev. 1.35.
Diff from Moritz Buhl  requested by bluhm@.

While here, add STANDARDS, improve HISTORY,
and use the customary .Fa for struct fields rather than .Va.


Index: msgsnd.2
===
RCS file: /cvs/src/lib/libc/sys/msgsnd.2,v
retrieving revision 1.19
diff -u -r1.19 msgsnd.2
--- msgsnd.210 Dec 2014 19:19:00 -  1.19
+++ msgsnd.218 Jul 2019 12:57:04 -
@@ -53,10 +53,10 @@
 char mtext[1]; /* body of message */
 .Ed
 .Pp
-.Va mtype
+.Fa mtype
 is an integer greater than 0 that can be used for selecting messages (see
 .Xr msgrcv 2 ) ;
-.Va mtext
+.Fa mtext
 is an array of
 .Fa msgsz
 bytes, with a size between 0 and that of the system limit
@@ -65,7 +65,7 @@
 If the number of bytes already on the message queue plus
 .Fa msgsz
 is bigger than the maximum number of bytes on the message queue
-.Po Va msg_qbytes ,
+.Po Fa msg_qbytes ,
  see
 .Xr msgctl 2
 .Pc ,
@@ -105,16 +105,16 @@
 queue is updated in the following way:
 .Bl -bullet
 .It
-.Va msg_cbytes
+.Fa msg_cbytes
 is incremented by the size of the message.
 .It
-.Va msg_qnum
+.Fa msg_qnum
 is incremented by 1.
 .It
-.Va msg_lspid
+.Fa msg_lspid
 is set to the pid of the calling process.
 .It
-.Va msg_stime
+.Fa msg_stime
 is set to the current time.
 .El
 .Sh RETURN VALUES
@@ -127,9 +127,12 @@
 .Fa msqid
 is not a valid message queue identifier.
 .Pp
+.Fa mtype
+is less than 1.
+.Pp
 .Fa msgsz
 is greater than
-.Va msg_qbytes .
+.Fa msg_qbytes .
 .It Bq Er EACCES
 The calling process does not have write access to the message queue.
 .It Bq Er EAGAIN
@@ -153,6 +156,13 @@
 .Xr msgctl 2 ,
 .Xr msgget 2 ,
 .Xr msgrcv 2
+.Sh STANDARDS
+The
+.Fn msgsnd
+function conforms to the X/Open System Interfaces option of
+.St -p1003.1-2008 .
 .Sh HISTORY
-Message queues appeared in
-.At V.1 .
+Message queues first appeared in
+.At V.1
+and have been available since
+.Nx 1.0 .



Re: Diff to stop using reserved words for smtpd.conf(5) examples

2019-07-23 Thread Ingo Schwarze
Hi Gilles,

Gilles Chehade wrote on Tue, Jul 23, 2019 at 08:27:06AM +0200:
> On Mon, Jul 22, 2019 at 05:05:01PM -0400, Kurt Mosiejczuk wrote:

>> This is a diff for that changes the example smtpd.conf and smtpd.conf.5
>> so that it no longer uses words that are parts of the configuration
>> syntax as labels for actions.  A large chunk of my delay to a release
>> with the new smtpd.conf configuration syntax was my confusion with the
>> examples. Even writing this diff I realized that the quotes were only 
>> necessary in the examples because configuration grammar was being 
>> reused as labels.
 
> I'll let ingo and jmc comment on this, I think the rationale for
> using the quotes and the reserved words was precisely to show we
> should use the quotes at these places

I do agree with showing quotes around action names and other
user-selected strings and identifiers.  I think it helps understanding
by making keywords easier to distinguish from other, replaceable
strings.  Even if in some cases, the syntax does not require the
quotes.

> and that reserved words in quotes were acceptable.

The smtpd.conf(5) manual explains that explicitly, and i think it
may make sense to show an example for it, even though i'm unsure
whether the practice of re-using reserved words as user-defined
identifiers should be recommended.

But Theo's argument that it can be confusing to explain too many
different aspects in the same example makes sense to me, in particular
when the explanation of the example doesn't even mention all the
aspects being demonstrated.  Also, Kurt's experience confirms that
this kind of confusion can indeed arise.

> To be honest, ever since we made these examples, no one has ever
> mailed me with a broken config due to a reserved keyword whereas
> it used to be the case every now and then before.

So, don't remove the quotes.

Avoiding clashes that *require* quoting in most of the examples
as Kurt suggests might make sense.  I don't feel strongly about
it either way.

Also, giving one example (maybe at the end?) that explicitly shows
and mentions the possibility of using a quoted reserved word as
a user-selected string might make sense, too.  Or maybe not, if
we do not want to encourage it?  Do we?  Not sure, no strong
feelings about that point either.

I think there is a lot of room here for you to decide what you
consider good and robust style for writing this particular configuration
file.

Yours,
  Ingo



Re: Diff to stop using reserved words for smtpd.conf(5) examples

2019-07-24 Thread Ingo Schwarze
Hi Kurt,

Kurt Mosiejczuk wrote on Wed, Jul 24, 2019 at 09:28:26AM -0400:
> On Wed, Jul 24, 2019 at 02:17:47PM +0200, Klemens Nanni wrote:
>> On Wed, Jul 24, 2019 at 01:32:52PM +0200, Gilles Chehade wrote:

>>> Well I think we should remove the reserved keywords as suggested by Kurt
>>> but keep the quotes in all examples to make it very explicit that we are
>>> expecting a string literal at this point. 

>> I concur.

> Here's a diff that does that.

Looks good to me.
  Ingo


> Index: etc/mail/smtpd.conf
> ===
> RCS file: /cvs/src/etc/mail/smtpd.conf,v
> retrieving revision 1.11
> diff -u -p -r1.11 smtpd.conf
> --- etc/mail/smtpd.conf   4 Jun 2018 21:10:58 -   1.11
> +++ etc/mail/smtpd.conf   24 Jul 2019 13:20:51 -
> @@ -9,11 +9,11 @@ table aliases file:/etc/mail/aliases
>  #
>  listen on lo0
>  
> -action "local" mbox alias 
> -action "relay" relay
> +action "local_mail" mbox alias 
> +action "outbound" relay
>  
>  # Uncomment the following to accept external mail for domain "example.org"
>  #
> -# match from any for domain "example.org" action "local"
> -match for local action "local"
> -match for any action "relay"
> +# match from any for domain "example.org" action "local_mail"
> +match for local action "local_mail"
> +match for any action "outbound"
> Index: usr.sbin/smtpd/smtpd.conf.5
> ===
> RCS file: /cvs/src/usr.sbin/smtpd/smtpd.conf.5,v
> retrieving revision 1.210
> diff -u -p -r1.210 smtpd.conf.5
> --- usr.sbin/smtpd/smtpd.conf.5   22 Dec 2018 08:54:02 -  1.210
> +++ usr.sbin/smtpd/smtpd.conf.5   24 Jul 2019 13:20:51 -
> @@ -871,12 +871,12 @@ table secrets file:/etc/mail/secrets
>  
>  listen on lo0
>  
> -action "local" mbox alias 
> -action "relay" relay host smtp+tls://b...@smtp.example.com \e
> +action "local_mail" mbox alias 
> +action "outbound" relay host smtp+tls://b...@smtp.example.com \e
>   auth 
>  
> -match for local action "local"
> -match for any action "relay"
> +match for local action "local_mail"
> +match for any action "outbound"
>  .Ed
>  .Pp
>  In this second example,
> @@ -908,12 +908,12 @@ listen on egress tls pki mail.example.co
>  
>  action mda_with_aliases mda "/path/to/mda \-f \-" alias 
>  action mda_without_aliases mda "/path/to/mda \-f \-"
> -action "relay" relay
> +action "outbound" relay
>  
>  match for local action mda_with_aliases
>  match from any for domain example.com action mda_without_aliases
> -match for any action "relay"
> -match auth from any for any action "relay"
> +match for any action "outbound"
> +match auth from any for any action "outbound"
>  .Ed
>  .Pp
>  For sites that wish to sign messages using DKIM, the
> @@ -929,13 +929,13 @@ table aliases file:/etc/mail/aliases
>  listen on lo0
>  listen on lo0 port 10028 tag DKIM
>  
> -action "mbox" mbox alias 
> -action "relay" relay
> -action relay_dkim relay host smtp://127.0.0.1:10027
> -
> -match for local action "mbox"
> -match tag DKIM for any action "relay"
> -match for any action relay_dkim
> +action "local_mail" mbox alias 
> +action "outbound" relay
> +action "relay_dkim" relay host smtp://127.0.0.1:10027
> +
> +match for local action "local_mail"
> +match tag DKIM for any action "outbound"
> +match for any action "relay_dkim"
>  .Ed
>  .Pp
>  Sites that accept non-local messages may be able to cut down on the
> @@ -952,14 +952,14 @@ table other-relays file:/etc/mail/other-
>  listen on lo0
>  listen on egress
>  
> -action "mbox" mbox alias 
> -action "relay" relay
> +action "local_mail" mbox alias 
> +action "outbound" relay
>  
> -match for local action "mbox"
> -match for any action "relay"
> +match for local action "local_mail"
> +match for any action "outbound"
>  match !from src  mail\-from "@example.com" for any \e
>reject
> -match from any for domain example.com action "mbox"
> +match from any for domain example.com action "local_mail"
>  .Ed
>  .Sh SEE ALSO
>  .Xr mailer.conf 5 ,



Re: ctags.1: Xr vgrind

2019-07-15 Thread Ingo Schwarze
Hi,

Jason McIntyre wrote on Mon, Jul 15, 2019 at 06:43:09PM +0100:
> On Mon, Jul 15, 2019 at 04:56:29PM +0200, Klemens Nanni wrote:

>> This tool comes from the textproc/vgrind port so base does not have it,
>> is that the reason we do not reference it?
>> 
>> I think it is still useful, so here's the Xr addition.

> we pretty much try to keep xrs only to base pages in base pages.
> yes, there are exceptions, but they are few and far between.
> i'm not keen to start adding xrs to pages other people don;t have.

I tend to agree with Jason; besides, it would also cause dead
hyperlinks on man.openbsd.org.

The following may be useful, though.  This kind of reference occurs
rarely enough that i'm not sure whether we have a standard wording
for it.

Yours,
  Ingo


Index: ctags.1
===
RCS file: /cvs/src/usr.bin/ctags/ctags.1,v
retrieving revision 1.33
diff -u -p -r1.33 ctags.1
--- ctags.1 31 Dec 2015 14:01:26 -  1.33
+++ ctags.1 15 Jul 2019 19:33:36 -
@@ -171,7 +171,8 @@ default output tags file
 Duplicate objects are not considered errors.
 .Sh SEE ALSO
 .Xr mg 1 ,
-.Xr vi 1
+.Xr vi 1 ,
+and the textproc/vgrind port
 .Sh STANDARDS
 The
 .Nm



Re: ctags.1: Xr vgrind

2019-07-15 Thread Ingo Schwarze
Hi Jason,

Jason McIntyre wrote on Mon, Jul 15, 2019 at 09:04:23PM +0100:
> On Mon, Jul 15, 2019 at 09:35:26PM +0200, Ingo Schwarze wrote:
>> Jason McIntyre wrote on Mon, Jul 15, 2019 at 06:43:09PM +0100:
>>> On Mon, Jul 15, 2019 at 04:56:29PM +0200, Klemens Nanni wrote:

>>>> This tool comes from the textproc/vgrind port so base does not
>>>> have it, is that the reason we do not reference it?
>>>> I think it is still useful, so here's the Xr addition.

>>> we pretty much try to keep xrs only to base pages in base pages.
>>> yes, there are exceptions, but they are few and far between.
>>> i'm not keen to start adding xrs to pages other people don;t have.

>> I tend to agree with Jason; besides, it would also cause dead
>> hyperlinks on man.openbsd.org.
>> The following may be useful, though.  This kind of reference occurs
>> rarely enough that i'm not sure whether we have a standard wording
>> for it.

> i don;t like it, to be honest.  can't you just reference the port
> in the main body, if it's really needed?
> 
>   ...expected by the textproc/vgrind port is produced...

Indeed, that's even better because it is really only relevant in
the context of the ctags -v option.

OK?
  Ingo


Index: ctags.1
===
RCS file: /cvs/src/usr.bin/ctags/ctags.1,v
retrieving revision 1.33
diff -u -r1.33 ctags.1
--- ctags.1 31 Dec 2015 14:01:26 -  1.33
+++ ctags.1 15 Jul 2019 20:39:45 -
@@ -92,8 +92,9 @@
 references to them are regenerated, keeping only the other values in the
 file.
 .It Fl v
-An index of the form expected by vgrind
-is produced on the standard output.
+An index of the form expected by the
+.Pa textproc/vgrind
+port is produced on the standard output.
 This listing contains the object name, file name, and page number (assuming
 64 line pages).
 Since the output will be sorted into lexicographic order,



Re: [patch] Re-add 'proc' to vi(1) when running in secure mode.

2019-07-22 Thread Ingo Schwarze
Hi,

Bryan Steele wrote on Sun, Jul 21, 2019 at 01:53:49PM -0400:
> On Sat, Jul 20, 2019 at 12:03:03AM +0200, Jesper Wallin wrote:

>> Oh, you're right.  A bit ironic that I didn't notice the exec violation
>> due to the fork being permitted now.  Thanks for pointing this out!
>> Scrap my old patch, here's a better proposal:

[ ... and later, on a different patch ]

> I somehow missed that. Indeed, moving the check into rcv_email before
> fork makes more sense.  Sorry for jumping the gun.

Some observations, in decreasing order of importance:

 1. Jesper, thanks for finding and reporting the regression.
It is fixed now.

 2. Bryan, thanks for suggesting the direction how to fix it.

 3. Jesper, including a patch according to the best of your
understanding is always welcome.  Even if it turns out to be a
bad patch, because often even a bad patch helps to understand
what the OP thinks the problem might be: the saying is, bad
patches trigger good ones.

 4. Merely because something different eventually gets committed
does not necessarily mean the original idea was bad.  For many
problems, there is more than one solution, and it is often a
matter of judgement or taste which one is better.
Jesper, in this case, re-adding "proc exec" would indeed have
been an alternative solution - even though preventing "exec"
is kind of the whole point of the -S option.

 5. It might seem obvious, but after writing a patch and before
sending it out, testing it makes sense.  The easy part is testing
that it actually solves the original problem.  The slightly
harder part is testing that is also solves similar problems (in
this case, when catching SIGTERM).  The hardest part is testing
that it causes no regressions.

 6. The history of this thread (the original commit of rev. 1.29,
the initial suggestion to re-add "proc" only, the suggestion
to skip rcv_mailfile()) proves once again that testing is
surprisingly hard in practice.  Most definitely, i also
botched testing for many of the commits that i did in the
past, and i also got many OKs for broken patches that i
posted prematurely.  It happens, even when trying to be
careful.

 7. Rarely used options rarely get tested.  They attracks bugs like
rotting fruits attract flies.  So think twice before adding a
new option to a program.

Yours,
  Ingo



CVSROOT:/cvs
Module name:src
Changes by: schwa...@cvs.openbsd.org2019/07/22 06:39:02

Modified files:
usr.bin/vi/common: recover.c 

Log message:
In secure mode (-S), skip sending mail when executing the :pre[serve]
command or when dying from SIGTERM.  This way, creating the recovery
file works again without re-adding "proc exec" to the pledge(2).
As reported by Jesper Wallin , this got
broken by common/main.c rev. 1.29 (Nov 19, 2015).
The general direction of the fix was suggested by brynet@.
OK brynet@ and no opposition when shown on tech@



Re: [patch] Re-add 'proc' to vi(1) when running in secure mode.

2019-07-22 Thread Ingo Schwarze
Hi Jesper,

Jesper Wallin wrote on Mon, Jul 22, 2019 at 06:09:03PM +0200:
> On Mon, Jul 22, 2019 at 03:23:16PM +0200, Ingo Schwarze wrote:
 
>>  3. Jesper, including a patch according to the best of your
>> understanding is always welcome.  Even if it turns out to be a
>> bad patch, because often even a bad patch helps to understand
>> what the OP thinks the problem might be: the saying is, bad
>> patches trigger good ones.
 
> I'm sorry for late reply, I've been busy/offline the last few days.

No problem at all.

> The lessons I take with me from this:
>   1. Give myself more time to fully understand both the issue itself
>  and what the code actually does, before trying to fix it.
>   2. Test, test with ktrace, test again and then run more tests. ;-)

Sounds reasonable!
Testing is important and often neglected.

But make sure that doesn't cause bugs to not get reported at all
because the process causes too much work or takes too long.  :)

When running out of time, in particular for bugs that seem important
or when it is unclear if and when you might find more time to look
at the issue, sending a preliminary report or a preliminary patch
with a remark like "i'm trying to work on a patch" or "i only did
rudimentary testing so far and ran out of time" might make sense
to avoid needless delays.

Yours,
  Ingo



Re: ascii(7): add a reference to the ANSI X3.4

2019-09-21 Thread Ingo Schwarze
Hello,

Masanori Ogino wrote on Tue, Sep 17, 2019 at 11:49:17AM +0900:

> Unlike utf8(7), ascii(7) does not contain a reference to its standard.
> This patch just adds the reference.

Thanks for pointing out the omission, i committed the following version.

Yours,
  Ingo


CVSROOT:/cvs
Module name:src
Changes by: schwa...@cvs.openbsd.org2019/09/21 00:04:22

Modified files:
share/man/man7 : ascii.7 

Log message:
add STANDARDS section;
similar to a diff sent in by ,
but using the up-to-date reference rather than one from 30 years ago;
OK jmc@

Index: ascii.7
===
RCS file: /cvs/src/share/man/man7/ascii.7,v
retrieving revision 1.9
diff -u -r1.9 ascii.7
--- ascii.7 30 May 2017 15:06:00 -  1.9
+++ ascii.7 21 Sep 2019 01:48:58 -
@@ -102,6 +102,13 @@
 112  p   113  q   114  r   115  s   116  t   117  u   118  v   119  w
 120  x   121  y   122  z   123  {   124  |   125  }   126  ~   127 del
 .Ed
+.Sh STANDARDS
+.Rs
+.%T Information Systems - Coded Character Sets - 7-Bit American National\
+ Standard Code for Information Interchange (7-Bit ASCII)
+.%R INCITS 4-1986[R2017]
+.%Q InterNational Committee for Information Technology Standards
+.Re
 .Sh HISTORY
 An
 .Nm


> Index: share/man/man7/ascii.7
> ===
> RCS file: /cvs/src/share/man/man7/ascii.7,v
> retrieving revision 1.9
> diff -u -r1.9 ascii.7
> --- share/man/man7/ascii.7  30 May 2017 15:06:00 -  1.9
> +++ share/man/man7/ascii.7  17 Sep 2019 02:19:10 -
> @@ -102,6 +102,13 @@
>  112  p   113  q   114  r   115  s   116  t   117  u   118  v   119  w
>  120  x   121  y   122  z   123  {   124  |   125  }   126  ~   127 del
>  .Ed
> +.Sh STANDARDS
> +.Rs
> +.%A American National Standards Institute
> +.%D 1986
> +.%R ANSI X3.4
> +.%T Coded Character Set - 7-bit American Standard Code for
> Information Interchange
> +.Re
>  .Sh HISTORY
>  An
>  .Nm



Re: timeout.9: cite our sources

2019-11-02 Thread Ingo Schwarze
Hi Scott & Jason,

Jason McIntyre wrote on Sat, Nov 02, 2019 at 06:16:49PM +:
> On Sat, Nov 02, 2019 at 12:32:30PM -0500, Scott Cheloha wrote:

>> Cite the paper describing the timing wheel.  PDF here:
>> 
>> http://www.cs.columbia.edu/~nahum/w6998/papers/ton97-timing-wheels.pdf
>> 
>> The authors have an older paper describing the same idea, from 1987:
>> 
>> http://www.cs.columbia.edu/~nahum/w6998/papers/sosp87-timing-wheels.pdf
>> 
>> but the 1997 version contains additional information gathered over ten
>> years of praxis.  The typesetting is also way nicer in 1997.
>> 
>> Once question about the .Rs block: why is the issue number rendered
>> before the volume number?  Isn't it usually the other way around?
>> IIRC the issue is a subset of a given volume.

> the ordering of these blocks is kind of set in stone. so mandoc will do
> what groff does. i guess a reordering will involve persuading other
> people that it makes sense.

That is true.  Then again, it does not seem unlikely that we could
persuade people in this case.  Whether volume and issue are used
together at all may depend on the discipline (for example, in high
energay physics, it appears that usually only volume and page numbers
are used for citing journals).  But *if* both are used together, i
suspect you are right that the other order makes more sense.  If
you think it is worth it, tell me off-list and i'll post a trivial
patch to change the order to gr...@gnu.org.  Changing the order in
mandoc is also trivial and would be done at the same time if the
groff folks like the idea.

Whatever that leads to, given both .%V and .%N only render a bare
number, which certainly shouldn't be changed because i suspect using
only one at a time may be the dominant use case, either order is
not very clear when using both together.  So i suggest quoting the
numbers in exactly the same way as given in the original paper:

.%V vol. 5
.%N no. 6

>> Oh, another question: is it standard to mention the city of publication
>> for an academic journal in a citation?  ToN is published from Piscataway,
>> NJ, fwiw.

> i don;t know, but i think the amount of info used in these blocks is
> determined by the person adding it - how much info is useful? for rfcs,
> for example, we settled on a standard blurb that both gave credit and
> allowed readers to find the docs. whether you need all the refs you've
> given i guess it decided by you (i think often a title suffices).

I agree with Jason.

For journals, in particular for well-known ones, the city of
publication is often omitted because it is not really needed for
getting a copy.  In this case, i would probably omit it.  For books,
the city is often provided in citations, almost as a matter of
convention.  For more obscure publications, it may even be important.

>> ok?

> no objection from me.

Same here, but i recommend the change explained above.

Also, you are deleting information from the source code comment that
would probably be misplaced in the manual: "Scheme 7".  You might
wish to leave that piece of information in the code unless it happens
to be incorrect or irrelevant.

Yours,
  Ingo


>> Index: share/man/man9/timeout.9
>> ===
>> RCS file: /cvs/src/share/man/man9/timeout.9,v
>> retrieving revision 1.46
>> diff -u -p -r1.46 timeout.9
>> --- share/man/man9/timeout.9 14 Apr 2019 08:51:31 -  1.46
>> +++ share/man/man9/timeout.9 2 Nov 2019 17:14:48 -
>> @@ -282,3 +282,15 @@ These functions are implemented in the f
>>  .Xr splclock 9 ,
>>  .Xr tsleep 9 ,
>>  .Xr tvtohz 9
>> +.Rs
>> +.%A George Varghese
>> +.%A Anthony Lauck
>> +.%B Hashed and hierarchical timing wheels: efficient data structures for \
>> +implementing a timer facility
>> +.%I IEEE/ACM
>> +.%J Transactions on Networking
>> +.%V 5
>> +.%N 6
>> +.%P pp. 824\(en834
>> +.%D December 1997
>> +.Re
>> Index: sys/kern/kern_timeout.c
>> ===
>> RCS file: /cvs/src/sys/kern/kern_timeout.c,v
>> retrieving revision 1.60
>> diff -u -p -r1.60 kern_timeout.c
>> --- sys/kern/kern_timeout.c  2 Nov 2019 16:56:17 -   1.60
>> +++ sys/kern/kern_timeout.c  2 Nov 2019 17:14:48 -
>> @@ -45,9 +45,7 @@
>>  /*
>>   * Timeouts are kept in a hierarchical timing wheel. The to_time is the 
>> value
>>   * of the global variable "ticks" when the timeout should be called. There 
>> are
>> - * four levels with 256 buckets each. See 'Scheme 7' in
>> - * "Hashed and Hierarchical Timing Wheels: Efficient Data Structures for
>> - * Implementing a Timer Facility" by George Varghese and Tony Lauck.
>> + * four levels with 256 buckets each.
>>   */
>>  #define BUCKETS 1024
>>  #define WHEELSIZE 256



<    2   3   4   5   6   7   8   9   10   >