Re: malloc.c: correlation between random choices

2018-01-18 Thread kshe
On Thu, 18 Jan 2018 08:54:21 +, Otto Moerbeek wrote:
> Looking back the rotor thing is ill-convceived indeed. I'm now
> testing the diff below. I still re-use r, because I really think a
> little bit of correlation does not hurt here.

Actually, I think it does hurt, because it introduces a lot of
regularity in the allocation patterns.  This could be exploited by an
attacker in some way, as it makes it much easier to reason about the
precise layout of those pages than an unbiased distribution would.  For
example, for chunks of size MALLOC_MAXCHUNK / 2, there should be 24
different ways to fill a given page, but currently only four of them are
possible: 0123, 1320, 2103 and 3012.  This is even worse in the new diff
that you suggest, where the only possible configurations are the most
trivial ones: 0123, 1230, 2301 and 3012.

Likewise, for all the other sizes (except MALLOC_MAXCHUNK of course),
the number of possible patterns that can arise from the current code (or
the one that you suggest) is considerably lower than the expected n!
permutations that would be generated by an unbiased approach.  Moreover,
these patterns tend to be very regular and easy to predict, as seen in
the above examples.

Are random bytes really so expansive that it justifies such intentional
biases to be introduced?  I actually thought they were quite cheap,
especially seeing as, on average, more than 25% of them are merely
discarded (and never used) for the sole purpose of introducing noise in
the arc4random_buf(3) calls...

So what I would suggest is unconditionally initialising `r' with two
getrbyte() calls.  For all architectures, this gives enough random bits
to choose the list to use as well as the offset to start at, both in a
perfectly uniform way, without any correlation nor bias of any kind, and
without the need for accumulating bits in an extra rotor variable or
doing some other conditional expansion of the resulting offset when
there are not enough bits in it.  But, indeed, this method includes this
one additional getrbyte() call compared to the current one.  So the
question is: is that really a problem?

(By the way, your diff was also off by one in the check against
UCHAR_MAX, but this is a different issue.)

--- malloc.c.orig   Sun Jan 14 11:18:10 2018
+++ malloc.cThu Jan 18 09:45:39 2018
@@ -172,7 +172,6 @@ struct chunk_info {
u_short free;   /* how many free chunks */
u_short total;  /* how many chunks */
u_short offset; /* requested size table offset */
-   u_short rotor;  /* randomization rotor */
u_short bits[1];/* which chunks are free */
 };
 
@@ -941,7 +940,7 @@ malloc_bytes(struct dir_info *d, size_t size, void *f)
 
j = find_chunksize(size);
 
-   r = getrbyte(d);
+   r = ((u_int)getrbyte(d) << 8) | getrbyte(d);
listnum = r % MALLOC_CHUNK_LISTS;
/* If it's empty, make a page more of that size chunks */
if ((bp = LIST_FIRST(>chunk_dir[j][listnum])) == NULL) {
@@ -953,9 +952,7 @@ malloc_bytes(struct dir_info *d, size_t size, void *f)
if (bp->canary != (u_short)d->canary1)
wrterror(d, "chunk info corrupted");
 
-   if (bp->free > 1)
-   bp->rotor += r;
-   i = bp->rotor++ & (bp->total - 1);
+   i = (r / MALLOC_CHUNK_LISTS) & (bp->total - 1);
 
/* start somewhere in a short */
lp = >bits[i / MALLOC_BITS];

Regards,

kshe



jot(1): remove arbitrary length limits

2018-01-03 Thread kshe
} else if (*(p+1) == '\0') {
/* cannot end in single '%' */
-   if (strlcat(format, "%", sizeof(format)) >= sizeof(format))
-   errx(1, "-w word too long");
+   if ((format = realloc(format, len + sizeof("%"))) == NULL)
+   err(1, NULL);
+   strcpy(format + len, "%");
} else {
/*
 * Allow conversion format specifiers of the form
@@ -456,9 +452,10 @@ fmt_broken:
p++;
else if (*p == '%' && *(p+1) == '\0') {
/* cannot end in single '%' */
-   if (strlcat(format, "%", sizeof(format)) >=
-   sizeof(format))
-   errx(1, "-w word too long");
+   if ((format = realloc(format,
+   len + sizeof("%"))) == NULL)
+   err(1, NULL);
+   strcpy(format + len, "%");
break;
}
}

Regards,

kshe



Re: malloc cleanup and small optimization (step 2)

2018-01-02 Thread kshe
On Sat, 30 Dec 2017 11:50:53 +, Otto Moerbeek wrote:
> Slightly different diff: instead of initing all chunk_info structs in
> page, do it on demand at first use. Measurements show that a lot of
> programs only use a few chunk_info structs, so it is a bit wasteful to
> always initialize all of them. Also de-inline the init code for
> readability.

Perhaps a point of detail, but how about setting up the bitmap with

i = p->total;
memset(p->bits, 0xff, sizeof(p->bits[0]) * (i / MALLOC_BITS));
if (i % MALLOC_BITS != 0)
p->bits[i / MALLOC_BITS] = (1U << (i % MALLOC_BITS)) - 1;

or, since p->total should never be zero,

i = p->total - 1;
memset(p->bits, 0xff, sizeof(p->bits[0]) * (i / MALLOC_BITS));
p->bits[i / MALLOC_BITS] = (2U << (i % MALLOC_BITS)) - 1;

instead of

for (i = 0; p->total - i >= MALLOC_BITS; i += MALLOC_BITS)
p->bits[i / MALLOC_BITS] = (u_short)~0U;
if (i < p->total)
p->bits[i / MALLOC_BITS] = 0;
for (; i < p->total; i++)
p->bits[i / MALLOC_BITS] |= (u_short)1U << (i % MALLOC_BITS);

at the end of the new init_chunk_info() function?

Besides, a few lines above those loops, I think p->shift could directly
be set to MALLOC_MINSHIFT when bits == 0, without having to recalculate
it from MALLOC_MINSIZE.

Regards,

kshe



malloc.conf.5: minor correction

2017-12-29 Thread kshe
Hi,

The `F' option no longer disables delayed freeing.

Also, I think documenting implementation details like the exact value
of the junk bytes is not very useful.

Index: malloc.conf.5
===
RCS file: /cvs/src/share/man/man5/malloc.conf.5,v
retrieving revision 1.12
diff -u -p -r1.12 malloc.conf.5
--- malloc.conf.5   23 Sep 2017 15:13:34 -  1.12
+++ malloc.conf.5   28 Dec 2017 20:56:27 -
@@ -73,13 +73,10 @@ Increase the junk level by one if it is 
 .Dq Less junking .
 Decrease the junk level by one if it is larger than 0.
 Junking writes some junk bytes into the area allocated.
-Currently junk is bytes of 0xdb when allocating;
-freed chunks are filled with 0xdf.
 By default the junk level is 1: small chunks are always junked
 and the first part of pages is junked after free.
-After a delay (if not switched off by the F option),
-the filling pattern is validated and the process is aborted if the pattern
-was modified.
+After a delay, the filling pattern is validated
+and the process is aborted if the pattern was modified.
 If the junk level is zero, no junking is performed.
 For junk level 2, junking is done without size restrictions.
 .It Cm R

Regards,

kshe



Re: malloc cleanup and small optimization (step 2)

2017-12-29 Thread kshe
al offset of that chunk */
@@ -1045,7 +1030,7 @@ validate_canary(struct dir_info *d, u_char *ptr, size_
if (*p != SOME_JUNK) {
wrterror(d, "chunk canary corrupted %p %#tx@%#zx%s",
ptr, p - ptr, sz, *p == SOME_FREEJUNK ?
-   " (double free?)" : "");
+   " (double free?)" : "");
}
p++;
}
@@ -1169,8 +1154,7 @@ omalloc(struct dir_info *pool, size_t sz, int zero_fil
else
memset(p, SOME_JUNK,
psz - mopts.malloc_guard);
-   }
-   else if (mopts.chunk_canaries)
+   } else if (mopts.chunk_canaries)
fill_canary(p, sz - mopts.malloc_guard,
psz - mopts.malloc_guard);
}
@@ -1221,7 +1205,7 @@ _malloc_init(int from_rthreads)
max = from_rthreads ? _MALLOC_MUTEXES : 1;
if (((uintptr_t)_readonly & MALLOC_PAGEMASK) == 0)
mprotect(_readonly, sizeof(malloc_readonly),
-PROT_READ | PROT_WRITE);
+   PROT_READ | PROT_WRITE);
for (i = 0; i < max; i++) {
if (mopts.malloc_pool[i])
continue;
@@ -2026,8 +2010,7 @@ omemalign(struct dir_info *pool, size_t alignment, siz
SOME_JUNK, psz - sz);
else
memset(p, SOME_JUNK, psz - mopts.malloc_guard);
-   }
-   else if (mopts.chunk_canaries)
+   } else if (mopts.chunk_canaries)
fill_canary(p, sz - mopts.malloc_guard,
psz - mopts.malloc_guard);
 

Regards,

kshe



sh.1: backslash within double quotes

2017-12-29 Thread kshe
Hi,

Within double quotes, backslashes also escape newlines.  This is
correctly documented in ksh.1, but not in sh.1.

Index: sh.1
===
RCS file: /cvs/src/bin/ksh/sh.1,v
retrieving revision 1.145
diff -u -p -r1.145 sh.1
--- sh.115 Dec 2017 20:51:28 -  1.145
+++ sh.128 Dec 2017 20:44:07 -
@@ -1165,7 +1165,7 @@ if the user wants to indicate to the she
 The following characters need quoting if their literal meaning is desired:
 .Bd -literal -offset indent
 | & ; < > ( ) $ \` \e " \(aq   
-* ?  [ # ~ = %
+* ? [ # ~ = %
 .Ed
 .Pp
 A backslash
@@ -1190,7 +1190,7 @@ A backslash
 .Pq \e
 within double quotes retains its special meaning,
 but only when followed by a backquote, dollar sign,
-double quote, or another backslash.
+double quote, newline, or another backslash.
 An at sign
 .Pq @
 within double quotes has a special meaning

Regards,

kshe



jot(1): one-byte overflow in error path

2017-12-29 Thread kshe
Hi,

If the format string ends in an invalid specifier like `%l', p will
already point to the trailing NUL upon entering the switch, wherein the
instruction

*++p = '\0';

will write another NUL after it, but there is no guarantee that the
buffer extends beyond that first NUL; thus, in the rare case where it
does not, this assignment will write one byte past its end.

While here, this diff also simplifies the said switch by removing some
unneeded cases.

Index: jot.c
===
RCS file: /cvs/src/usr.bin/jot/jot.c,v
retrieving revision 1.39
diff -u -p -r1.39 jot.c
--- jot.c   15 Dec 2017 14:20:52 -  1.39
+++ jot.c   29 Dec 2017 03:51:50 -
@@ -406,8 +406,7 @@ getformat(void)
if (*p == 'l') {
longdata = true;
if (*++p == 'l') {
-   if (p[1] != '\0')
-   p++;
+   p++;
goto fmt_broken;
}
}
@@ -439,9 +438,6 @@ getformat(void)
chardata = true;
break;
}
-   /* FALLTHROUGH */
-   case 'h': case 'n': case 'p': case 'q': case 's': case 'L':
-   case '$': case '*':
goto fmt_broken;
case 'f': case 'e': case 'g': case 'E': case 'G':
if (!longdata)
@@ -449,7 +445,8 @@ getformat(void)
/* FALLTHROUGH */
default:
 fmt_broken:
-   *++p = '\0';
+   if (*p != '\0')
+   p[1] = '\0';
errx(1, "illegal or unsupported format '%s'", p2);
}
while (*++p != '\0')

Regards,

kshe



Re: uniq: add -i option

2017-12-23 Thread kshe
On Sat, 23 Dec 2017 23:13:22 +, Theo Buehler wrote:
> > Obviously, the relevant condition should have been
> >
> > if ((iflag ? strcasecmp : strcmp)(t1, t2) != 0)
> >
> > instead of awkwardly messing with logical AND and OR.
>
> Indeed, this is much better. I like your version, but perhaps others
> might like this one more:
>
> if ((iflag && strcasecmp(t1, t2)) || strcmp(t1, t2))

Well, this is even worse: this version not only has the same kind of
performance drawback as the current one, but it is also logically
incorrect, and will cause the `-i' flag to have no effect at all.

If you really want to do this without a ternary operator, the equivalent
form is

if (iflag && strcasecmp(t1, t2) || !iflag && strcmp(t1, t2))

which looks very dumb indeed, but at least it is logically sound.

Regards,

kshe



Re: less(1): `!' command

2017-12-23 Thread kshe
On Fri, 22 Dec 2017 22:21:12 +, Stuart Henderson wrote:
> On 2017/12/22 19:47, Nicholas Marriott wrote:
> > I don't think we should bring ! back.
> >
> > I wanted to remove v and | (and some other stuff) shortly afterwards, but
> > several people objected.
> >
> > I did suggest having a lightweight less in base for most people and adding
> > the full upstream less to ports for the stuff we don't want to maintain
> > (like we do for eg libevent) but other people didn't like that idea.
>
> less(1) can already be made more lightweight by setting LESSSECURE=1.
> (I quite like this even without the reduced pledge, my biggest annoyance
> with less is when I accidentally press 'v').
>
> Any opinions on switching the default?

I thought about that possibility too, and I mostly agree with the idea
as I also run less(1) in secure mode very often, but it is nevertheless
quite irrelevant to the original concern, which is that, when one
chooses not to run less(1) in secure mode, whether that mode is the
default one or not, it is inconsistent, for multiple reasons, to have
removed the `!' command, but not `v' nor `|'.

Until some form of agreement can be reached on that issue, I have
reverted the removal of `!' in my personal tree, so I still pay the
exact same price as everybody else ("proc exec"), but at least I now get
something useful out of that.

Regards,

kshe



dc(1): minor simplification in bexp()

2017-12-23 Thread kshe
Hi,

When p->scale is not zero, bexp() calculates the same value twice, first
in split_number() and then in normalize().  This can be avoided by
simply reusing the result produced by the former.  (While here, also
fold a long line.)

Index: bcode.c
===
RCS file: /cvs/src/usr.bin/dc/bcode.c,v
retrieving revision 1.61
diff -u -p -r1.61 bcode.c
--- bcode.c 12 Dec 2017 19:07:10 -  1.61
+++ bcode.c 17 Dec 2017 13:18:15 -
@@ -1178,11 +1178,11 @@ bexp(void)
bn_checkp(f);
split_number(p, i, f);
if (!BN_is_zero(f))
-   warnx("Runtime warning: non-zero fractional part in 
exponent");
-   BN_free(i);
+   warnx("Runtime warning: non-zero fractional part "
+   "in exponent");
+   BN_free(p->number);
+   p->number = i;
BN_free(f);
-
-   normalize(p, 0);
}
 
neg = BN_is_negative(p->number);

Regards,

kshe



Re: uniq: add -i option

2017-12-23 Thread kshe
Hi,

This change causes uniq(1) to compare equal lines twice when run without
`-i', once in strcmp(3) and once again in strcasecmp(3).  In the worst
case, which is also one of the most common, the main loop spends about
half of its time copying buffers and looking for newlines in fgets(3),
and the other half actually comparing those buffers; hence, in practice,
because of this commit, it has now become 25% slower than it was before.

$ jot -b a -s a 4080 >tmp
$ cat $(jot -b tmp 4096) >tmp2
$ cat $(jot -b tmp2 16) >lines

$ time ./uniq /dev/null
0m01.60s real 0m00.80s user 0m00.75s system
$ time uniq /dev/null
0m01.23s real 0m00.47s user 0m00.73s system

Obviously, the relevant condition should have been

if ((iflag ? strcasecmp : strcmp)(t1, t2) != 0)

instead of awkwardly messing with logical AND and OR.

That said, it seems that this whole program was not really written with
performance in mind in the first place, as in less than an hour I was
able to write a new uniq(1) which, apart from handling arbitrarily long
lines and NUL bytes correctly as well as consistently parsing its
arguments (contrary to OpenBSD's version), has proved an order of
magnitude faster than the latter in such worst cases, so I guess a 25%
slowdown may not appear that important after all.  But, still, I think
the code makes little sense as it currently stands, if only from a
logical point of view and regardless of those performance
considerations.

Regards,

kshe



Re: less(1): `!' command

2017-12-17 Thread kshe
On Sat, 16 Dec 2017 21:52:44 +, Theo de Raadt wrote:
> > On Sat, 16 Dec 2017 19:39:27 +, Theo de Raadt wrote:
> > > > On Sat, 16 Dec 2017 18:13:16 +, Jiri B wrote:
> > > > > On Sat, Dec 16, 2017 at 04:55:44PM +, kshe wrote:
> > > > > > Hi,
> > > > > >
> > > > > > Would a patch to bring back the `!' command to less(1) be accepted? 
> > > > > >  The
> > > > > > commit message for its removal explains that ^Z should be used 
> > > > > > instead,
> > > > > > but that obviously does not work if less(1) is run from something 
> > > > > > else
> > > > > > than an interactive shell, for example when reading manual pages 
> > > > > > from a
> > > > > > vi(1) instance spawned directly by `xterm -e vi' in a window 
> > > > > > manager or
> > > > > > by `neww vi' in a tmux(1) session.
> > > > >
> > > > > Why should less be able to spawn another programs? This would 
> > > > > undermine
> > > > > all pledge work.
> > > >
> > > > Because of at least `v' and `|', less(1) already is able to invoke
> > > > arbitrary programs, and accordingly needs the "proc exec" promise, so
> > > > bringing `!' back would not change anything from a security perspective
> > > > (otherwise, I would obviously not have made such a proposition).
> > > >
> > > > In fact, technically, what I want to do is still currently possible:
> > > > from any less(1) instance, one may use `v' to invoke vi(1), and then use
> > > > vi(1)'s own `!' command as desired.  So the functionality of `!' is
> > > > still there; it was only made more difficult to reach for no apparent
> > > > reason.
> > >
> > > No apparent reason?
> > >
> > > Good you have an opinion.  I have a different opinion: We should look
> > > for rarely used functionality and gut it.
> >
> > I completely agree, and I also completely agree with the rest of what
> > you said.  However, in this particular case, the functionality of `!' is
> > still fully (albeit indirectly) accessible, as shown above, and this is
> > why its deletion, when not immediately followed by that of `|' and `v',
> > made little sense for me.
>
> Oh, so you don't agree.  Or do you.  I can't tell.  You haven't made up
> your mind enough to have a final position?

In the case of less(1), the underlying functionality of `!' (invoking
arbitrary programs) has not been removed at all, as `!' itself was only
one way amongst others of doing that.  Therefore, I would have prefered
that such an endeavour be conducted in steps at least as large as a
pledge(2) category.  You may say this is absolutist, but, in the end,
users might actually be more inclined to accept such removals if they
come with, and thus are justified by, a real and immediate security
benefit, like stricter pledge(2) promises, rather than some vague
theoretical explanation about the global state of their software
environment.

> [...]
>
> > May I go ahead and prepare a patch to remove "proc exec" entirely?
>
> Sure you could try, and see who freaks out.  Exactly what the plan was
> all along.

The minimal diff below does that.  If it is accepted, further cleanups
would need to follow (in particular, removing a few unused variables and
functions), and of course the manual would also need some adjustments.

Index: cmd.h
===
RCS file: /cvs/src/usr.bin/less/cmd.h,v
retrieving revision 1.10
diff -u -p -r1.10 cmd.h
--- cmd.h   6 Nov 2015 15:58:01 -   1.10
+++ cmd.h   17 Dec 2017 12:23:00 -
@@ -42,12 +42,12 @@
 #defineA_FF_LINE   29
 #defineA_BF_LINE   30
 #defineA_VERSION   31
-#defineA_VISUAL32
+/* 32 unused */
 #defineA_F_WINDOW  33
 #defineA_B_WINDOW  34
 #defineA_F_BRACKET 35
 #defineA_B_BRACKET 36
-#defineA_PIPE  37
+/* 37 unused */
 #defineA_INDEX_FILE38
 #defineA_UNDO_SEARCH   39
 #defineA_FF_SCREEN 40
Index: command.c
===
RCS file: /cvs/src/usr.bin/less/command.c,v
retrieving revision 1.31
diff -u -p -r1.31 command.c
--- command.c   12 Jan 2017 20:32:01 -  1.31
+++ command.c   17 Dec 2017 12:23:00 -
@@ -241,12 +241,6 @@ exec_mca(void)
/* I

Re: less(1): `!' command

2017-12-16 Thread kshe
On Sat, 16 Dec 2017 19:39:27 +, Theo de Raadt wrote:
> > On Sat, 16 Dec 2017 18:13:16 +, Jiri B wrote:
> > > On Sat, Dec 16, 2017 at 04:55:44PM +, kshe wrote:
> > > > Hi,
> > > >
> > > > Would a patch to bring back the `!' command to less(1) be accepted?  The
> > > > commit message for its removal explains that ^Z should be used instead,
> > > > but that obviously does not work if less(1) is run from something else
> > > > than an interactive shell, for example when reading manual pages from a
> > > > vi(1) instance spawned directly by `xterm -e vi' in a window manager or
> > > > by `neww vi' in a tmux(1) session.
> > >
> > > Why should less be able to spawn another programs? This would undermine
> > > all pledge work.
> >
> > Because of at least `v' and `|', less(1) already is able to invoke
> > arbitrary programs, and accordingly needs the "proc exec" promise, so
> > bringing `!' back would not change anything from a security perspective
> > (otherwise, I would obviously not have made such a proposition).
> >
> > In fact, technically, what I want to do is still currently possible:
> > from any less(1) instance, one may use `v' to invoke vi(1), and then use
> > vi(1)'s own `!' command as desired.  So the functionality of `!' is
> > still there; it was only made more difficult to reach for no apparent
> > reason.
>
> No apparent reason?
>
> Good you have an opinion.  I have a different opinion: We should look
> for rarely used functionality and gut it.

I completely agree, and I also completely agree with the rest of what
you said.  However, in this particular case, the functionality of `!' is
still fully (albeit indirectly) accessible, as shown above, and this is
why its deletion, when not immediately followed by that of `|' and `v',
made little sense for me.

Either the commands that require "proc exec" should all be removed along
with that promise, or `!' should be brought back without any pledge(2)
modifications.  But currently it really feels like a big waste (for both
parties) to request such high privileges, and then to do almost nothing
useful with them.

If the plan really was to get rid of all such commands eventually, what
exactly is preventing that from happening now?  May I go ahead and
prepare a patch to remove "proc exec" entirely?

Regards,

kshe



Re: less(1): `!' command

2017-12-16 Thread kshe
On Sat, 16 Dec 2017 18:13:16 +, Jiri B wrote:
> On Sat, Dec 16, 2017 at 04:55:44PM +0000, kshe wrote:
> > Hi,
> >
> > Would a patch to bring back the `!' command to less(1) be accepted?  The
> > commit message for its removal explains that ^Z should be used instead,
> > but that obviously does not work if less(1) is run from something else
> > than an interactive shell, for example when reading manual pages from a
> > vi(1) instance spawned directly by `xterm -e vi' in a window manager or
> > by `neww vi' in a tmux(1) session.
>
> Why should less be able to spawn another programs? This would undermine
> all pledge work.

Because of at least `v' and `|', less(1) already is able to invoke
arbitrary programs, and accordingly needs the "proc exec" promise, so
bringing `!' back would not change anything from a security perspective
(otherwise, I would obviously not have made such a proposition).

In fact, technically, what I want to do is still currently possible:
from any less(1) instance, one may use `v' to invoke vi(1), and then use
vi(1)'s own `!' command as desired.  So the functionality of `!' is
still there; it was only made more difficult to reach for no apparent
reason.

Regards,

kshe



dc(1): properly report errors

2017-12-16 Thread kshe
Hi,

On error, the BN functions do not set errno, so errx(3) should be used
instead of err(3).  Moreover, one may call ERR_reason_error_string(3) to
obtain a real error message from an error code.

$ dc -e '2 2 64^^'
dc: big number failure 3fff072: Undefined error: 0

$ ./dc -e '2 2 64^^'
dc: BN failure: bignum too long

Index: mem.c
===
RCS file: /cvs/src/usr.bin/dc/mem.c,v
retrieving revision 1.9
diff -u -p -r1.9 mem.c
--- mem.c   12 Dec 2017 19:08:57 -  1.9
+++ mem.c   15 Dec 2017 09:35:48 -
@@ -91,13 +91,19 @@ bstrdup(const char *p)
 void
 bn_check(int x)
 {
-   if (x == 0)
-   err(1, "big number failure %lx", ERR_get_error());
+   if (x == 0) {
+   ERR_load_BN_strings();
+   errx(1, "BN failure: %s",
+   ERR_reason_error_string(ERR_get_error()));
+   }
 }
 
 void
 bn_checkp(const void *p)
 {
-   if (p == NULL)
-   err(1, "allocation failure %lx", ERR_get_error());
+   if (p == NULL) {
+   ERR_load_BN_strings();
+   errx(1, "BN failure: %s",
+   ERR_reason_error_string(ERR_get_error()));
+   }
 }

Regards,

kshe



less(1): `!' command

2017-12-16 Thread kshe
Hi,

Would a patch to bring back the `!' command to less(1) be accepted?  The
commit message for its removal explains that ^Z should be used instead,
but that obviously does not work if less(1) is run from something else
than an interactive shell, for example when reading manual pages from a
vi(1) instance spawned directly by `xterm -e vi' in a window manager or
by `neww vi' in a tmux(1) session.

If not, then at least documentation for this command should be removed
properly (I cannot provide a diff as this file contains raw backspace
characters):

$ cd /usr/src/usr.bin/less/
$ printf '99d\nwq\n' | ed - less.hlp

Regards,

kshe



Re: net/rtsock.c: size to free(9)

2017-12-16 Thread kshe
On Wed, 13 Dec 2017 08:56:53 +, Martin Pieuchot wrote:
> Thanks.  I'd suggest you for the next time to not to mix withespace or
> style changes with a functional change.
>
> That said it'd be great if you could look at other free(9) calls missing
> the size argument.

The diff below deals with the last three calls in rtsock.c.  The case of
rt_llinfo when RTF_MPLS is set in rt_flags seems safe as it is similar
to what has already been done in route.c.

Index: rtsock.c
===
RCS file: /cvs/src/sys/net/rtsock.c,v
retrieving revision 1.258
diff -u -p -r1.258 rtsock.c
--- rtsock.c13 Dec 2017 08:59:02 -  1.258
+++ rtsock.c14 Dec 2017 16:13:04 -
@@ -980,7 +980,8 @@ change:
/* if gateway changed remove MPLS information */
if (rt->rt_llinfo != NULL &&
rt->rt_flags & RTF_MPLS) {
-   free(rt->rt_llinfo, M_TEMP, 0);
+   free(rt->rt_llinfo, M_TEMP,
+   sizeof(struct rt_mpls));
rt->rt_llinfo = NULL;
rt->rt_flags &= ~RTF_MPLS;
}
@@ -1363,22 +1364,20 @@ again:
/* align message length to the next natural boundary */
len = ALIGN(len);
if (cp == 0 && w != NULL && !second_time) {
-   struct walkarg *rw = w;
-
-   rw->w_needed += len;
-   if (rw->w_needed <= 0 && rw->w_where) {
-   if (rw->w_tmemsize < len) {
-   free(rw->w_tmem, M_RTABLE, 0);
-   rw->w_tmem = malloc(len, M_RTABLE, M_NOWAIT);
-   if (rw->w_tmem)
-   rw->w_tmemsize = len;
+   w->w_needed += len;
+   if (w->w_needed <= 0 && w->w_where) {
+   if (w->w_tmemsize < len) {
+   free(w->w_tmem, M_RTABLE, w->w_tmemsize);
+   w->w_tmem = malloc(len, M_RTABLE, M_NOWAIT);
+   if (w->w_tmem)
+   w->w_tmemsize = len;
}
-   if (rw->w_tmem) {
-   cp = rw->w_tmem;
+   if (w->w_tmem) {
+   cp = w->w_tmem;
second_time = 1;
goto again;
} else
-   rw->w_where = 0;
+   w->w_where = 0;
}
}
if (cp && w)/* clear the message header */
@@ -1809,7 +1808,7 @@ sysctl_rtable(int *name, u_int namelen, 
NET_UNLOCK();
break;
}
-   free(w.w_tmem, M_RTABLE, 0);
+   free(w.w_tmem, M_RTABLE, w.w_tmemsize);
w.w_needed += w.w_given;
if (where) {
*given = w.w_where - (caddr_t)where;

Regards,

kshe



vi(1): cscope leftover

2017-12-16 Thread kshe
Hi,

Support for cscope connections was removed from vi(1) a few years ago.

Index: ex/ex_cmd.c
===
RCS file: /cvs/src/usr.bin/vi/ex/ex_cmd.c,v
retrieving revision 1.10
diff -u -p -r1.10 ex_cmd.c
--- ex/ex_cmd.c 19 Nov 2015 07:53:31 -  1.10
+++ ex/ex_cmd.c 14 Dec 2017 16:07:48 -
@@ -145,8 +145,8 @@ EXCMDLIST const cmds[] = {
 /* C_DISPLAY */
{"display", ex_display, 0,
"w1r",
-   "display b[uffers] | c[onnections] | s[creens] | t[ags]",
-   "display buffers, connections, screens or tags"},
+   "display b[uffers] | s[creens] | t[ags]",
+   "display buffers, screens or tags"},
 /* C_EDIT */
{"edit",ex_edit,E_NEWSCREEN,
"f1o",
@@ -427,6 +427,6 @@ EXCMDLIST const cmds[] = {
{"~",   ex_subtilde,E_ADDR2,
"s",
"[line [,line]] ~ [cgr] [count] [#lp]",
-   "replace previous RE with previous replacement string,"},
+   "replace previous RE with previous replacement string"},
{NULL},
 };

Regards,

kshe



Re: remove one bug, add three

2017-12-12 Thread kshe
Hi again,

I would like to mention that the committed patch for the trivial bugs
that I reported is not even complete: there are still two problematic
instructions literally one line above the one that did get fixed
correctly (and I actually find this mildly amusing, especially
considering the last paragraph of my first message).

On Tue, 12 Dec 2017 16:09:36 +, Martijn van Duren wrote:
> Also the reasonable way is debatable because it's behaviour
> actually changes on the BRE side if you use n as a separator and gsed
> has an identical quirk on the replacement side:

Ah, so now you notice that there are even more incompatibilities between
this version and GNU sed where POSIX leaves the behaviour undefined;
would you also want to transform these cases into hard errors, then?
Please go right ahead, at least I am sure millert@ would like it.

> Patch to do so below. Not asking for OKs (yet), since compile_re is used
> in more places and whipped up in about a minute.

I understand that you may be busy with many other things at once, but
rushing out absurd diffs like this one or committing half-baked patches
on top of less than half-baked patches is not a very efficient approach.
I am happy to wait until you find enough time for a proper evaluation of
the issue at stake; this is not a race.

Regards,

kshe



Re: remove one bug, add three

2017-12-12 Thread kshe
On Tue, 12 Dec 2017 12:44:03 +, Todd C. Miller wrote:
> On Tue, 12 Dec 2017 11:57:58 +0000, kshe wrote:
>
> > Perhaps the worst part of all this, though, is how the change of
> > behaviour, which made sed fail hard where it previously handled input in
> > a perfectly defined and reasonable way, was apparently approved because
> > "implementations do vary in how they handle [it], so throwing an error
> > is probably best".  Following the same kind of reasoning, I think
> > OpenBSD should also modify the `echo' command to fail if given an
> > argument like `-E', as its behaviour in that case differs from system to
> > system, hence the current implementation is likewise "just creating a
> > trap for the user", and surely this is unacceptable and therefore ought
> > to be fixed, right?
>
> It's not really the same situation.  The question is what to do in
> cases where POSIX leaves the behavior undefined and where there is
> no consensus among implementations.  Is it best for sed to be as
> consistent as possible, knowing that other commonly used implementations
> will produce different results, or is it better to produce an error
> for the unwary user?  This is not just theoretical, we run into
> issues all the time with scripts that "work fine" on Linux but fail
> in odd ways on other systems that don't use the GNU utilities.
>
> Most users will have no way to determine the source of the problem.
> At least if the undefined behavior results in an error they have
> something to go on.

Trying to prevent the unwary user from being unwary is a noble but
impossible task to accomplish.  There are so many ways to introduce
non-portability in shell scripts that replacing historical behaviour by
hard failures in an attempt to improve this situation is likely to be
counterproductive, especially when such attempt goes against the
internal consistency of the affected commands.

Regards,

kshe



remove one bug, add three

2017-12-12 Thread kshe
Hi,

While attempting to fix one bug, the recent commit to sed regarding the
`y' command has introduced three new problems.

The first one is that it happily uses a plain `char' as the index for
the array `check', which obviously leads to havoc as soon as one tries
to translate non-ASCII characters, whereby sed, overwriting its own
memory, either fails with bogus error messages

$ ./sed $(printf 'y/\220/a/')
sed: 1: "y/\x90/a/": Repeated character in source string

$ ./sed $(printf 'y/\360/a/')
sed: 1: "y/\xf0/a/": unexpected EOF (pending }'s)

or merely dumps core.

$ ./sed $(printf 'y/\222/a/')
Segmentation fault (core dumped)

The second bug is that, even if one were to cast the said index to an
unsigned value, the `check' array would still be off by one, as it is
declared to have a length of 255 instead of 256, once again causing
trivial one-byte buffer overflows.

The third bug is more of a behavioural one: while the `y' command was
previously slightly diverging from its description by POSIX, at least
both `s' and `y' were compiled in the same way, such that the two
similar commands `yn\nnXn' and `sn\nnXn' were handled consistently, as
one would expect.  With this commit, however, these two commands now
have exactly the opposite behaviour, thus making the poor sed even more
incoherent than it ever was.

As an aside, in case the irony is not overwhelming enough already, I
would also like to point out how the first submitted diff addressing
this issue decided to change the comment

/* We assume characters are 8 bits */

into

// We assume characters are 8 bits

after which it was objected that this modification was not justified and
hence should not be part of the final patch.  However, at no point did
someone realize that the code is in fact not assuming anything regarding
the number of bits in a character, and hence that this comment, wether
written in one style or the other, is actually quite ridiculous anyway.

Perhaps the worst part of all this, though, is how the change of
behaviour, which made sed fail hard where it previously handled input in
a perfectly defined and reasonable way, was apparently approved because
"implementations do vary in how they handle [it], so throwing an error
is probably best".  Following the same kind of reasoning, I think
OpenBSD should also modify the `echo' command to fail if given an
argument like `-E', as its behaviour in that case differs from system to
system, hence the current implementation is likewise "just creating a
trap for the user", and surely this is unacceptable and therefore ought
to be fixed, right?

Having thus shown that this commit was not only misguided technically
but also conceptually, rather than applying yet another misguided patch
on top of it in an attempt to repair this whole mess, I would like to
request that it instead be completely reverted, after which I would be
able to provide a proper and definitive fix to deal with this issue
(and, at the same time, many other unrelated bugs).

Regards,

kshe



Re: net/rtsock.c: size to free(9)

2017-12-11 Thread kshe
On Sun, 10 Dec 2017 11:25:50 +, Martin Pieuchot wrote:
> On 08/12/17(Fri) 12:58, kshe wrote:
> > I noticed one instance where the size given to free(9) can easily be
> > determined.
>
> What about the other free(9)s in the same function?

Somehow I did not immediately realize that rtm_report() would simply
store the allocated length in rtm->rtm_msglen.  Now that I do, here is
an updated diff dealing with the remaining calls (along with some more
unrelated whitespace fixes).

Index: rtsock.c
===
RCS file: /cvs/src/sys/net/rtsock.c,v
retrieving revision 1.256
diff -u -p -r1.256 rtsock.c
--- rtsock.c10 Dec 2017 11:25:18 -  1.256
+++ rtsock.c11 Dec 2017 07:27:49 -
@@ -697,17 +697,18 @@ route_output(struct mbuf *m, struct sock
 * Validate RTM_PROPOSAL and pass it along or error out.
 */
if (rtm->rtm_type == RTM_PROPOSAL) {
-  if (rtm_validate_proposal() == -1) {
+   if (rtm_validate_proposal() == -1) {
error = EINVAL;
goto fail;
-  }
+   }
} else {
error = rtm_output(rtm, , , prio, tableid);
if (!error) {
type = rtm->rtm_type;
seq = rtm->rtm_seq;
-   free(rtm, M_RTABLE, 0);
+   free(rtm, M_RTABLE, len);
rtm = rtm_report(rt, type, seq, tableid);
+   len = rtm->rtm_msglen;
}
}
 
@@ -725,18 +726,18 @@ route_output(struct mbuf *m, struct sock
if (route_cb.any_count <= 1) {
/* no other listener and no loopback of messages */
 fail:
-   free(rtm, M_RTABLE, 0);
+   free(rtm, M_RTABLE, len);
m_freem(m);
return (error);
}
}
if (rtm) {
-   if (m_copyback(m, 0, rtm->rtm_msglen, rtm, M_NOWAIT)) {
+   if (m_copyback(m, 0, len, rtm, M_NOWAIT)) {
m_freem(m);
m = NULL;
-   } else if (m->m_pkthdr.len > rtm->rtm_msglen)
-   m_adj(m, rtm->rtm_msglen - m->m_pkthdr.len);
-   free(rtm, M_RTABLE, 0);
+   } else if (m->m_pkthdr.len > len)
+   m_adj(m, len - m->m_pkthdr.len);
+   free(rtm, M_RTABLE, len);
}
if (m)
route_input(m, so, info.rti_info[RTAX_DST] ?
@@ -1161,7 +1162,7 @@ route_cleargateway(struct rtentry *rt, v
 
if (ISSET(rt->rt_flags, RTF_GATEWAY) && rt->rt_gwroute == nhrt &&
!ISSET(rt->rt_locks, RTV_MTU))
-rt->rt_mtu = 0;
+   rt->rt_mtu = 0;
 
return (0);
 }
@@ -1485,7 +1486,7 @@ void
 rtm_addr(struct rtentry *rt, int cmd, struct ifaddr *ifa)
 {
struct ifnet*ifp = ifa->ifa_ifp;
-   struct mbuf *m = NULL;
+   struct mbuf *m;
struct rt_addrinfo   info;
struct ifa_msghdr   *ifam;
 

Regards,

kshe



net/rtsock.c: size to free(9)

2017-12-08 Thread kshe
Hi,

I noticed one instance where the size given to free(9) can easily be
determined.  While here, this diff also removes an outdated comment
(since r1.230) as well as a dead initialisation in rtm_addr().

Index: src/sys/net/rtsock.c
===
RCS file: /cvs/src/sys/net/rtsock.c,v
retrieving revision 1.255
diff -u -p -r1.255 rtsock.c
--- src/sys/net/rtsock.c3 Nov 2017 16:23:20 -   1.255
+++ src/sys/net/rtsock.c7 Dec 2017 06:16:04 -
@@ -224,7 +224,7 @@ route_attach(struct socket *so, int prot
 
if (curproc == NULL)
error = EACCES;
-   else 
+   else
error = soreserve(so, RAWSNDQ, RAWRCVQ);
if (error) {
free(rop, M_PCB, sizeof(struct routecb));
@@ -694,12 +694,6 @@ route_output(struct mbuf *m, struct sock
}
 
/*
-* Do not use goto flush before this point since the message itself
-* may be not consistent and could cause unexpected behaviour in other
-* userland clients. Use goto fail instead.
-*/
-
-   /*
 * Validate RTM_PROPOSAL and pass it along or error out.
 */
if (rtm->rtm_type == RTM_PROPOSAL) {
@@ -712,7 +706,7 @@ route_output(struct mbuf *m, struct sock
if (!error) {
type = rtm->rtm_type;
seq = rtm->rtm_seq;
-   free(rtm, M_RTABLE, 0);
+   free(rtm, M_RTABLE, len);
rtm = rtm_report(rt, type, seq, tableid);
}
}
@@ -897,7 +891,7 @@ rtm_output(struct rt_msghdr *rtm, struct
 */
plen = rtable_satoplen(info->rti_info[RTAX_DST]->sa_family,
info->rti_info[RTAX_NETMASK]);
-   if (rt_plen(rt) != plen ) {
+   if (rt_plen(rt) != plen) {
error = ESRCH;
break;
}
@@ -1491,7 +1485,7 @@ void
 rtm_addr(struct rtentry *rt, int cmd, struct ifaddr *ifa)
 {
struct ifnet*ifp = ifa->ifa_ifp;
-   struct mbuf *m = NULL;
+   struct mbuf *m;
struct rt_addrinfo   info;
struct ifa_msghdr   *ifam;
 

Regards,

kshe



fstat(1): fix misaligned headers

2017-12-08 Thread kshe
Hi,

Last time a new field was added, headers were not adjusted to account
for the extra column in the output.

Index: fstat.c
===
RCS file: /cvs/src/usr.bin/fstat/fstat.c,v
retrieving revision 1.90
diff -u -p -r1.90 fstat.c
--- fstat.c 21 Jan 2017 12:21:57 -  1.90
+++ fstat.c 7 Dec 2017 03:12:55 -
@@ -318,10 +318,10 @@ fstat_header(void)
 {
if (nflg)
printf("%s",
-"USER CMD  PID   FD  DEV  INUM   MODE   R/WSZ|DV");
+"USER CMD  PID   FD  DEV  INUMMODE   R/WSZ|DV");
else
printf("%s",
-"USER CMD  PID   FD MOUNTINUM MODE R/WSZ|DV");
+"USER CMD  PID   FD MOUNTINUM  MODE R/WSZ|DV");
if (oflg)
printf("%s", ":OFFSET  ");
if (checkfile && fsflg == 0)

Regards,

kshe



dc(1): always use bn_checkp()

2017-12-08 Thread kshe
Hi,

This is the only instance where a manual check is made instead of
calling bn_checkp().

Index: mem.c
===
RCS file: /cvs/src/usr.bin/dc/mem.c,v
retrieving revision 1.7
diff -u -p -r1.7 mem.c
--- mem.c   16 Feb 2015 20:53:34 -  1.7
+++ mem.c   7 Dec 2017 04:13:06 -
@@ -32,8 +32,7 @@ new_number(void)
n = bmalloc(sizeof(*n));
n->scale = 0;
n->number = BN_new();
-   if (n->number == NULL)
-   err(1, NULL);
+   bn_checkp(n->number);
return n;
 }
 

Regards,

kshe



dc(1): small cleanups

2017-12-08 Thread kshe
Hi,

This patch includes the following minor changes:

1.  Remove the `__inline' markers from functions that are only called
from the jump table (I would actually like to remove them all as they
are both ugly and utterly useless, but that would yield a much larger
diff).

2.  Make bexp() slightly more readable and do not try to normalize() a
number whose scale is already zero.

3.  Fix indentation of a brace in set_scale().

4.  Always put the `else' keyword on the same line as the preceding
closing brace.

5.  Delete excess empty lines.

Index: bcode.c
===
RCS file: /cvs/src/usr.bin/dc/bcode.c,v
retrieving revision 1.59
diff -u -p -r1.59 bcode.c
--- bcode.c 5 Dec 2017 14:05:22 -   1.59
+++ bcode.c 7 Dec 2017 04:31:38 -
@@ -66,13 +66,13 @@ static __inline voidpush(struct value *
 static __inline struct value *tos(void);
 static __inline struct number  *pop_number(void);
 static __inline char   *pop_string(void);
-static __inline void   clear_stack(void);
-static __inline void   print_tos(void);
+static voidclear_stack(void);
+static voidprint_tos(void);
 static voidprint_err(void);
 static voidpop_print(void);
 static voidpop_printn(void);
-static __inline void   print_stack(void);
-static __inline void   dup(void);
+static voidprint_stack(void);
+static voiddup(void);
 static voidswap(void);
 static voiddrop(void);
 
@@ -477,27 +477,26 @@ pop_string(void)
return stack_popstring();
 }
 
-static __inline void
+static void
 clear_stack(void)
 {
stack_clear();
 }
 
-static __inline void
+static void
 print_stack(void)
 {
stack_print(stdout, , "", bmachine.obase);
 }
 
-static __inline void
+static void
 print_tos(void)
 {
struct value *value = tos();
if (value != NULL) {
print_value(stdout, value, "", bmachine.obase);
(void)putchar('\n');
-   }
-   else
+   } else
warnx("stack empty");
 }
 
@@ -508,8 +507,7 @@ print_err(void)
if (value != NULL) {
print_value(stderr, value, "", bmachine.obase);
(void)putc('\n', stderr);
-   }
-   else
+   } else
warnx("stack empty");
 }
 
@@ -548,7 +546,7 @@ pop_printn(void)
}
 }
 
-static __inline void
+static void
 dup(void)
 {
stack_dup();
@@ -594,7 +592,7 @@ set_scale(void)
bmachine.scale = (u_int)scale;
else
warnx("scale too large");
-   }
+   }
free_number(n);
}
 }
@@ -673,7 +671,6 @@ push_scale(void)
u_int   scale = 0;
struct number   *n;
 
-
value = pop();
if (value != NULL) {
switch (value->type) {
@@ -920,8 +917,7 @@ load_array(void)
n = new_number();
bn_check(BN_set_word(n->number, 0));
push_number(n);
-   }
-   else
+   } else
push(stack_dup_value(v, ));
}
free_number(inumber);
@@ -1198,13 +1185,12 @@ bexp(void)
warnx("Runtime warning: non-zero fractional part in 
exponent");
BN_free(i);
BN_free(f);
-   }
 
-   normalize(p, 0);
+   normalize(p, 0);
+   }
 
-   neg = false;
-   if (BN_is_negative(p->number)) {
-   neg = true;
+   neg = BN_is_negative(p->number);
+   if (neg) {
negate(p);
rscale = bmachine.scale;
} else {
@@ -1507,7 +1493,6 @@ compare(enum bcode_compare type)
}
}
 }
-
 
 static void
 nop(void)

Regards,

kshe



dc(1): recycle numbers

2017-12-08 Thread kshe
Hi,

Basic arithmetic on relatively small numbers can be made much cheaper by
leveraging BN_add(3), BN_sub(3) and BN_mul(3)'s ability to store the
calculation's result in one of their operands, thus reducing allocation
overhead.  For example, with this simplification, incrementing a counter
becomes almost twice as fast.

$ echo 0 >script
$ jot -b 1+ 16777216 >>script

$ time dc script
0m18.03s real 0m18.05s user 0m00.00s system
$ time ./dc script
0m09.92s real 0m09.91s user 0m00.01s system

Combined with the previous one regarding BN_CTX, this diff also makes
multiplication at least 60% faster than it originally was.

$ echo 1 >script
$ jot -b '1*' 16777216 >>script

$ time dc script
0m27.88s real 0m27.88s user 0m00.00s system
$ time ./dc script
0m10.51s real 0m10.51s user 0m00.00s system

Index: bcode.c
===
RCS file: /cvs/src/usr.bin/dc/bcode.c,v
retrieving revision 1.59
diff -u -p -r1.59 bcode.c
--- bcode.c 5 Dec 2017 14:05:22 -   1.59
+++ bcode.c 7 Dec 2017 04:31:38 -
@@ -978,7 +978,6 @@ static void
 badd(void)
 {
struct number   *a, *b;
-   struct number   *r;
 
a = pop_number();
if (a == NULL)
@@ -989,23 +988,19 @@ badd(void)
return;
}
 
-   r = new_number();
-   r->scale = max(a->scale, b->scale);
-   if (r->scale > a->scale)
-   normalize(a, r->scale);
-   else if (r->scale > b->scale)
-   normalize(b, r->scale);
-   bn_check(BN_add(r->number, a->number, b->number));
-   push_number(r);
+   if (a->scale < b->scale)
+   normalize(a, b->scale);
+   else if (a->scale > b->scale)
+   normalize(b, a->scale);
+   bn_check(BN_add(b->number, a->number, b->number));
free_number(a);
-   free_number(b);
+   push_number(b);
 }
 
 static void
 bsub(void)
 {
struct number   *a, *b;
-   struct number   *r;
 
a = pop_number();
if (a == NULL)
@@ -1016,17 +1011,13 @@ bsub(void)
return;
}
 
-   r = new_number();
-
-   r->scale = max(a->scale, b->scale);
-   if (r->scale > a->scale)
-   normalize(a, r->scale);
-   else if (r->scale > b->scale)
-   normalize(b, r->scale);
-   bn_check(BN_sub(r->number, b->number, a->number));
-   push_number(r);
+   if (a->scale < b->scale)
+   normalize(a, b->scale);
+   else if (a->scale > b->scale)
+   normalize(b, a->scale);
+   bn_check(BN_sub(b->number, b->number, a->number));
free_number(a);
-   free_number(b);
+   push_number(b);
 }
 
 void
@@ -1048,7 +1039,6 @@ static void
 bmul(void)
 {
struct number   *a, *b;
-   struct number   *r;
 
a = pop_number();
if (a == NULL)
@@ -1059,12 +1049,9 @@ bmul(void)
return;
}
 
-   r = new_number();
-   bmul_number(r, a, b, bmachine.scale);
-
-   push_number(r);
+   bmul_number(b, a, b, bmachine.scale);
free_number(a);
-   free_number(b);
+   push_number(b);
 }
 
 static void

Regards,

kshe



Re: dc(1); fix 0Z

2017-12-04 Thread kshe
On Sun, 03 Dec 2017 12:25:15 +, Philippe Meunier wrote:
> kshe wrote:
> >Also, the manual defines the length of a number as its number of digits,
> >so perhaps it should be precised that zero is considered to have no
> >digits, which might not be obvious to everyone.
>
> Am I the only who thinks this is not just not obvious, but actually wrong?

If the number `002' is said to have only one digit because the zeros in
the left are not significant, then likewise the number `000', which
hence only has insignificant digits, can in perfect coherence be
considered to have no digits at all.

Another way to see why this makes sense is by defining the number of
digits in base b of an integer n by

intlog_b(n) + 1 := floor(log_b(n)) + 1

and noticing that the very natural recursive expression of such integer
logarithm

intlog_b(n+1) = | intlog_b(n) + 1   if n+1 is a power of b
| intlog_b(n)   otherwise

can be made to hold for n = 0 if and only if one sets intlog_b(0) = -1
by convention, which indeed leads to the conclusion that zero must have
zero digits as per the definition above.  Any other convention for the
value of intlog_b(0) would not preserve this fundamental invariant of
the integer logarithm, thus being nothing but arbitrary, and as such of
little practical value.

Regards,

kshe



dc(1): global context

2017-12-04 Thread kshe
gt;number, NULL, b->number, a->number, ctx));
-   BN_CTX_free(ctx);
}
push_number(r);
free_number(a);
@@ -1122,7 +1106,6 @@ bmod(void)
struct number   *a, *b;
struct number   *r;
u_int   scale;
-   BN_CTX  *ctx;
 
a = pop_number();
if (a == NULL)
@@ -1143,10 +1126,7 @@ bmod(void)
normalize(a, scale);
normalize(b, scale + bmachine.scale);
 
-   ctx = BN_CTX_new();
-   bn_checkp(ctx);
bn_check(BN_mod(r->number, b->number, a->number, ctx));
-   BN_CTX_free(ctx);
}
push_number(r);
free_number(a);
@@ -1159,7 +1139,6 @@ bdivmod(void)
struct number   *a, *b;
struct number   *rdiv, *rmod;
u_int   scale;
-   BN_CTX  *ctx;
 
a = pop_number();
if (a == NULL)
@@ -1182,11 +1161,8 @@ bdivmod(void)
normalize(a, scale);
normalize(b, scale + bmachine.scale);
 
-   ctx = BN_CTX_new();
-   bn_checkp(ctx);
bn_check(BN_div(rdiv->number, rmod->number,
b->number, a->number, ctx));
-   BN_CTX_free(ctx);
}
push_number(rdiv);
push_number(rmod);
@@ -1273,14 +1249,11 @@ bexp(void)
}
 
if (neg) {
-   BN_CTX  *ctx;
BIGNUM  *one;
 
one = BN_new();
bn_checkp(one);
bn_check(BN_one(one));
-   ctx = BN_CTX_new();
-   bn_checkp(ctx);
scale_number(one, r->scale + rscale);
 
if (BN_is_zero(r->number))
@@ -1289,7 +1262,6 @@ bexp(void)
bn_check(BN_div(r->number, NULL, one,
r->number, ctx));
BN_free(one);
-   BN_CTX_free(ctx);
r->scale = rscale;
} else
normalize(r, rscale);
@@ -1306,7 +1278,6 @@ bsqrt(void)
struct number   *r;
BIGNUM  *x, *y, *t;
u_int   scale, onecount;
-   BN_CTX  *ctx;
 
onecount = 0;
n = pop_number();
@@ -1325,8 +1296,6 @@ bsqrt(void)
bn_check(BN_rshift(x, x, BN_num_bits(x)/2));
y = BN_new();
bn_checkp(y);
-   ctx = BN_CTX_new();
-   bn_checkp(ctx);
do {
bn_check(BN_div(y, NULL, n->number, x, ctx));
bn_check(BN_add(y, x, y));
@@ -1341,7 +1310,6 @@ bsqrt(void)
r->scale = scale;
r->number = y;
BN_free(x);
-   BN_CTX_free(ctx);
push_number(r);
}
 

Regards,

kshe



Re: dc(1); fix 0Z

2017-12-02 Thread kshe
On Sat, 02 Dec 2017 07:50:35 +, Otto Moerbeek wrote:
> Spotted while working on kshe's diff.
>
> Makes Z0p work the same as both gnu dc and the orignal dc.
>
> OK?

I guess you probably mean `0Zp', and in that case a mere

return n->scale;

should suffice here.

Also, the manual defines the length of a number as its number of digits,
so perhaps it should be precised that zero is considered to have no
digits, which might not be obvious to everyone.

Other than that, I think this makes more sense indeed.

Regards,

kshe



Re: dc(1): floor(log_10(x)) should not cost more than O(log(log(x)))

2017-12-01 Thread kshe
On Fri, 01 Dec 2017 08:58:55 +, Otto Moerbeek wrote:
> On Thu, Nov 30, 2017 at 01:10:33PM +0000, kshe wrote:
> > On Thu, 30 Nov 2017 07:22:42 +, Otto Moerbeek wrote:
> > > On Sun, Nov 26, 2017 at 07:40:03PM +, kshe wrote:
> > > > Hi,
> > > >
> > > > The `Z' command can be a handy shortcut for computing logarithms; as
> > > > such, for example, it is the basis of the implementation of bc(1)'s `l'
> > > > function.  However, the algorithm currently used in count_digits() is
> > > > too naive to be really useful, becoming rapidly much slower than what
> > > > would be expected from a native command.
> > > >
> > > > To see how this computation could easily be made exponentially faster,
> > > > one may start by noticing that, if next() is the function defined for
> > > > any real r as
> > > >
> > > > next(r) := floor(r) + 1,
> > > >
> > > > then clearly, for any strictly positive integer x,
> > > >
> > > > floor(log_2(x)) <= log_2(x) < next(log_2(x))
> > > >
> > > > and therefore
> > > >
> > > > log_10(2) * floor(log_2(x)) <= log_10(x) < k,
> > > >
> > > > where
> > > >
> > > > k := log_10(2) * next(log_2(x)).
> > > >
> > > > Since log_10(2) < 1, it follows that
> > > >
> > > > floor(k) <= next(k - log_10(2)) <= next(log_10(x)) <= next(k),
> > > >
> > > > which proves that next(log_10(x)) is either floor(k) or next(k).
> > > >
> > > > If next(log_10(x)) = floor(k), then
> > > >
> > > > 10^floor(k) = 10^next(log_10(x)) > 10^log_10(x) = x.
> > > >
> > > > If next(log_10(x)) = next(k), then
> > > >
> > > > 10^floor(k) = 10^floor(log_10(x)) <= 10^log_10(x) = x.
> > > >
> > > > Therefore, if x >= 10^floor(k), then next(log_10(x)) cannot be floor(k),
> > > > hence it must be next(k); likewise, if x < 10^floor(k), then
> > > > next(log_10(x)) cannot be next(k), hence it must be floor(k).  Using the
> > > > conventional integer value of a boolean expression, this result can be
> > > > summarised as
> > > >
> > > > next(log_10(x)) = floor(k) + (x >= 10^floor(k)).
> > > >
> > > > As an additional refinement, one may further notice that if
> > > >
> > > > floor(k) = floor(log_10(2) * floor(log_2(x)))
> > > >
> > > > then
> > > >
> > > > 10^floor(k) = 10^floor(log_10(2) * floor(log_2(x)))
> > > > <= 10^(log_10(2) * floor(log_2(x)))
> > > > <= 2^floor(log_2(x))
> > > > <= x
> > > >
> > > > so that it can readily be concluded that
> > > >
> > > > next(log_10(x)) = next(k)
> > > >
> > > > without having to compute 10^floor(k).
> > > >
> > > > The BN library permits computation of k in O(1) and 10^floor(k) in
> > > > O(log(k)) which is O(log(log(x))).  Therefore, one can compute
> > > > next(log_10(x)) in O(1) most of the time (at least on average, and with
> > > > a certain definition of such average, the full analysis of which is, I
> > > > presume, outside the scope of this message), with a worst case of
> > > > O(log(log(x))).  In contrast, the current code is exponentially worse
> > > > than what its worst case should be, computing this value in O(log(x)).
> > > >
> > > > $ jot -b 9 -s '' 65536 >script
> > > > $ echo Z >>script
> > > >
> > > > $ time dc script
> > > > 0m03.57s real 0m03.56s user 0m00.01s system
> > > > $ time ./dc script
> > > > 0m00.12s real 0m00.12s user 0m00.00s system
> > > >
> > > > The diff below implements this optimisation.  It also fixes a small
> > > > logic error in split_number(), which is used by count_digits().
> > >
> > > General comment: it would be a good thing to add (part of) the proof
> > > or a reference to some published work to the code in a comment.
> > > Especially the derivation of c and the computation of d seem a bit
> > > like dropping out of thin air.
> >
> > All the above proof says is that a logarithm in one base i

wprintf.3: fix synopsis

2017-11-30 Thread kshe
Hi,

This adds a missing parameter name and makes whitespace more consistent
in the synopsis of wprintf(3).  As a bonus, also link to this manual
from plain printf(3).

Index: printf.3
===
RCS file: /cvs/src/lib/libc/stdio/printf.3,v
retrieving revision 1.76
diff -u -p -r1.76 printf.3
--- printf.312 Jun 2017 18:37:12 -  1.76
+++ printf.330 Nov 2017 04:05:21 -
@@ -758,6 +758,7 @@ The return value would be too large to b
 .El
 .Sh SEE ALSO
 .Xr printf 1 ,
+.Xr wprintf 3 ,
 .Xr scanf 3
 .Sh STANDARDS
 The
Index: wprintf.3
===
RCS file: /cvs/src/lib/libc/stdio/wprintf.3,v
retrieving revision 1.4
diff -u -p -r1.4 wprintf.3
--- wprintf.3   13 May 2014 20:51:00 -  1.4
+++ wprintf.3   30 Nov 2017 04:05:21 -
@@ -51,9 +51,9 @@
 .Fn wprintf "const wchar_t * restrict format" ...
 .In stdarg.h
 .Ft int
-.Fn vfwprintf "FILE * restrict stream" "const wchar_t * restrict" "va_list ap"
+.Fn vfwprintf "FILE * restrict stream" "const wchar_t * restrict format" 
"va_list ap"
 .Ft int
-.Fn vswprintf "wchar_t * restrict ws" "size_t n" "const wchar_t *restrict 
format" "va_list ap"
+.Fn vswprintf "wchar_t * restrict ws" "size_t n" "const wchar_t * restrict 
format" "va_list ap"
 .Ft int
 .Fn vwprintf "const wchar_t * restrict format" "va_list ap"
 .Sh DESCRIPTION

Regards,

kshe



Re: dc(1) mishandles fractional input in bases other than 10

2017-11-30 Thread kshe
On Tue, 28 Nov 2017 09:52:26 +, Otto Moerbeek wrote:
> On Sun, Nov 26, 2017 at 07:51:13PM +0000, kshe wrote: 
> > Hi,
> >
> > The following behaviour seems unacceptable to me.
> >
> > $ dc -e '16dio .C .C0 f'
> > .C0
> > .B
> >
> > $ dc -e '8dio .3 .30 .300 f'
> > .3000
> > .275
> > .23
> >
> > This bug affects all bases other than 10 and increasing the scale with
> > the `k' command does not prevent it.  The only reliable way to input a
> > rational number in these bases is therefore to input an integer first,
> > and then divide it by the appropriate factor; alternatively, as
> > illustrated above, adding meaningless zeros to the digit expansion can
> > help, although I have not verified that this works in all cases nor how
> > many zeros would need to be added to guarantee the expected result
> > every time.
> >
> > I could try to provide a patch for this, but I dislike how the current
> > code gives special privileges to base 10 at the expense of all the
> > others, even though it is supposed to accept input in any base; so my
> > patch would probably be deemed too invasive as I would want to change
> > too many things at once, including in areas irrelevant to this
> > particular issue.
> >
> > If a less intrusive workaround is not possible, then perhaps at least a
> > warning about fractional input being essentially broken in bases other
> > than 10 could be added to the manual.
> >
> > Regards,
> >
> > kshe
>
> I implemented dc(1) to stay as close as possible to the original dc(1)
> and GNU dc(1). Any change in the way fractional numbers are handled
> (even though you might consider it broken) will break existing
> programs.

I agree, and this is why I refrained from submitting a patch for this
one.  That said, since `.C' being effectively interpreted as `.B' is
probably not what any new user would expect, could something like the
following be useful?  (This diff also includes an unrelated spelling
fix.)

Index: dc.1
===
RCS file: /cvs/src/usr.bin/dc/dc.1,v
retrieving revision 1.31
diff -u -p -r1.31 dc.1
--- dc.128 Nov 2017 06:51:19 -  1.31
+++ dc.130 Nov 2017 04:00:42 -
@@ -245,7 +245,7 @@ Mark used by the
 operator.
 The
 .Ic M
-operator is a non-portable extensions, used by the
+operator is a non-portable extension, used by the
 .Xr bc 1
 command.
 .It Ic N
@@ -536,3 +536,8 @@ The current version of the
 .Nm
 utility was written by
 .An Otto Moerbeek .
+.Sh CAVEATS
+While fractional input in base 10 is always exact,
+other bases may suffer from unintuitive rounding.
+To avoid surprising results, plain integer division can be used
+instead of the corresponding floating point notation.

Regards,

kshe



Re: freezero(3) for stdio's internal buffers

2017-11-30 Thread kshe
';
-   free(convbuf);
-   cp = convbuf = __mbsconv(dtoaresult, -1);
+   cp = convbuf = __mbsconv(dtoaresult, -1, convbuf,
+   _size);
if (cp == NULL)
goto error;
ndig = dtoaend - dtoaresult;
@@ -727,8 +730,8 @@ fp_begin:
if (expt == )
expt = INT_MAX;
}
-   free(convbuf);
-   cp = convbuf = __mbsconv(dtoaresult, -1);
+   cp = convbuf = __mbsconv(dtoaresult, -1, convbuf,
+   _size);
if (cp == NULL)
goto error;
ndig = dtoaend - dtoaresult;
@@ -849,8 +852,8 @@ fp_common:
 
mbsarg = "(null)";
}
-   free(convbuf);
-   convbuf = __mbsconv(mbsarg, prec);
+   convbuf = __mbsconv(mbsarg, prec, convbuf,
+   _size);
if (convbuf == NULL) {
fp->_flags |= __SERR;
goto error;
@@ -1065,7 +1068,7 @@ overflow:
ret = -1;
 
 finish:
-   free(convbuf);
+   freezero(convbuf, convbuf_size * sizeof(*convbuf));
 #ifdef FLOATING_POINT
if (dtoaresult)
__freedtoa(dtoaresult);
Index: vswprintf.c
===
RCS file: /cvs/src/lib/libc/stdio/vswprintf.c,v
retrieving revision 1.6
diff -u -p -r1.6 vswprintf.c
--- vswprintf.c 31 Aug 2015 02:53:57 -  1.6
+++ vswprintf.c 30 Nov 2017 04:05:21 -
@@ -64,13 +64,13 @@ vswprintf(wchar_t * __restrict s, size_t
ret = __vfwprintf(, fmt, ap);
if (ret < 0) {
sverrno = errno;
-   free(f._bf._base);
+   freezero(f._bf._base, f._bf._size);
errno = sverrno;
return (-1);
}
if (ret == 0) {
s[0] = L'\0';
-   free(f._bf._base);
+   freezero(f._bf._base, f._bf._size);
return (0);
}
*f._p = '\0';
@@ -81,7 +81,7 @@ vswprintf(wchar_t * __restrict s, size_t
 */
bzero(, sizeof(mbs));
nwc = mbsrtowcs(s, (const char **), n, );
-   free(f._bf._base);
+   freezero(f._bf._base, f._bf._size);
if (nwc == (size_t)-1) {
errno = EILSEQ;
return (-1);
Index: vswscanf.c
===
RCS file: /cvs/src/lib/libc/stdio/vswscanf.c,v
retrieving revision 1.3
diff -u -p -r1.3 vswscanf.c
--- vswscanf.c  31 Aug 2015 02:53:57 -  1.3
+++ vswscanf.c  30 Nov 2017 04:05:21 -
@@ -70,7 +70,7 @@ vswscanf(const wchar_t * __restrict str,
bzero(, sizeof(mbs));
strp = str;
if ((mlen = wcsrtombs(mbstr, , len, )) == (size_t)-1) {
-   free(mbstr);
+   freezero(mbstr, len);
return (EOF);
}
if (mlen == len)
@@ -82,7 +82,7 @@ vswscanf(const wchar_t * __restrict str,
    f._read = eofread;
f._lb._base = NULL;
r = __vfwscanf(, fmt, ap);
-   free(mbstr);
+   freezero(mbstr, len);
 
return (r);
 }

Regards,

kshe



Re: dc(1): floor(log_10(x)) should not cost more than O(log(log(x)))

2017-11-30 Thread kshe
On Thu, 30 Nov 2017 07:22:42 +, Otto Moerbeek wrote:
> On Sun, Nov 26, 2017 at 07:40:03PM +0000, kshe wrote:
> > Hi,
> >
> > The `Z' command can be a handy shortcut for computing logarithms; as
> > such, for example, it is the basis of the implementation of bc(1)'s `l'
> > function.  However, the algorithm currently used in count_digits() is
> > too naive to be really useful, becoming rapidly much slower than what
> > would be expected from a native command.
> >
> > To see how this computation could easily be made exponentially faster,
> > one may start by noticing that, if next() is the function defined for
> > any real r as
> >
> > next(r) := floor(r) + 1,
> >
> > then clearly, for any strictly positive integer x,
> >
> > floor(log_2(x)) <= log_2(x) < next(log_2(x))
> >
> > and therefore
> >
> > log_10(2) * floor(log_2(x)) <= log_10(x) < k,
> >
> > where
> >
> > k := log_10(2) * next(log_2(x)).
> >
> > Since log_10(2) < 1, it follows that
> >
> > floor(k) <= next(k - log_10(2)) <= next(log_10(x)) <= next(k),
> >
> > which proves that next(log_10(x)) is either floor(k) or next(k).
> >
> > If next(log_10(x)) = floor(k), then
> >
> > 10^floor(k) = 10^next(log_10(x)) > 10^log_10(x) = x.
> >
> > If next(log_10(x)) = next(k), then
> >
> > 10^floor(k) = 10^floor(log_10(x)) <= 10^log_10(x) = x.
> >
> > Therefore, if x >= 10^floor(k), then next(log_10(x)) cannot be floor(k),
> > hence it must be next(k); likewise, if x < 10^floor(k), then
> > next(log_10(x)) cannot be next(k), hence it must be floor(k).  Using the
> > conventional integer value of a boolean expression, this result can be
> > summarised as
> >
> > next(log_10(x)) = floor(k) + (x >= 10^floor(k)).
> >
> > As an additional refinement, one may further notice that if
> >
> > floor(k) = floor(log_10(2) * floor(log_2(x)))
> >
> > then
> >
> > 10^floor(k) = 10^floor(log_10(2) * floor(log_2(x)))
> > <= 10^(log_10(2) * floor(log_2(x)))
> > <= 2^floor(log_2(x))
> > <= x
> >
> > so that it can readily be concluded that
> >
> > next(log_10(x)) = next(k)
> >
> > without having to compute 10^floor(k).
> >
> > The BN library permits computation of k in O(1) and 10^floor(k) in
> > O(log(k)) which is O(log(log(x))).  Therefore, one can compute
> > next(log_10(x)) in O(1) most of the time (at least on average, and with
> > a certain definition of such average, the full analysis of which is, I
> > presume, outside the scope of this message), with a worst case of
> > O(log(log(x))).  In contrast, the current code is exponentially worse
> > than what its worst case should be, computing this value in O(log(x)).
> >
> > $ jot -b 9 -s '' 65536 >script
> > $ echo Z >>script
> >
> > $ time dc script
> > 0m03.57s real 0m03.56s user 0m00.01s system
> > $ time ./dc script
> > 0m00.12s real 0m00.12s user 0m00.00s system
> >
> > The diff below implements this optimisation.  It also fixes a small
> > logic error in split_number(), which is used by count_digits().
>
> General comment: it would be a good thing to add (part of) the proof
> or a reference to some published work to the code in a comment.
> Especially the derivation of c and the computation of d seem a bit
> like dropping out of thin air.

All the above proof says is that a logarithm in one base is easily
convertible to the corresponding one in another base.  Since this has
been known for a few centuries already, I see no reason to provide more
details here than for the algorithms (and associated constants) featured
in /usr/share/misc/bc.library, where no explanations of any kind are to
be found either.

> From a style point of view I do not like the assignment in
> conditionals very much.

Please feel free to apply your prefered style to this patch.

> There's also the problem of bits * c overflowing, though that's likely
> theoretical.

To provoke such overflow, one would first need a platform where ints are
larger than 32 bits, and the involved numbers would have to exceed
2^2147483648.  This is indeed unlikely to happen.

Regards,

kshe



Re: sed.1: miscellaneous corrections

2017-11-28 Thread kshe
On Mon, 27 Nov 2017 10:41:05 +, Jason McIntyre wrote:
> On Sun, Nov 26, 2017 at 07:47:01PM +0000, kshe wrote:
> > Hi,
> >
> > I noticed a certain number of inaccuracies within the manual page for
> > sed.  The diff below corrects to most obvious ones, although further
> > improvements are certainly possible.
> >
> > Additionally, the script given in the EXAMPLES section being already
> > quite unnecessarily contrived (as it does in twelve commands what could
> > be done in merely two), I took the opportunity to make it slightly
> > simpler and easier to read.
>
> morning.
>
> this diff is too much for review (at least for me anyway). could you
> parcel it up into some logical parts and resend?

These changes fall into five different categories.  The manually split
diffs below include a succinct description of each, in the hope of
making review easier.

1.  Remove false claims about restrictions that do not exist in POSIX
nor in this implementation.  Also clarify that label names may be
terminated by semicolons, although not specified by the standard.

Index: sed.1
===
RCS file: /cvs/src/usr.bin/sed/sed.1,v
retrieving revision 1.50
diff -u -p -r1.50 sed.1
--- sed.1   19 Jul 2017 21:28:19 -  1.50
+++ sed.1   17 Nov 2017 02:50:36 -
@@ -57,10 +57,9 @@ utility reads the specified files, or th
 are specified, modifying the input as specified by a list of commands.
 The input is then written to the standard output.
 .Pp
-A single command may be specified as the first argument to
-.Nm sed .
-Multiple commands may be specified
-separated by newlines or semicolons,
+Commands are separated by newlines or semicolons and may be specified
+either as the first argument to
+.Nm
 or by using the
 .Fl e
 or
@@ -95,7 +94,6 @@ to the list of commands.
 Append the editing commands found in the file
 .Ar command_file
 to the list of commands.
-The editing commands should each be listed on a separate line.
 .It Fl i Ns Op Ar extension
 Edit files in place, saving backups with the specified
 .Ar extension .
@@ -264,23 +262,22 @@ Files are created
 before any input processing begins.
 .Pp
 The
-.Ic b ,
-.Ic r ,
-.Ic s ,
-.Ic t ,
-.Ic w ,
-.Ic y ,
+.Ar label
+argument to the
+.Ic \&: ,
+.Ic b
 and
-.Ic \&:
-functions all accept additional arguments.
-The synopses below indicate which arguments have to be separated from
+.Ic t
+commands may be terminated by either a newline or a semicolon,
+although the former should be prefered if portability is desired.
+.Pp
+For functions that accept additional arguments,
+the synopses below indicate which have to be separated from
 the function letters by whitespace characters.
 .Pp
 Functions can be combined to form a
-.Em function list ,
-a list of
-.Nm
-functions each followed by a newline, as follows:
+.Em function list
+as follows:
 .Bd -literal -offset indent
 { function
   function
@@ -569,12 +565,6 @@ specification.
 The flags
 .Op Fl aEiru
 are extensions to that specification.
-.Pp
-The use of newlines to separate multiple commands on the command line
-is non-portable;
-the use of newlines to separate multiple commands within a command file
-.Pq Fl f Ar command_file
-is portable.
 .Sh HISTORY
 A
 .Nm
@@ -583,8 +573,6 @@ command appeared in
 .Sh CAVEATS
 The use of semicolons to separate multiple commands
 is not permitted for the following commands:
-.Ic a , b , c ,
-.Ic i , r , t ,
-.Ic w , \&: ,
+.Ic a , c , i , r , w
 and
 .Ic # .

2.  Specifying a file name for commands reading from and writing to
files is obviously not optional.

$ sed w
sed: 1: "w": filename expected

Index: sed.1
===
RCS file: /cvs/src/usr.bin/sed/sed.1,v
retrieving revision 1.50
diff -u -p -r1.50 sed.1
--- sed.1   19 Jul 2017 21:28:19 -  1.50
+++ sed.1   17 Nov 2017 02:50:36 -
@@ -255,7 +253,7 @@ as well as the
 flag to the
 .Ic s
 function,
-take an optional
+take a
 .Ar file
 parameter,
 which should be separated from the function or flag by whitespace.

3.  Correct the description of the `y' command: a backslash only escapes
specific characters.

Index: sed.1
===
RCS file: /cvs/src/usr.bin/sed/sed.1,v
retrieving revision 1.50
diff -u -p -r1.50 sed.1
--- sed.1   19 Jul 2017 21:28:19 -  1.50
+++ sed.1   17 Nov 2017 02:50:36 -
@@ -486,10 +483,14 @@ Within
 .Ar string1
 and
 .Ar string2 ,
-a backslash followed by any character other than a newline is that literal
-character, and a backslash followed by an
+a backslash followed by another backslash
+is replaced by a single backslash,
+a backslash followed by an
 .Sq n
-is replaced by a newline character.
+is replaced by a newline character,
+and a backslash followed by the delimiting character

Re: dc(1): minor cleanup

2017-11-28 Thread kshe
On Tue, 28 Nov 2017 08:08:24 +, Otto Moerbeek wrote:
> On Sun, Nov 26, 2017 at 07:25:46PM +0000, kshe wrote:
> > Hi,
> >
> > The diff below encompasses three unrelated minor changes.
> >
> > 1.  Merge the not_equal(), not_less() and not_greater() functions into
> > their caller; these functions cannot be called from the jump table, so
> > it is confusing to define them as if they could.
> >
> > 2.  Make warnings consistent by using warnx(3) everywhere.
> >
> > 3.  Add a missing parenthesis in a comment.
>
> I committed this; but you diff does not apply, I had to fix it. Looks
> like you edited line numbers manually or something like that.

This is actually the opposite: I extracted this diff from a larger one
without editing line numbers, as I thought they did not matter much for
contextual diffs.  Apparently, this works well as long as two unrelated
changes are more than six lines apart; otherwise, well, they appear in
the same window and things get more complicated.

By the way, while fixing my broken patch, it seems that you forgot to
remove the newline at the end of the argument to warnx() in
not_compare().

Regards,

kshe



Re: freezero(3) for stdio's internal buffers

2017-11-27 Thread kshe
On Tue, 28 Nov 2017 05:52:25 +, Theo de Raadt wrote:
> > In fact, can recallocarray be faster than plain free followed by calloc?
>
> Yes.
>
> I think you are missing some nuances.  These added functions have fast paths
> and slow paths.  freezero() isn't just a bzero, it also has munmap()
> sequences.  You are adding forced bzero or munmap() in circumstances
> where previously a malloc implementation could keep the freed'd memory in
> a cache for reuse.

Hence programs willing to avoid leaking secrets to such cache cannot
safely use stdio functions to manipulate those secrets.  But, sure, this
may very well be acceptable as, after all, stdio is merely a library
provided for convenience, and programs needing complete control over
their memory could use their own wrappers around raw system calls
instead.

> I find it hard to read your diff as convert other than "convert all free
> operations to freezero", or always pay the cost no matter what.

I indeed took a systematic approach, so perhaps the bits in vfprintf.c
and vfwprintf.c are in fact overkill, and they could be left out.
However, this does not invalidate the other parts of my first patch,
many of which simply mirror the already established use of recallocarray
on the same buffers.

Regards,

kshe



Re: freezero(3) for stdio's internal buffers

2017-11-27 Thread kshe
On Mon, 27 Nov 2017 08:01:25 +, Otto Moerbeek wrote:
> On Mon, Nov 27, 2017 at 05:48:14AM +0000, kshe wrote:
> > On Mon, 27 Nov 2017 00:42:01 +, Theo de Raadt wrote:
> > > This needs worst-case performance measurements.
> >
> > The only instances where performance could be a problem are in
> > vfprintf.c and vfwprintf.c, where the calls happen inside a loop; but
> > for these, in fact, the best solution would be to use recallocarray
> > directly: rather than repeatedly freeing (and clearing) `convbuf' and
> > reallocating a fresh one every time it is needed, it should be kept
> > around and passed to __wcsconv() and __mbsconv(), along with some
> > accounting of its current size, so that these functions could then call
> > recallocarray appropriately.  However, this optimisation would be more
> > intrusive than the diff I submitted, and would therefore demand greater
> > familiarity with stdio's source, which I have not yet acquired.
> >
> > That being said, in all other cases, since the way stdio functions fill
> > their buffers is much more costly than simply writing a bunch of zeros
> > in sequence (or merely calling munmap(2)), no real slowdown is to be
> > expected.  I should also point out that recallocarray is currently used
> > inside several loops in fvwrite.c, fgetln.c and getdelim.c, but no one
> > complained that the affected functions became too slow, because doing
> > things, as stdio does, like reading from and writing to disk or decoding
> > and converting user input will always dominate the effect of a few calls
> > to discard temporary buffers.
> >
> > Regards,
> >
> > kshe
>
> I was thinking: only point against your statement is that there could
> be cases where a very large buffer is used, but only a small part of
> it is used. In that case, clearing the buffer is more costly. Luckily,
> in those cases malloc just calls munmap(2) without clearing.
>
> Still, I would like to see benchmarks to validate assumptions.

Well, I looked under regress/lib/libc/, but nothing here seems to test
for pathological instances where performance is known to be an issue.  I
would happily run benchmarks if someone could provide examples of such
cases.

In the mean time, here is a better diff for vfprintf.c and vfwprintf.c,
which should eliminate all potential performance regressions,
implementing the optimisation described in my previous message.  In
fact, can recallocarray be faster than plain free followed by calloc?
If so, in the case of vfwprintf.c, this patch may actually improve
performance.  At least, it shall not degrade it.

Nevertheless, this needs to be reviewed carefully since, as stated
above, I do not have much familiarity with stdio's intricate code.

Index: vfprintf.c
===
RCS file: /cvs/src/lib/libc/stdio/vfprintf.c,v
retrieving revision 1.77
diff -u -p -r1.77 vfprintf.c
--- vfprintf.c  29 Aug 2016 12:20:57 -  1.77
+++ vfprintf.c  28 Nov 2017 04:06:43 -
@@ -153,7 +153,7 @@ __sbprintf(FILE *fp, const char *fmt, va
  * string is null-terminated.
  */
 static char *
-__wcsconv(wchar_t *wcsarg, int prec)
+__wcsconv(wchar_t *wcsarg, int prec, char *oconvbuf, size_t *convbuf_size)
 {
mbstate_t mbs;
char buf[MB_LEN_MAX];
@@ -191,18 +191,19 @@ __wcsconv(wchar_t *wcsarg, int prec)
return (NULL);
}
}
-   if ((convbuf = malloc(nbytes + 1)) == NULL)
+   convbuf = recallocarray(oconvbuf, *convbuf_size, nbytes + 1, 1);
+   if (convbuf == NULL)
return (NULL);
+   *convbuf_size = nbytes + 1;
 
/* Fill the output buffer. */
p = wcsarg;
memset(, 0, sizeof(mbs));
-   if ((nbytes = wcsrtombs(convbuf, (const wchar_t **),
-   nbytes, )) == (size_t)-1) {
-   free(convbuf);
+   nbytes = wcsrtombs(convbuf, (const wchar_t **), nbytes, );
+   if (nbytes == (size_t)-1) {
+   freezero(convbuf, *convbuf_size);
return (NULL);
}
-   convbuf[nbytes] = '\0';
return (convbuf);
 }
 #endif
@@ -330,6 +331,7 @@ __vfprintf(FILE *fp, const char *fmt0, _
va_list orgap;  /* original argument pointer */
 #ifdef PRINTF_WIDE_CHAR
char *convbuf;  /* buffer for wide to multi-byte conversion */
+   size_t convbuf_size;
 #endif
 
/*
@@ -857,8 +859,6 @@ fp_common:
if (flags & LONGINT) {
wchar_t *wcp;
 
-   free(convbuf);
-   convbuf = NULL;
if ((wcp = GETARG(wchar_t *)) == NULL) {
struct syslog_data sdata = 
SY

Re: freezero(3) for stdio's internal buffers

2017-11-26 Thread kshe
On Mon, 27 Nov 2017 00:42:01 +, Theo de Raadt wrote:
> This needs worst-case performance measurements.

The only instances where performance could be a problem are in
vfprintf.c and vfwprintf.c, where the calls happen inside a loop; but
for these, in fact, the best solution would be to use recallocarray
directly: rather than repeatedly freeing (and clearing) `convbuf' and
reallocating a fresh one every time it is needed, it should be kept
around and passed to __wcsconv() and __mbsconv(), along with some
accounting of its current size, so that these functions could then call
recallocarray appropriately.  However, this optimisation would be more
intrusive than the diff I submitted, and would therefore demand greater
familiarity with stdio's source, which I have not yet acquired.

That being said, in all other cases, since the way stdio functions fill
their buffers is much more costly than simply writing a bunch of zeros
in sequence (or merely calling munmap(2)), no real slowdown is to be
expected.  I should also point out that recallocarray is currently used
inside several loops in fvwrite.c, fgetln.c and getdelim.c, but no one
complained that the affected functions became too slow, because doing
things, as stdio does, like reading from and writing to disk or decoding
and converting user input will always dominate the effect of a few calls
to discard temporary buffers.

Regards,

kshe



Re: freezero(3) for stdio's internal buffers

2017-11-26 Thread kshe
On Sun, 26 Nov 2017 19:56:03 +, kshe wrote:
> Hi,
>
> Shortly after recallocarray(3) was introduced, it was put into use for
> several objects internal to the stdio library "to avoid leaving detritus
> in memory when resizing buffers".  However, in the end, this memory is
> still released by plain free(3) calls.
>
> The same reason that motivated the change to recallocarray(3) should
> also entail that to freezero(3), where applicable.  Currently, a
> suitable overflow from a malloc(3)ed buffer could allow an attacker to
> retrieve lines previously read by fgetln(3), as well as data previously
> written using the fprintf(3) family of functions, even long after the
> corresponding streams have been closed and even if the programmer was
> very careful explicitly to discard all the manually allocated objects
> that could have contained sensitive data.  The diff below makes such
> attacks much less likely to succeed, so that one can read and write
> secrets to files with stdio functions more safely, id est without the
> undesirable side effect of leaving parts of those secrets in heap memory
> afterwards.

The diff previously attached was one of my early drafts where I only
marked the relevant calls.  Please ignore it.  Here is the real patch I
intended to send.

Index: asprintf.c
===
RCS file: /cvs/src/lib/libc/stdio/asprintf.c,v
retrieving revision 1.25
diff -u -p -r1.25 asprintf.c
--- asprintf.c  17 Mar 2017 14:53:08 -  1.25
+++ asprintf.c  26 Oct 2017 23:30:57 -
@@ -61,7 +61,7 @@ asprintf(char **str, const char *fmt, ..
return (ret);
 
 err:
-   free(f._bf._base);
+   freezero(f._bf._base, f._bf._size);
f._bf._base = NULL;
*str = NULL;
errno = ENOMEM;
Index: fclose.c
===
RCS file: /cvs/src/lib/libc/stdio/fclose.c,v
retrieving revision 1.10
diff -u -p -r1.10 fclose.c
--- fclose.c31 Aug 2015 02:53:57 -  1.10
+++ fclose.c26 Oct 2017 23:30:57 -
@@ -51,7 +51,7 @@ fclose(FILE *fp)
if (fp->_close != NULL && (*fp->_close)(fp->_cookie) < 0)
r = EOF;
if (fp->_flags & __SMBF)
-   free((char *)fp->_bf._base);
+   freezero(fp->_bf._base, fp->_bf._size);
if (HASUB(fp))
FREEUB(fp);
if (HASLB(fp))
Index: fmemopen.c
===
RCS file: /cvs/src/lib/libc/stdio/fmemopen.c,v
retrieving revision 1.3
diff -u -p -r1.3 fmemopen.c
--- fmemopen.c  31 Aug 2015 02:53:57 -  1.3
+++ fmemopen.c  26 Oct 2017 23:30:57 -
@@ -107,7 +107,7 @@ fmemopen_close_free(void *v)
 {
struct state*st = v;
 
-   free(st->string);
+   freezero(st->string, st->size);
free(st);
 
return (0);
Index: freopen.c
===
RCS file: /cvs/src/lib/libc/stdio/freopen.c,v
retrieving revision 1.16
diff -u -p -r1.16 freopen.c
--- freopen.c   21 Sep 2016 04:38:56 -  1.16
+++ freopen.c   26 Oct 2017 23:30:57 -
@@ -106,7 +106,7 @@ freopen(const char *file, const char *mo
if (isopen && f != wantfd)
(void) (*fp->_close)(fp->_cookie);
if (fp->_flags & __SMBF)
-   free((char *)fp->_bf._base);
+   freezero(fp->_bf._base, fp->_bf._size);
fp->_w = 0;
fp->_r = 0;
fp->_p = NULL;
Index: local.h
===
RCS file: /cvs/src/lib/libc/stdio/local.h,v
retrieving revision 1.25
diff -u -p -r1.25 local.h
--- local.h 23 May 2016 00:21:48 -  1.25
+++ local.h 26 Oct 2017 23:30:57 -
@@ -82,7 +82,7 @@ __END_HIDDEN_DECLS
 #defineHASUB(fp) (_UB(fp)._base != NULL)
 #defineFREEUB(fp) { \
if (_UB(fp)._base != (fp)->_ubuf) \
-   free(_UB(fp)._base); \
+   freezero(_UB(fp)._base, _UB(fp)._size); \
_UB(fp)._base = NULL; \
 }
 
@@ -91,7 +91,7 @@ __END_HIDDEN_DECLS
  */
 #defineHASLB(fp) ((fp)->_lb._base != NULL)
 #defineFREELB(fp) { \
-   free((char *)(fp)->_lb._base); \
+   freezero((fp)->_lb._base, (fp)->_lb._size); \
(fp)->_lb._base = NULL; \
 }
 
Index: setvbuf.c
===
RCS file: /cvs/src/lib/libc/stdio/setvbuf.c,v
retrieving revision 1.14
diff -u -p -r1.14 setvbuf.c
--- setvbuf.c   21 Sep 2016 04:38:56 -  1.14
+++ setvbuf.c   26 Oct 2017 23:30:57 -
@@ -70,7 +70,7 @@ setvbuf(FILE *fp, char *buf, int mode, s
fp->_r = fp->_lbfsize = 0;
flags = fp->_flags;
if (flags & __SMBF)
-   free(fp->_bf._base);
+  

dc(1): smaller jump table

2017-11-26 Thread kshe
Hi,

The jump table used in dc/bcode.c is technically off by one, which leads
to the following harmless inconsistency:

$ printf '\376' | dc
dc: \xfe (0376) is unimplemented

$ printf '\377' | dc
dc: internal error: opcode 255

This could be fixed by making it hold 256 entries instead of 255;
however, since all commands are ASCII characters, it actually could
instead be reduced to 128 entries, with a small adjustment of the
relevant check in the eval() function.

Index: bcode.c
===
RCS file: /cvs/src/usr.bin/dc/bcode.c,v
retrieving revision 1.51
diff -u -p -r1.51 bcode.c
--- bcode.c 26 Feb 2017 11:29:55 -  1.51
+++ bcode.c 17 Nov 2017 02:38:12 -
@@ -137,7 +133,7 @@ struct jump_entry {
opcode_function f;
 };
 
-static opcode_function jump_table[UCHAR_MAX];
+static opcode_function jump_table[128];
 
 static const struct jump_entry jump_table_data[] = {
{ ' ',  nop },
@@ -238,7 +234,7 @@ init_bmachine(bool extended_registers)
if (bmachine.reg == NULL)
err(1, NULL);
 
-   for (i = 0; i < UCHAR_MAX; i++)
+   for (i = 0; i < 128; i++)
jump_table[i] = unknown;
for (i = 0; i < JUMP_TABLE_DATA_SIZE; i++)
jump_table[jump_table_data[i].ch] = jump_table_data[i].f;
@@ -1746,10 +1716,10 @@ eval(void)
(void)fprintf(stderr, "%zd =>\n", bmachine.readsp);
 #endif
 
-   if (0 <= ch && ch < UCHAR_MAX)
+   if (0 <= ch && ch < 128)
(*jump_table[ch])();
else
-   warnx("internal error: opcode %d", ch);
+   unknown();
 
 #ifdef DEBUGGING
stack_print(stderr, , "* ",

Regards,

kshe



dc(1): simplify print_ascii()

2017-11-26 Thread kshe
Hi,

The sign of a BIGNUM is irrelevant when inspecting its bit
representation.  Also, the construct

if (BN_is_negative(v))
BN_set_negative(v, 0);

was already redundant since a mere

BN_set_negative(v, 0);

would have been enough to do the same thing.  In any case, none of this
is necessary.

Index: inout.c
===
RCS file: /cvs/src/usr.bin/dc/inout.c,v
retrieving revision 1.20
diff -u -p -r1.20 inout.c
--- inout.c 26 Feb 2017 11:29:55 -  1.20
+++ inout.c 26 Oct 2017 04:44:01 -
@@ -390,22 +390,14 @@ print_value(FILE *f, const struct value
 void
 print_ascii(FILE *f, const struct number *n)
 {
-   BIGNUM *v;
int numbits, i, ch;
 
-   v = BN_dup(n->number);
-   bn_checkp(v);
-
-   if (BN_is_negative(v))
-   BN_set_negative(v, 0);
-
-   numbits = BN_num_bytes(v) * 8;
+   numbits = BN_num_bytes(n->number) * 8;
while (numbits > 0) {
ch = 0;
for (i = 0; i < 8; i++)
-   ch |= BN_is_bit_set(v, numbits-i-1) << (7 - i);
+   ch |= BN_is_bit_set(n->number, numbits-i-1) << (7 - i);
(void)putc(ch, f);
numbits -= 8;
}
-   BN_free(v);
 }

Regards,

kshe



dc(1): earlier pledge

2017-11-26 Thread kshe
Hi,

There is no need to keep "rpath" when executing scripts given as
arguments to the `-e' option if no additional file was supplied.

Index: dc.c
===
RCS file: /cvs/src/usr.bin/dc/dc.c,v
retrieving revision 1.18
diff -u -p -r1.18 dc.c
--- dc.c17 Jul 2016 17:30:47 -  1.18
+++ dc.c26 Oct 2017 04:44:01 -
@@ -72,6 +72,9 @@ dc_main(int argc, char *argv[])
argc -= optind;
argv += optind;
 
+   if (argc == 0 && pledge("stdio", NULL) == -1)
+   err(1, "pledge");
+
init_bmachine(extended_regs);
(void)setvbuf(stdout, NULL, _IOLBF, 0);
(void)setvbuf(stderr, NULL, _IOLBF, 0);
@@ -108,9 +111,6 @@ dc_main(int argc, char *argv[])
 */
 return (0);
}
-
-   if (pledge("stdio", NULL) == -1)
-   err(1, "pledge");
 
src_setstream(, stdin);
reset_bmachine();

Regards,

kshe



freezero(3) for stdio's internal buffers

2017-11-26 Thread kshe
 ((mlen = wcsrtombs(mbstr, , len, )) == (size_t)-1) {
-   free(mbstr);
+   freezero(mbstr);
return (EOF);
}
if (mlen == len)
@@ -82,7 +82,7 @@ vswscanf(const wchar_t * __restrict str,
f._read = eofread;
    f._lb._base = NULL;
r = __vfwscanf(, fmt, ap);
-   free(mbstr);
+   freezero(mbstr);
 
return (r);
 }

Regards,

kshe



dc(1) mishandles fractional input in bases other than 10

2017-11-26 Thread kshe
Hi,

The following behaviour seems unacceptable to me.

$ dc -e '16dio .C .C0 f'
.C0
.B

$ dc -e '8dio .3 .30 .300 f'
.3000
.275
.23

This bug affects all bases other than 10 and increasing the scale with
the `k' command does not prevent it.  The only reliable way to input a
rational number in these bases is therefore to input an integer first,
and then divide it by the appropriate factor; alternatively, as
illustrated above, adding meaningless zeros to the digit expansion can
help, although I have not verified that this works in all cases nor how
many zeros would need to be added to guarantee the expected result
every time.

I could try to provide a patch for this, but I dislike how the current
code gives special privileges to base 10 at the expense of all the
others, even though it is supposed to accept input in any base; so my
patch would probably be deemed too invasive as I would want to change
too many things at once, including in areas irrelevant to this
particular issue.

If a less intrusive workaround is not possible, then perhaps at least a
warning about fractional input being essentially broken in bases other
than 10 could be added to the manual.

Regards,

kshe



sed.1: miscellaneous corrections

2017-11-26 Thread kshe
s not a terminal.
 .El
 .Sh EXIT STATUS
 .Ex -std sed
@@ -538,20 +539,15 @@ $ sed -n '
 /./ {
 p
 d
-}
+}
 # Write a single empty line, then look for more empty lines.
-/^$/p
-# Get the next line, discard the held  (empty line),
-# and look for more empty lines.
+p
 :Empty
-/^$/{
-N
-s/.//
-b Empty
-}
+n
+/^$/b Empty
 # Write the non-empty line before going back to search
 # for the first in a set of empty lines.
-p
+p
 \&'
 .Ed
 .Sh SEE ALSO
@@ -569,12 +565,6 @@ specification.
 The flags
 .Op Fl aEiru
 are extensions to that specification.
-.Pp
-The use of newlines to separate multiple commands on the command line
-is non-portable;
-the use of newlines to separate multiple commands within a command file
-.Pq Fl f Ar command_file
-is portable.
 .Sh HISTORY
 A
 .Nm
@@ -583,8 +573,6 @@ command appeared in
 .Sh CAVEATS
 The use of semicolons to separate multiple commands
 is not permitted for the following commands:
-.Ic a , b , c ,
-.Ic i , r , t ,
-.Ic w , \&: ,
+.Ic a , c , i , r , w
 and
 .Ic # .

Regards,

kshe



dc(1): dead store

2017-11-26 Thread kshe
Hi,

This assignment is useless.

Index: bcode.c
===
RCS file: /cvs/src/usr.bin/dc/bcode.c,v
retrieving revision 1.51
diff -u -p -r1.51 bcode.c
--- bcode.c 26 Feb 2017 11:29:55 -  1.51
+++ bcode.c 26 Oct 2017 04:44:01 -
@@ -1630,7 +1611,7 @@ skip_until_mark(void)
free(read_string([bmachine.readsp]));
break;
case '!':
-   switch (ch = readch()) {
+   switch (readch()) {
case '<':
case '>':
case '=':

The above diff does not cause any change in the optimised executable
output, because such removal was obviously already performed by the
compiler.

Alternatively, while here, this function could be slightly shortened as
follows.

Index: bcode.c
===
RCS file: /cvs/src/usr.bin/dc/bcode.c,v
retrieving revision 1.51
diff -u -p -r1.51 bcode.c
--- bcode.c 26 Feb 2017 11:29:55 -  1.51
+++ bcode.c 17 Nov 2017 02:38:12 -
@@ -1610,7 +1590,13 @@ skip_until_mark(void)
return;
case EOF:
errx(1, "mark not found");
-   return;
+   case '!':
+   ch = readch();
+   if (ch != '<' && ch != '>' && ch != '=') {
+   free(readline());
+   break;
+   }
+   /* fall through */
case 'l':
case 'L':
case 's':
@@ -1629,22 +1615,6 @@ skip_until_mark(void)
case '[':
free(read_string([bmachine.readsp]));
break;
-   case '!':
-   switch (ch = readch()) {
-   case '<':
-   case '>':
-   case '=':
-   (void)readreg();
-   if (readch() == 'e')
-   (void)readreg();
-   else
-   unreadch();
-   break;
-   default:
-   free(readline());
-   break;
-   }
-   break;
default:
break;
}

Regards,

kshe



dc(1): minor cleanup

2017-11-26 Thread kshe
Hi,

The diff below encompasses three unrelated minor changes.

1.  Merge the not_equal(), not_less() and not_greater() functions into
their caller; these functions cannot be called from the jump table, so
it is confusing to define them as if they could.

2.  Make warnings consistent by using warnx(3) everywhere.

3.  Add a missing parenthesis in a comment.

Index: bcode.c
===
RCS file: /cvs/src/usr.bin/dc/bcode.c,v
retrieving revision 1.51
diff -u -p -r1.51 bcode.c
--- bcode.c 26 Feb 2017 11:29:55 -  1.51
+++ bcode.c 17 Nov 2017 02:38:12 -
@@ -95,18 +95,14 @@ static void bdiv(void);
 static voidless_numbers(void);
 static voidlesseq_numbers(void);
 static voidequal(void);
-static voidnot_equal(void);
 static voidless(void);
-static voidnot_less(void);
 static voidgreater(void);
-static voidnot_greater(void);
 static voidnot_compare(void);
 static boolcompare_numbers(enum bcode_compare, struct number *,
struct number *);
@@ -1195,7 +1207,7 @@ bexp(void)
negate(p);
rscale = bmachine.scale;
} else {
-   /* Posix bc says min(a.scale * b, max(a.scale, scale) */
+   /* Posix bc says min(a.scale * b, max(a.scale, scale)) */
u_long  b;
u_int   m;
 
@@ -1402,12 +1400,6 @@ lesseq_numbers(void)
 }
 
 static void
-not_equal(void)
-{
-   compare(BCODE_NOT_EQUAL);
-}
-
-static void
 less(void)
 {
compare(BCODE_LESS);
@@ -1418,39 +1410,27 @@ not_compare(void)
 {
switch (readch()) {
case '<':
-   not_less();
+   compare(BCODE_NOT_LESS);
break;
case '>':
-   not_greater();
+   compare(BCODE_NOT_GREATER);
break;
case '=':
-   not_equal();
+   compare(BCODE_NOT_EQUAL);
break;
default:
unreadch();
-   (void)fprintf(stderr, "! command is deprecated\n");
+   warnx("! command is deprecated");
break;
}
 }
 
 static void
-not_less(void)
-{
-   compare(BCODE_NOT_LESS);
-}
-
-static void
 greater(void)
 {
compare(BCODE_GREATER);
 }
 
-static void
-not_greater(void)
-{
-   compare(BCODE_NOT_GREATER);
-}
-
 static bool
 compare_numbers(enum bcode_compare type, struct number *a, struct number *b)
 {

Regards,

kshe



dc(1): floor(log_10(x)) should not cost more than O(log(log(x)))

2017-11-26 Thread kshe
(p, d));
+   bn_check(BN_exp(a, a, p, ctx));
+
+   if (BN_ucmp(int_part, a) >= 0)
+   d++;
+
+   BN_CTX_free(ctx);
+   BN_free(a);
+   BN_free(p);
+   } else
+   d++;
+
+   BN_free(int_part);
+
+   return d + n->scale;
 }
 
 static void

Regards,

kshe



Re: sed(1): missing NUL in pattern space

2017-07-01 Thread kshe
d'
text
a

$ echo | sed 'c\
> text
> s/^/a/;s/^//
> s/a/b/g
> s/^//p;d'
text
b

Because, of course, thanks to that `pd' magic, `y' is only effective
after `c' when an odd number of substitutions have affected the pattern
space.  Although it is true that the above two bugs are primarily due to
the incorrect handling of the `deleted' flag by the `s' command, they
nonetheless help to expose further the inconsistency of the code, making
its overall fragility even more obvious than it would have otherwise
been; in this case in particular, they allow one to notice that:

1.  Even though `n' resets the `deleted' flag, `N' does not.
2.  Even though `s' ignores the `deleted' flag, `y' does not.

And so, similarily, even if `p', `P', and `w' check for it, `l' ignores
`pd' completely; and, indeed, so does `c', in contrast with other
commands such as `D'.

I hope that these examples make it clear that the problem is much worse
than what the above diff actually solves.  Of course, I could have
written a third detailed report appropriately addressing all of this,
but I figured no reasonable amount of patching endeavour could ever
truly and satisfyingly improve the overall condition of such inherently
flawed code.  Therefore, I no longer believe any of my patches (nor any
other patches, really) to be relevant to OpenBSD's current sed.

This might sound a little extreme, but, in fact, as my previous messages
illustrate, that `pd' madness is far from being the only incongruity
lurking within sed's source.  I should also add that, since then, I
actually found more bugs, some of them even affecting the parsing
algorithm...  To take only one example out of many, the implementation
of the `y' command is incorrect:

$ sed 'y/[/]/'
sed: 1: "y/[/]/": unterminated transform target string

(The parser thinks that an opening bracket in an argument to the `y'
command is meant to be the start of a character class, as if it were
part of a regular expression.)

Considering that sed is my favourite Unix utility (dc comes close; vi
and sh don't count as they are more than mere utilities), I would be
very sad if all my beautiful sed scripts were forever to be handled by
such a mediocre implementation, since I don't really plan to ever use
anything else than OpenBSD.  Also, in light of all these observations,
instead of devising endless diffs for a piece of software that clearly
doesn't deserve any, I am willing to rewrite it from scratch.  This
might take a few weeks as I do not have enough time to work on it
regularly, but there is apparently no hurry: after all, I seem to be the
only one out there to actually need such features as empty match
replacement to work reliably, no matter if the last command was `D' or
if there happens to be an embed NUL or newline in the pattern space.

Besides, and perhaps even more importantly, a clean rewrite could yield
many other improvements (like a more restrictive pledge(2) depending on
the outcome of the scripts' compilation), without the hassle of having
to untangle such old and messy code.  As a consequence, especially since
no one at OpenBSD appears to have time to deal with all the deficiencies
of the existing source, it might be simpler to wait until I am done with
my own, as my code shall be much more readable than the current one, and
therefore quite easy to review.

Kind regards,

kshe



Re: sed(1): missing NUL in pattern space

2017-06-13 Thread kshe
- le)
+   if (match[0].rm_eo != le)
cspace(, s, match[0].rm_eo - le,
APPEND);
n--;
@@ -418,7 +416,6 @@ substitute(struct s_command *cp)
PS = SS;
psanl = tspace.append_newline;
SS = tspace;
-   SS.space = SS.back;
 
/* Handle the 'p' flag. */
if (cp->u.s->p)
@@ -547,13 +544,14 @@ regexec_e(regex_t *preg, const char *string, int eflag
 static void
 regsub(SPACE *sp, char *string, char *src)
 {
-   int len, no;
+   int no;
+   size_t len;
char c, *dst;
 
-#defineNEEDSP(reqlen)  
\
+#define NEEDSP(reqlen) \
if (sp->len + (reqlen) + 1 >= sp->blen) {   \
size_t newlen = sp->blen + (reqlen) + 1024; \
-   sp->space = sp->back = xrealloc(sp->back, newlen);  \
+   sp->space = xrealloc(sp->space, newlen);\
sp->blen = newlen;  \
dst = sp->space + sp->len;  \
}
@@ -572,8 +570,7 @@ regsub(SPACE *sp, char *string, char *src)
NEEDSP(1);
*dst++ = c;
++sp->len;
-   } else if (match[no].rm_so != -1 && match[no].rm_eo != -1) {
-   len = match[no].rm_eo - match[no].rm_so;
+   } else if ((len = match[no].rm_eo - match[no].rm_so) != 0) {
NEEDSP(len);
memmove(dst, string + match[no].rm_so, len);
dst += len;
@@ -594,18 +591,18 @@ cspace(SPACE *sp, const char *p, size_t len, enum e_sp
 {
size_t tlen;
 
+   if (spflag == REPLACE)
+   sp->len = 0;
+
/* Make sure SPACE has enough memory and ramp up quickly. */
tlen = sp->len + len + 1;
if (tlen > sp->blen) {
size_t newlen = tlen + 1024;
-   sp->space = sp->back = xrealloc(sp->back, newlen);
+   sp->space = xrealloc(sp->space, newlen);
sp->blen = newlen;
}
 
-   if (spflag == REPLACE)
-   sp->len = 0;
-
-   memmove(sp->space + sp->len, p, len);
+   memcpy(sp->space + sp->len, p, len);
 
sp->space[sp->len += len] = '\0';
 }

Here is the corresponding list of changes:

1.  Remove the `back' field from the SPACE structure, which was useless
since always equal to the `space' field.

2.  Use APPEND in calls to cspace() that previously used 0 directly,
which could be slightly confusing if one did not know the order of the
`e_spflag' enum, and also missed the point of it being defined in the
first place.

3.  As suggested by the outdated comment preceding its definition,
cspace() was originally only meant to append to, not to replace, the
contents of a SPACE.  The calculation of the size for the potentially
new buffer was however never adjusted, which meant that, with input
lines of approximately equal length, the function would allocate roughly
twice as much memory as necessary, half of that memory staying unused.
To fix it, conditionally set `sp->len' to zero before the relevent
assignment, not after.

4.  Since cspace() never receives arguments that overlap in memory (as
that would make little sense), and considering that it is probably one
of the most frequently called routines, make it use memcpy(3) in lieu of
memmove(3).

5.  Change conditionals of the form (a - b) to (a != b) for better
readability.

6.  Use `size_t' instead of `int' for a variable holding a length.

7.  Slightly improve the logic of regsub() to avoid a bunch of no-op
instructions when `len' is zero.

8.  Move a goto label previously redirecting into a conditional that was
always false within that context.

9.  Add missing tabs to align backslashes in macro definitions.

Kind regards,

kshe



sed(1): missing NUL in pattern space

2017-06-09 Thread kshe
atch could
instead try to unify the whole source to follow one convention
consistently; however, that would probably take as much time as a full
rewrite, because that is essentially what would need to be done, so, for
now, I kept it as simple as it could be, to avoid messing it up even
more than it already is.

--- a/src/usr.bin/sed/process.c Wed Feb 22 14:09:09 2017
+++ b/src/usr.bin/sed/process.c Wed Jun  7 09:56:20 2017
@@ -121,7 +121,7 @@ redirect:
goto redirect;
case 'c':
pd = 1;
-   psl = 0;
+   ps[psl = 0] = '\0';
if (cp->a2 == NULL || lastaddr || lastline())
(void)fprintf(outfile, "%s", cp->t);
break;
@@ -138,6 +138,7 @@ redirect:
} else {
psl -= (p + 1) - ps;
memmove(ps, p + 1, psl);
+   ps[psl] = '\0';
goto top;
}
case 'g':

Also note how one could have thought of fixing `D' by making its
memmove(3) go up to `psl + 1' instead of `psl', since there now should
always be a NUL at the end; however, a separate assignment makes the
intent clearer, and I read somewhere that explicit was better than
implicit.

That being said, the more I think about this, the more I am convinced
that the problematic commands were in fact `l' and `s'.  Since the code
already uses fgetln(3) to read lines, it would have been nicer (and
optimal) to never rely on NULs.  As stated above, this is already the
case almost everywhere; therefore, if any of the OpenBSD developers
wanted to elaborate on that idea after committing such an unsatisfying
patch, I would of course be very happy to help.

Kind regards,

kshe