On Thu, 29 Aug 2013, Sergey Kandaurov wrote:

On 22 August 2013 14:51, Bruce Evans <b...@optusnet.com.au> wrote:
expand_number() remains a very badly designed and implemented function.
Its design errors start with its name.  It doesn't expand numbers.  It
...
[...]

Some of the other bugs in the old version:

% /*
%  * Convert an expression of the following forms to a uint64_t.
%  *    1) A positive decimal number.
%  *    2) A positive decimal number followed by a 'b' or 'B' (mult by 1).
%  ...
This garbage was copied from a similar comment in dd/args.c.  The number is
actually neither positive nor decimal, especially in dd where there is a
variant of this function that returns negative numbers.  Another design
error
in this function is that it can't return negative numbers.

Well, I see no useful contents in this comment. It is obvious and verbose.
Probably it should be trimmed.

Stating the format in the man page is enough.

I don't like the man page much either, and have complained about it and
dehumanize_number(3) at length elsewhere.  But expand_number() is much
simpler, so most of the bugs mostly affect the other man page.  (It is
too simple to be able to translate numbers printed by dehumanize_number(),
since it doesn't support power of 10 "prefixes".)  Looking at
expand_number(3) now, I see only the following bugs:
- the format of the string isn't really documented
- poor grammar:
  - "the buf string"
  - "the address pointed out"
- confusing terminology "prefix".  This means an "SI power of 2" prefix,
  e.g., "kilo" or "k".  (I think SI only really has power if 10 prefixes.
  Anyway, "k" should always mean 1000 and "K" always 1024.  This is
  backwards in the man page.)  "kilo" really is a prefix when it modifies
  a unit, "e.g.", kilobytes.  But this function deals with pure numbers,
  so there are no units, and the "prefixes" are always uses as suffixes,
  e.g., "10K" (sic).
- the code supports both cases for all "prefixes", with literal lower
  case, but the man page only documents one case (always upper case,
  except for "k" where this is backwards).  Comments in the code
  describes the "prefixes" as "units".  That is better, but is too
  confusing since there may also be units like bytes added later.
- the man page describes the multipliers in non-dehumanized form, i.e.,
  as decimal numbers.  Even I prefer a dehumanized form here, since I
  can only recognize small decimal numbers as powers of 2.  The man page
  should give the numbers in hex or powers of 2 first and then in decimal
  if there is space.
- in the ERRORS section:
  - the description of EINVAL is incomplete.  See strtoumax() for more.
  - the description of ERANGE has poor grammar.  It should start with
    "The", as os normal and as is done for EINVAL, and perhaps give
    more details, as in strto*().

I think most utilities and all library functions that parse numbers should
say when they use strto*() to do this.  Utilities should mostly be
covered by intro(1).  The default should be to accept numbers in any
base.  POSIX unfortunately doesn't require very much, so bases other than
10 are unportable at best. Grepping in utitilies man pages shows the following:
- I patched mknod(8) with the following when I fixed its strto*() parsing:

% Index: mknod.8
%  ===================================================================
%  RCS file: /home/ncvs/src/sbin/mknod/mknod.8,v
%  retrieving revision 1.3
% retrieving revision 1.4
% diff -r1.3 -r1.4
% 94,95c94,96
% < Major and minor device numbers can be given using the normal C
% < notation, so that a leading
% ---
% > Major and minor device numbers can be given in any format acceptable to
% > .Xr strtoul 3 ,
% > so that a leading

This is still fuzzy since it doesn't say that any base is allowed.

- sleep(1) refers to strtod(3).  sleep(1) actually uses sscanf() instead
  of strtod(3), but I think the reference is correct.
- ctladm(8) refers to strtoull(3) for lba's, with details about the base
  (but expressed too verbosely, and duplicated for read and write).  dd
  needs a similar amount of detail and more, but has less.  ctladm(3)
  does actually use strtoull(), but that is another bug since on 32-bit
  machines strtoull() only gives 32-bit numbers but lba's need
  more.
- etherswitchcfg(8) refers to strtol(4) for phy's and gives details about
  accepting any base.  strtol(4) of course doesn't exist.  It is a typo
  for strtol(3).



%  */
%

This bogus blank line detaches the comment from the code that it describes,
resulting in the comment describing null code.

Sorry, I didn't see such blank line in head.

Oops, false alarm.

...
Fixing the verbose comment above would result in adding more verboseness
citing from man strtoumax(3) wrt. signedness and base details.
Yet another point to trim it.

Yes.  We add the detail that the number is restricted to 64 bits (else
ERANGE), but already document that well enough (in the man page), and
after these fixes the man page should match the code.

Thanks for your valuable input. What about this change (on top of head)?

Index: expand_number.c
===================================================================
--- expand_number.c    (revision 255017)
+++ expand_number.c    (working copy)
@@ -35,42 +35,29 @@
#include <libutil.h>
#include <stdint.h>

-/*
- * Convert an expression of the following forms to a uint64_t.
- *    1) A positive decimal number.
- *    2) A positive decimal number followed by a 'b' or 'B' (mult by 1).
- *    3) A positive decimal number followed by a 'k' or 'K' (mult by 1 << 10).
- *    4) A positive decimal number followed by a 'm' or 'M' (mult by 1 << 20).
- *    5) A positive decimal number followed by a 'g' or 'G' (mult by 1 << 30).
- *    6) A positive decimal number followed by a 't' or 'T' (mult by 1 << 40).
- *    7) A positive decimal number followed by a 'p' or 'P' (mult by 1 << 50).
- *    8) A positive decimal number followed by a 'e' or 'E' (mult by 1 << 60).
- */
int
expand_number(const char *buf, uint64_t *num)
{
+    char *endptr;
+    uintmax_t umaxval;
    uint64_t number;
-    int saved_errno;
    unsigned shift;
-    char *endptr;
+    int serrno;

-    saved_errno = errno;
+    serrno = errno;
    errno = 0;
-
-    number = strtoumax(buf, &endptr, 0);
-
-    if (number == UINTMAX_MAX && errno == ERANGE) {
-        return (-1);
-    }
-
-    if (errno == 0)
-        errno = saved_errno;
-
+    umaxval = strtoumax(buf, &endptr, 0);
    if (endptr == buf) {
        /* No valid digits. */
        errno = EINVAL;
        return (-1);
    }

I want to remove this if clause (depend on the POSIX extension).

+    if (umaxval > UINT64_MAX)
+        errno = ERANGE;
+    if (errno != 0)
+        return (-1);
+    errno = serrno;
+    number = umaxval;

    switch (tolower((unsigned char)*endptr)) {
    case 'e':

Seems OK.  Tabs got corrupted in the mail.

I complained about missing overflow checking for the shift, but there is
some.  It seems to be correct except for a style bug (a blank line that
separates the check from the final evaluation.  We avoid changing *num
when failing, and don't use a temporary variable for holding the result
of the shift, so the shift expression is written twice, but that is a
negative reason to separate its evaluations).

Bruce
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"

Reply via email to