dc(1): floor(log_10(x)) should not cost more than O(log(log(x)))

kshe Sun, 26 Nov 2017 11:40:42 -0800

Hi,

The `Z' command can be a handy shortcut for computing logarithms; as
such, for example, it is the basis of the implementation of bc(1)'s `l'
function.  However, the algorithm currently used in count_digits() is
too naive to be really useful, becoming rapidly much slower than what
would be expected from a native command.


To see how this computation could easily be made exponentially faster,
one may start by noticing that, if next() is the function defined for
any real r as

        next(r) := floor(r) + 1,

then clearly, for any strictly positive integer x,

        floor(log_2(x)) <= log_2(x) < next(log_2(x))

and therefore

        log_10(2) * floor(log_2(x)) <= log_10(x) < k,

where

        k := log_10(2) * next(log_2(x)).

Since log_10(2) < 1, it follows that

        floor(k) <= next(k - log_10(2)) <= next(log_10(x)) <= next(k),

which proves that next(log_10(x)) is either floor(k) or next(k).

If next(log_10(x)) = floor(k), then

        10^floor(k) = 10^next(log_10(x)) > 10^log_10(x) = x.

If next(log_10(x)) = next(k), then

        10^floor(k) = 10^floor(log_10(x)) <= 10^log_10(x) = x.

Therefore, if x >= 10^floor(k), then next(log_10(x)) cannot be floor(k),
hence it must be next(k); likewise, if x < 10^floor(k), then
next(log_10(x)) cannot be next(k), hence it must be floor(k).  Using the
conventional integer value of a boolean expression, this result can be
summarised as

        next(log_10(x)) = floor(k) + (x >= 10^floor(k)).

As an additional refinement, one may further notice that if

        floor(k) = floor(log_10(2) * floor(log_2(x)))

then

        10^floor(k) = 10^floor(log_10(2) * floor(log_2(x)))
                    <= 10^(log_10(2) * floor(log_2(x)))
                    <= 2^floor(log_2(x))
                    <= x

so that it can readily be concluded that

        next(log_10(x)) = next(k)

without having to compute 10^floor(k).

The BN library permits computation of k in O(1) and 10^floor(k) in
O(log(k)) which is O(log(log(x))).  Therefore, one can compute
next(log_10(x)) in O(1) most of the time (at least on average, and with
a certain definition of such average, the full analysis of which is, I
presume, outside the scope of this message), with a worst case of
O(log(log(x))).  In contrast, the current code is exponentially worse
than what its worst case should be, computing this value in O(log(x)).

        $ jot -b 9 -s '' 65536 >script
        $ echo Z >>script

        $ time dc script
            0m03.57s real     0m03.56s user     0m00.01s system
        $ time ./dc script
            0m00.12s real     0m00.12s user     0m00.00s system

The diff below implements this optimisation.  It also fixes a small
logic error in split_number(), which is used by count_digits().

Index: bcode.c
===================================================================
RCS file: /cvs/src/usr.bin/dc/bcode.c,v
retrieving revision 1.51
diff -u -p -r1.51 bcode.c
--- bcode.c     26 Feb 2017 11:29:55 -0000      1.51
+++ bcode.c     17 Nov 2017 02:38:12 -0000
@@ -385,9 +381,10 @@ split_number(const struct number *n, BIG
 
        bn_checkp(BN_copy(i, n->number));
 
-       if (n->scale == 0 && f != NULL)
-               bn_check(BN_set_word(f, 0));
-       else if (n->scale < sizeof(factors)/sizeof(factors[0])) {
+       if (n->scale == 0) {
+               if (f != NULL)
+                       bn_check(BN_set_word(f, 0));
+       } else if (n->scale < sizeof(factors)/sizeof(factors[0])) {
                rem = BN_div_word(i, factors[n->scale]);
                if (f != NULL)
                        bn_check(BN_set_word(f, rem));
@@ -692,25 +689,40 @@ push_scale(void)
 static u_int
 count_digits(const struct number *n)
 {
-       struct number   *int_part, *fract_part;
-       u_int           i;
+       BIGNUM          *int_part, *a, *p;
+       BN_CTX          *ctx;
+       uint64_t        d, c = 0x4D104D42;
+       int             bits;
 
        if (BN_is_zero(n->number))
                return n->scale ? n->scale : 1;
 
-       int_part = new_number();
-       fract_part = new_number();
-       fract_part->scale = n->scale;
-       split_number(n, int_part->number, fract_part->number);
-
-       i = 0;
-       while (!BN_is_zero(int_part->number)) {
-               (void)BN_div_word(int_part->number, 10);
-               i++;
-       }
-       free_number(int_part);
-       free_number(fract_part);
-       return i + n->scale;
+       bn_checkp(int_part = BN_new());
+       split_number(n, int_part, NULL);
+
+       if ((bits = BN_num_bits(int_part)) == 0)
+               d = 0;
+       else if ((d = (c * bits) >> 32) != (c * (bits - 1)) >> 32) {
+               bn_checkp(ctx = BN_CTX_new());
+               bn_checkp(a = BN_new());
+               bn_checkp(p = BN_new());
+
+               bn_check(BN_set_word(a, 10));
+               bn_check(BN_set_word(p, d));
+               bn_check(BN_exp(a, a, p, ctx));
+
+               if (BN_ucmp(int_part, a) >= 0)
+                       d++;
+
+               BN_CTX_free(ctx);
+               BN_free(a);
+               BN_free(p);
+       } else
+               d++;
+
+       BN_free(int_part);
+
+       return d + n->scale;
 }
 
 static void

Regards,

kshe

dc(1): floor(log_10(x)) should not cost more than O(log(log(x)))

Reply via email to