Re: dc(1): floor(log_10(x)) should not cost more than O(log(log(x)))

Theo Buehler Fri, 01 Dec 2017 01:22:27 -0800

On Fri, Dec 01, 2017 at 09:58:55AM +0100, Otto Moerbeek wrote:
> On Thu, Nov 30, 2017 at 01:10:33PM +0000, kshe wrote:
> 
> > On Thu, 30 Nov 2017 07:22:42 +0000, Otto Moerbeek wrote:
> > > On Sun, Nov 26, 2017 at 07:40:03PM +0000, kshe wrote:
> > > > Hi,
> > > >
> > > > The `Z' command can be a handy shortcut for computing logarithms; as
> > > > such, for example, it is the basis of the implementation of bc(1)'s `l'
> > > > function.  However, the algorithm currently used in count_digits() is
> > > > too naive to be really useful, becoming rapidly much slower than what
> > > > would be expected from a native command.
> > > >
> > > > To see how this computation could easily be made exponentially faster,
> > > > one may start by noticing that, if next() is the function defined for
> > > > any real r as
> > > >
> > > >     next(r) := floor(r) + 1,
> > > >
> > > > then clearly, for any strictly positive integer x,
> > > >
> > > >     floor(log_2(x)) <= log_2(x) < next(log_2(x))
> > > >
> > > > and therefore
> > > >
> > > >     log_10(2) * floor(log_2(x)) <= log_10(x) < k,
> > > >
> > > > where
> > > >
> > > >     k := log_10(2) * next(log_2(x)).
> > > >
> > > > Since log_10(2) < 1, it follows that
> > > >
> > > >     floor(k) <= next(k - log_10(2)) <= next(log_10(x)) <= next(k),
> > > >
> > > > which proves that next(log_10(x)) is either floor(k) or next(k).
> > > >
> > > > If next(log_10(x)) = floor(k), then
> > > >
> > > >     10^floor(k) = 10^next(log_10(x)) > 10^log_10(x) = x.
> > > >
> > > > If next(log_10(x)) = next(k), then
> > > >
> > > >     10^floor(k) = 10^floor(log_10(x)) <= 10^log_10(x) = x.
> > > >
> > > > Therefore, if x >= 10^floor(k), then next(log_10(x)) cannot be floor(k),
> > > > hence it must be next(k); likewise, if x < 10^floor(k), then
> > > > next(log_10(x)) cannot be next(k), hence it must be floor(k).  Using the
> > > > conventional integer value of a boolean expression, this result can be
> > > > summarised as
> > > >
> > > >     next(log_10(x)) = floor(k) + (x >= 10^floor(k)).
> > > >
> > > > As an additional refinement, one may further notice that if
> > > >
> > > >     floor(k) = floor(log_10(2) * floor(log_2(x)))
> > > >
> > > > then
> > > >
> > > >     10^floor(k) = 10^floor(log_10(2) * floor(log_2(x)))
> > > >                 <= 10^(log_10(2) * floor(log_2(x)))
> > > >                 <= 2^floor(log_2(x))
> > > >                 <= x
> > > >
> > > > so that it can readily be concluded that
> > > >
> > > >     next(log_10(x)) = next(k)
> > > >
> > > > without having to compute 10^floor(k).
> > > >
> > > > The BN library permits computation of k in O(1) and 10^floor(k) in
> > > > O(log(k)) which is O(log(log(x))).  Therefore, one can compute
> > > > next(log_10(x)) in O(1) most of the time (at least on average, and with
> > > > a certain definition of such average, the full analysis of which is, I
> > > > presume, outside the scope of this message), with a worst case of
> > > > O(log(log(x))).  In contrast, the current code is exponentially worse
> > > > than what its worst case should be, computing this value in O(log(x)).
> > > >
> > > >     $ jot -b 9 -s '' 65536 >script
> > > >     $ echo Z >>script
> > > >
> > > >     $ time dc script
> > > >         0m03.57s real     0m03.56s user     0m00.01s system
> > > >     $ time ./dc script
> > > >         0m00.12s real     0m00.12s user     0m00.00s system
> > > >
> > > > The diff below implements this optimisation.  It also fixes a small
> > > > logic error in split_number(), which is used by count_digits().
> > >
> > > General comment: it would be a good thing to add (part of) the proof
> > > or a reference to some published work to the code in a comment.
> > > Especially the derivation of c and the computation of d seem a bit
> > > like dropping out of thin air.
> > 
> > All the above proof says is that a logarithm in one base is easily
> > convertible to the corresponding one in another base.  Since this has
> > been known for a few centuries already, I see no reason to provide more
> > details here than for the algorithms (and associated constants) featured
> > in /usr/share/misc/bc.library, where no explanations of any kind are to
> > be found either.
> > 
> > > From a style point of view I do not like the assignment in
> > > conditionals very much.
> > 
> > Please feel free to apply your prefered style to this patch.
> > 
> > > There's also the problem of bits * c overflowing, though that's likely
> > > theoretical.
> > 
> > To provoke such overflow, one would first need a platform where ints are
> > larger than 32 bits, and the involved numbers would have to exceed
> > 2^2147483648.  This is indeed unlikely to happen.
> > 
> > Regards,
> > 
> > kshe
> 
> So this is what I plan to commit. Changes wrt your version: make d
> uint, and do not use assignment in conditional test.


This is a very nice optimization and your version patch makes it easier
to follow.

ok

> 
> I also added a regress test for Z (and found out there's a difference
> between gnu dc for 0Z, but that is a different issue).
> 
>       -Otto
> 
> Index: bcode.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/dc/bcode.c,v
> retrieving revision 1.56
> diff -u -p -r1.56 bcode.c
> --- bcode.c   29 Nov 2017 19:13:31 -0000      1.56
> +++ bcode.c   1 Dec 2017 08:54:12 -0000
> @@ -390,9 +390,10 @@ split_number(const struct number *n, BIG
>  
>       bn_checkp(BN_copy(i, n->number));
>  
> -     if (n->scale == 0 && f != NULL)
> -             bn_check(BN_set_word(f, 0));
> -     else if (n->scale < sizeof(factors)/sizeof(factors[0])) {
> +     if (n->scale == 0) {
> +             if (f != NULL)
> +                     bn_check(BN_set_word(f, 0));
> +     } else if (n->scale < sizeof(factors)/sizeof(factors[0])) {
>               rem = BN_div_word(i, factors[n->scale]);
>               if (f != NULL)
>                       bn_check(BN_set_word(f, rem));
> @@ -697,25 +698,56 @@ push_scale(void)
>  static u_int
>  count_digits(const struct number *n)
>  {
> -     struct number   *int_part, *fract_part;
> -     u_int           i;
> +     BIGNUM          *int_part, *a, *p;
> +     BN_CTX          *ctx;
> +     uint            d;
> +     const uint64_t  c = 1292913986; /* floor(2^32 * log_10(2)) */
> +     int             bits;
>  
>       if (BN_is_zero(n->number))
>               return n->scale ? n->scale : 1;
>  
> -     int_part = new_number();
> -     fract_part = new_number();
> -     fract_part->scale = n->scale;
> -     split_number(n, int_part->number, fract_part->number);
> -
> -     i = 0;
> -     while (!BN_is_zero(int_part->number)) {
> -             (void)BN_div_word(int_part->number, 10);
> -             i++;
> +     int_part = BN_new();
> +     bn_checkp(int_part);
> +
> +     split_number(n, int_part, NULL);
> +     bits = BN_num_bits(int_part);
> +
> +     if (bits == 0)
> +             d = 0;
> +     else {
> +             /*
> +              * Estimate number of decimal digits based on number of bits.
> +              * Divide 2^32 factor out by shifting
> +              */
> +             d = (c * bits) >> 32;
> +
> +             /* If close to a possible rounding error fix if needed */
> +             if (d != (c * (bits - 1)) >> 32) {
> +                     ctx = BN_CTX_new();
> +                     bn_checkp(ctx);
> +                     a = BN_new();
> +                     bn_checkp(a);
> +                     p = BN_new();
> +                     bn_checkp(p);
> +
> +                     bn_check(BN_set_word(a, 10));
> +                     bn_check(BN_set_word(p, d));
> +                     bn_check(BN_exp(a, a, p, ctx));
> +
> +                     if (BN_ucmp(int_part, a) >= 0)
> +                             d++;
> +
> +                     BN_CTX_free(ctx);
> +                     BN_free(a);
> +                     BN_free(p);
> +             } else
> +                     d++;
>       }
> -     free_number(int_part);
> -     free_number(fract_part);
> -     return i + n->scale;
> +
> +     BN_free(int_part);
> +
> +     return d + n->scale;
>  }
>  
>  static void
> 
>

Re: dc(1): floor(log_10(x)) should not cost more than O(log(log(x)))

Reply via email to