Re: dc(1): floor(log_10(x)) should not cost more than O(log(log(x)))

kshe Fri, 01 Dec 2017 13:22:17 -0800

On Fri, 01 Dec 2017 08:58:55 +0000, Otto Moerbeek wrote:
> On Thu, Nov 30, 2017 at 01:10:33PM +0000, kshe wrote:
> > On Thu, 30 Nov 2017 07:22:42 +0000, Otto Moerbeek wrote:
> > > On Sun, Nov 26, 2017 at 07:40:03PM +0000, kshe wrote:
> > > > Hi,
> > > >
> > > > The `Z' command can be a handy shortcut for computing logarithms; as
> > > > such, for example, it is the basis of the implementation of bc(1)'s `l'
> > > > function.  However, the algorithm currently used in count_digits() is
> > > > too naive to be really useful, becoming rapidly much slower than what
> > > > would be expected from a native command.
> > > >
> > > > To see how this computation could easily be made exponentially faster,
> > > > one may start by noticing that, if next() is the function defined for
> > > > any real r as
> > > >
> > > >     next(r) := floor(r) + 1,
> > > >
> > > > then clearly, for any strictly positive integer x,
> > > >
> > > >     floor(log_2(x)) <= log_2(x) < next(log_2(x))
> > > >
> > > > and therefore
> > > >
> > > >     log_10(2) * floor(log_2(x)) <= log_10(x) < k,
> > > >
> > > > where
> > > >
> > > >     k := log_10(2) * next(log_2(x)).
> > > >
> > > > Since log_10(2) < 1, it follows that
> > > >
> > > >     floor(k) <= next(k - log_10(2)) <= next(log_10(x)) <= next(k),
> > > >
> > > > which proves that next(log_10(x)) is either floor(k) or next(k).
> > > >
> > > > If next(log_10(x)) = floor(k), then
> > > >
> > > >     10^floor(k) = 10^next(log_10(x)) > 10^log_10(x) = x.
> > > >
> > > > If next(log_10(x)) = next(k), then
> > > >
> > > >     10^floor(k) = 10^floor(log_10(x)) <= 10^log_10(x) = x.
> > > >
> > > > Therefore, if x >= 10^floor(k), then next(log_10(x)) cannot be floor(k),
> > > > hence it must be next(k); likewise, if x < 10^floor(k), then
> > > > next(log_10(x)) cannot be next(k), hence it must be floor(k).  Using the
> > > > conventional integer value of a boolean expression, this result can be
> > > > summarised as
> > > >
> > > >     next(log_10(x)) = floor(k) + (x >= 10^floor(k)).
> > > >
> > > > As an additional refinement, one may further notice that if
> > > >
> > > >     floor(k) = floor(log_10(2) * floor(log_2(x)))
> > > >
> > > > then
> > > >
> > > >     10^floor(k) = 10^floor(log_10(2) * floor(log_2(x)))
> > > >                 <= 10^(log_10(2) * floor(log_2(x)))
> > > >                 <= 2^floor(log_2(x))
> > > >                 <= x
> > > >
> > > > so that it can readily be concluded that
> > > >
> > > >     next(log_10(x)) = next(k)
> > > >
> > > > without having to compute 10^floor(k).
> > > >
> > > > The BN library permits computation of k in O(1) and 10^floor(k) in
> > > > O(log(k)) which is O(log(log(x))).  Therefore, one can compute
> > > > next(log_10(x)) in O(1) most of the time (at least on average, and with
> > > > a certain definition of such average, the full analysis of which is, I
> > > > presume, outside the scope of this message), with a worst case of
> > > > O(log(log(x))).  In contrast, the current code is exponentially worse
> > > > than what its worst case should be, computing this value in O(log(x)).
> > > >
> > > >     $ jot -b 9 -s '' 65536 >script
> > > >     $ echo Z >>script
> > > >
> > > >     $ time dc script
> > > >         0m03.57s real     0m03.56s user     0m00.01s system
> > > >     $ time ./dc script
> > > >         0m00.12s real     0m00.12s user     0m00.00s system
> > > >
> > > > The diff below implements this optimisation.  It also fixes a small
> > > > logic error in split_number(), which is used by count_digits().
> > >
> > > General comment: it would be a good thing to add (part of) the proof
> > > or a reference to some published work to the code in a comment.
> > > Especially the derivation of c and the computation of d seem a bit
> > > like dropping out of thin air.
> >
> > All the above proof says is that a logarithm in one base is easily
> > convertible to the corresponding one in another base.  Since this has
> > been known for a few centuries already, I see no reason to provide more
> > details here than for the algorithms (and associated constants) featured
> > in /usr/share/misc/bc.library, where no explanations of any kind are to
> > be found either.
> >
> > > From a style point of view I do not like the assignment in
> > > conditionals very much.
> >
> > Please feel free to apply your prefered style to this patch.
> >
> > > There's also the problem of bits * c overflowing, though that's likely
> > > theoretical.
> >
> > To provoke such overflow, one would first need a platform where ints are
> > larger than 32 bits, and the involved numbers would have to exceed
> > 2^2147483648.  This is indeed unlikely to happen.
> >
> > Regards,
> >
> > kshe
>
> So this is what I plan to commit. Changes wrt your version: make d
> uint, and do not use assignment in conditional test.


I still think that the added comments are actually quite useless and
could even be considered misleading, but I shall not oppose that you
commit this version if you feel that these changes are important.

For the sake of completeness, however, I attached below the code that I
will be keeping in my personal tree, which incorporates a generalisation
of the optimisation featured in my original patch.  My point is that
thinking about this problem in such terms as "estimation" or "rounding
error" would never lead to such an improvement, which is quite more
subtle, even if the proof itself (which, this time, I prefer to leave as
an exercise to the interested reader) can essentially hold in one short
sentence if the right approach is followed.

By the way, there are infinitely many such generalisations, each of
which includes a single additional call to BN_is_bit_set() and divides
the number of integers that require O(log(log(x))) complexity by two,
which is a nice way to see how their maximum arithmetic density can be
made arbitrarily close to zero by successive applications of the same
idea on top of itself.

static u_int
count_digits(const struct number *n)
{
        BN_CTX          *ctx;
        BIGNUM          *int_part, *a, *p;
        uint64_t        d, c = 0x4D104D42;
        int             bits, s;

        if (BN_is_zero(n->number))
                return n->scale ? n->scale : 1;

        bn_checkp(int_part = BN_new());
        split_number(n, int_part, NULL);

        if ((bits = BN_num_bits(int_part)) == 0)
                d = 0;
        else if ((d = (c * bits) >> 32) != (c * (bits - 1)) >> 32) {
                if (((c * (bits - 3)) >> 32 != (c * (bits - 4)) >> 32) ^
                    (s = BN_is_bit_set(int_part, bits - 2)))
                        d += s;
                else {
                        bn_checkp(ctx = BN_CTX_new());
                        bn_checkp(a = BN_new());
                        bn_checkp(p = BN_new());

                        bn_check(BN_set_word(a, 10));
                        bn_check(BN_set_word(p, d));
                        bn_check(BN_exp(a, a, p, ctx));

                        if (BN_ucmp(int_part, a) >= 0)
                                d++;

                        BN_CTX_free(ctx);
                        BN_free(a);
                        BN_free(p);
                }
        } else
                d++;

        BN_free(int_part);

        return d + n->scale;
}

Regards,

kshe

Re: dc(1): floor(log_10(x)) should not cost more than O(log(log(x)))

Reply via email to