Re: [HACKERS] Collect frequency statistics for arrays

Tom Lane Sat, 03 Mar 2012 10:04:59 -0800

... BTW, could you explain exactly how that "Fill histogram by hashtab"
loop works?  It's way too magic for my taste, and does in fact have bugs
in the currently submitted patch.  I've reworked it to this:


            /* Fill histogram by hashtab. */
            delta = analyzed_rows - 1;
            count_item_index = 0;
            frac = sorted_count_items[0]->frequency * (num_hist - 1);
            for (i = 0; i < num_hist; i++)
            {
                while (frac <= 0)
                {
                    count_item_index++;
                    Assert(count_item_index < count_items_count);
                    frac += sorted_count_items[count_item_index]->frequency * 
(num_hist - 1);
                }
                hist[i] = sorted_count_items[count_item_index]->count;
                frac -= delta;
            }
            Assert(count_item_index == count_items_count - 1);

The asserts don't fire in any test case I've tried, which seems to
indicate that it *does* work in the sense that the first histogram entry
is always the smallest count and the last histogram entry is always
the largest one.  But it's extremely unclear why it manages to stop
exactly at the last count_items array entry, or for that matter why it's
generating a representative histogram at all.  I'm suspicious that the
"-1" bits represent off-by-one bugs.

I also don't especially like the fact that "frac" is capable of
overflowing (since worst case frequency is 300 * 10000 and worst case
num_hist is 10000, with the current limits on statistics_target).
We could work around that by doing the frac arithmetic in int64, but I
wonder whether that couldn't be avoided.  In any case, first I'd like
an explanation why this code works at all.

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Collect frequency statistics for arrays

Reply via email to