On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane <[email protected]> wrote:
> 1. I'm still unhappy about the loop that fills the count histogram,
> as I noted earlier today. It at least needs a decent comment and some
> overflow protection, and I'm not entirely convinced that it doesn't have
> more bugs than the overflow issue.
>
Attached patch is focused on fixing this. The "frac" variable overflow is
evaded by making it int64. I hope comments is clarifying something. In
general this loop copies behaviour of histogram constructing loop of
compute_scalar_stats function. But instead of values array we've array of
unique DEC and it's frequency.
------
With best regards,
Alexander Korotkov.
*** a/src/backend/utils/adt/array_typanalyze.c
--- b/src/backend/utils/adt/array_typanalyze.c
***************
*** 581,587 **** compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
DECountItem **sorted_count_items;
int count_item_index;
int delta;
! int frac;
float4 *hist;
/* num_hist must be at least 2 for the loop below to work */
--- 581,587 ----
DECountItem **sorted_count_items;
int count_item_index;
int delta;
! int64 frac;
float4 *hist;
/* num_hist must be at least 2 for the loop below to work */
***************
*** 612,633 **** compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
hist[num_hist] = (double) element_no / (double) nonnull_cnt;
/*
! * Construct the histogram.
! *
! * XXX this needs work: frac could overflow, and it's not clear
! * how or why the code works. Even if it does work, it needs
! * documented.
*/
delta = analyzed_rows - 1;
count_item_index = 0;
! frac = sorted_count_items[0]->frequency * (num_hist - 1);
for (i = 0; i < num_hist; i++)
{
while (frac <= 0)
{
count_item_index++;
Assert(count_item_index < count_items_count);
! frac += sorted_count_items[count_item_index]->frequency * (num_hist - 1);
}
hist[i] = sorted_count_items[count_item_index]->count;
frac -= delta;
--- 612,642 ----
hist[num_hist] = (double) element_no / (double) nonnull_cnt;
/*
! * Construct the histogram of DECs. The object of this loop is to
! * copy the max and min DECs and evenly-spaced DECs in between
! * ("space" here is number of arrays corresponding to DEC). If we
! * imagine ordered array of DECs where each input array have a
! * corresponding DEC item, i'th value of histogram will be
! * DECs[i * (analyzed_rows - 1) / (num_hist - 1)]. But instead
! * of such array we've sorted_count_items which holds unique DEC
! * values with their frequencies. We can imagine "frac" variable as
! * an (index in DECs corresponding to next sorted_count_items
! * element - index in DECs corresponding to last histogram value) *
! * (num_hist - 1). In this case negative fraction leads us to
! * iterate over sorted_count_items.
*/
delta = analyzed_rows - 1;
count_item_index = 0;
! frac = (int64)sorted_count_items[0]->frequency *
! (int64)(num_hist - 1);
for (i = 0; i < num_hist; i++)
{
while (frac <= 0)
{
count_item_index++;
Assert(count_item_index < count_items_count);
! frac += (int64)sorted_count_items[count_item_index]->frequency *
! (int64)(num_hist - 1);
}
hist[i] = sorted_count_items[count_item_index]->count;
frac -= delta;
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers