On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> 1. I'm still unhappy about the loop that fills the count histogram,
> as I noted earlier today.  It at least needs a decent comment and some
> overflow protection, and I'm not entirely convinced that it doesn't have
> more bugs than the overflow issue.
>

Attached patch is focused on fixing this. The "frac" variable overflow is
evaded by making it int64. I hope comments is clarifying something. In
general this loop copies behaviour of histogram constructing loop of
compute_scalar_stats function. But instead of values array we've array of
unique DEC and it's frequency.

------
With best regards,
Alexander Korotkov.
*** a/src/backend/utils/adt/array_typanalyze.c
--- b/src/backend/utils/adt/array_typanalyze.c
***************
*** 581,587 **** compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
  			DECountItem **sorted_count_items;
  			int			count_item_index;
  			int			delta;
! 			int			frac;
  			float4	   *hist;
  
  			/* num_hist must be at least 2 for the loop below to work */
--- 581,587 ----
  			DECountItem **sorted_count_items;
  			int			count_item_index;
  			int			delta;
! 			int64		frac;
  			float4	   *hist;
  
  			/* num_hist must be at least 2 for the loop below to work */
***************
*** 612,633 **** compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
  			hist[num_hist] = (double) element_no / (double) nonnull_cnt;
  
  			/*
! 			 * Construct the histogram.
! 			 *
! 			 * XXX this needs work: frac could overflow, and it's not clear
! 			 * how or why the code works.  Even if it does work, it needs
! 			 * documented.
  			 */
  			delta = analyzed_rows - 1;
  			count_item_index = 0;
! 			frac = sorted_count_items[0]->frequency * (num_hist - 1);
  			for (i = 0; i < num_hist; i++)
  			{
  				while (frac <= 0)
  				{
  					count_item_index++;
  					Assert(count_item_index < count_items_count);
! 					frac += sorted_count_items[count_item_index]->frequency * (num_hist - 1);
  				}
  				hist[i] = sorted_count_items[count_item_index]->count;
  				frac -= delta;
--- 612,642 ----
  			hist[num_hist] = (double) element_no / (double) nonnull_cnt;
  
  			/*
! 			 * Construct the histogram of DECs. The object of this loop is to
! 			 * copy the max and min DECs and evenly-spaced DECs in between
! 			 * ("space" here is number of arrays corresponding to DEC). If we
! 			 * imagine ordered array of DECs where each input array have a
! 			 * corresponding DEC item, i'th value of histogram will be 
! 			 * DECs[i * (analyzed_rows - 1) / (num_hist - 1)]. But instead
! 			 * of such array we've sorted_count_items which holds unique DEC
! 			 * values with their frequencies. We can imagine "frac" variable as
! 			 * an (index in DECs corresponding to next sorted_count_items
! 			 * element - index in DECs corresponding to last histogram value) *
! 			 * (num_hist - 1). In this case negative fraction leads us to
! 			 * iterate over sorted_count_items. 
  			 */
  			delta = analyzed_rows - 1;
  			count_item_index = 0;
! 			frac = (int64)sorted_count_items[0]->frequency * 
! 				   (int64)(num_hist - 1);
  			for (i = 0; i < num_hist; i++)
  			{
  				while (frac <= 0)
  				{
  					count_item_index++;
  					Assert(count_item_index < count_items_count);
! 					frac += (int64)sorted_count_items[count_item_index]->frequency * 
! 						    (int64)(num_hist - 1);
  				}
  				hist[i] = sorted_count_items[count_item_index]->count;
  				frac -= delta;
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to