Hi Tom,

> -----Original Message-----
> From: Tom Lane [mailto:[EMAIL PROTECTED] 
> Sent: 29 January 2004 15:31
> To: Mark Cave-Ayland
> Subject: Re: [PATCHES] ANALYZE patch for review 

<lots cut about pointers>

OK, I've had another attempt at writing the code as you suggested, but
the more I work on it the less I like it :(. What I would like to do is
trim the VacAttrStats structure so that it contains only the information
that is written to the pg_statistic table. However, this fell apart when
I realised that update_attstats() requires the attr and attrtype fields
to be present. Doh.

So I'd like to propose a slightly different solution. I think that
examine_attribute() should return a pointer to a custom structure
containing any information that needs to be passed to the datatype
specific routine (not the entire VacAttrStats structure), or NULL if the
column should not be analyzed. I'm also considering changing the
examine_attribute() input parameters to be the Relation, Attribute and
Type for the current column, along with a pointer to a bool that
indicates whether the column should be analyzed.

If examine_attribute() sets the bool to false then the column is
ignored. If the bool is set to true then a VacAttrStats structure is
created in memory, and then the Attribute and Type tuple information is
copied into the VacAttrStats structure. A new field for VacAttrStats
will contain the pointer to the custom structure returned by
examine_attribute() which can then be passed into the compute_*_stats()
functions as an extra parameter.
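
To make the proposed control flow concrete, here is a rough,
self-contained sketch. It is not taken from the patch: every Demo* type
and demo_* function is a hypothetical stand-in for the backend's real
Relation, Attribute, Type and VacAttrStats definitions, and the rule used
to decide whether a column is analyzed is purely illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the backend's Relation, Attribute and Type. */
typedef struct DemoRelation  { const char *relname; } DemoRelation;
typedef struct DemoAttribute
{
    const char *attname;
    bool        attisdropped;
    int         attstattarget;  /* 0 = skip, <0 = default, >0 = custom (assumed rule) */
} DemoAttribute;
typedef struct DemoType      { const char *typname; } DemoType;

/* Hypothetical slimmed-down VacAttrStats with the proposed extra field. */
typedef struct DemoVacAttrStats
{
    DemoAttribute attr;         /* copy of the attribute information */
    DemoType      attrtype;     /* copy of the type information */
    void         *extra_data;   /* custom struct from examine_attribute, may be NULL */
} DemoVacAttrStats;

/* Example of datatype-specific data a typanalyze-style routine might want. */
typedef struct DemoHistogramOpts { int nbuckets; } DemoHistogramOpts;

/*
 * Sets *analyze to say whether the column should be analyzed at all, and
 * returns optional custom data. A NULL return with *analyze == true means
 * "no custom data, but still call the analyze function".
 */
static void *
demo_examine_attribute(const DemoRelation *rel, const DemoAttribute *attr,
                       const DemoType *type, bool *analyze)
{
    DemoHistogramOpts *opts;

    (void) rel;
    (void) type;

    if (attr->attisdropped || attr->attstattarget == 0)
    {
        *analyze = false;
        return NULL;
    }

    *analyze = true;
    if (attr->attstattarget < 0)
        return NULL;            /* analyze, but nothing custom to pass on */

    opts = malloc(sizeof *opts);
    if (opts == NULL)
    {
        *analyze = false;       /* treat allocation failure as "skip" in this demo */
        return NULL;
    }
    opts->nbuckets = attr->attstattarget;
    return opts;
}

/* Caller side: build the stats struct only when the bool says so. */
static DemoVacAttrStats *
demo_prepare_column(const DemoRelation *rel, const DemoAttribute *attr,
                    const DemoType *type)
{
    bool              analyze = false;
    void             *extra = demo_examine_attribute(rel, attr, type, &analyze);
    DemoVacAttrStats *stats;

    if (!analyze)
        return NULL;            /* column is ignored */

    stats = malloc(sizeof *stats);
    if (stats == NULL)
    {
        free(extra);
        return NULL;
    }
    stats->attr = *attr;        /* copy Attribute and Type information */
    stats->attrtype = *type;
    stats->extra_data = extra;  /* later handed to compute_*_stats() */
    return stats;
}
```

In this sketch the bool carries the analyze/skip decision, so a NULL
extra_data simply means the statistics routine gets no custom data.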

This seems to achieve the aims of abstracting the statistics data from
the intermediate information required by the statistics routines,
allowing extra/custom data to be passed between the typanalyze function
and the statistics algorithm, and allowing the user to have the attr and
attrtype structures given to them. The only thing I don't really like
about this is passing a pointer to a bool into examine_attribute().
However, it is needed to distinguish a NULL meaning 'I have no custom
data but the analyze function should still be called' from a NULL
meaning 'This column should not be analyzed'. I can't think of a better
solution at the moment.

> > I'm beginning to think that perhaps we're looking at this in the
> > wrong way, and that a more elegant version of what you're suggesting
> > could be implemented using a major/minor method of identifying a
> > statistics type.
>
> If you suppose that the "major" field is the upper bits of the
> statistics ID value, then this is just a slightly different way of
> thinking about the range-based allocation method I suggested before.
> However, the range-based method can adapt to allocating different
> amounts of identifier space to different owners, whereas a major/minor
> approach can't easily do that since you've defined it to be 2^N minor
> IDs for each major code.

I was thinking perhaps in terms of an extra staowner int2 field in
pg_statistic, with the owner IDs allocated by the PGDG. Each
group/project would then need only one owner ID to be allocated to it,
and would have the existing 2^16 stakind space to organise as it sees
fit. The advantage of this is that projects can allocate their own
stakind values, implementing new or improved statistics algorithms
without having to wait for a new allocation from the PGDG.
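
To illustrate the difference between the two readings, here is a small
sketch. Scheme A packs a major code into the upper bits of a single
16-bit ID, as in your interpretation; Scheme B keeps a separate owner
value alongside the kind. The names, the 8-bit split and the DemoStatSlot
struct are assumptions for illustration only, not the pg_statistic
layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scheme A: major code in the upper bits of one 16-bit ID (split is assumed). */
#define DEMO_MINOR_BITS 8
#define DEMO_MINOR_MASK ((1u << DEMO_MINOR_BITS) - 1u)

static inline uint16_t
demo_pack_kind(uint16_t major, uint16_t minor)
{
    /* Each major code gets exactly 2^DEMO_MINOR_BITS minor IDs, no more. */
    return (uint16_t) (((unsigned) major << DEMO_MINOR_BITS) |
                       (minor & DEMO_MINOR_MASK));
}

static inline uint16_t
demo_major_of(uint16_t kind)
{
    return (uint16_t) (kind >> DEMO_MINOR_BITS);
}

static inline uint16_t
demo_minor_of(uint16_t kind)
{
    return (uint16_t) (kind & DEMO_MINOR_MASK);
}

/* Scheme B: a separate owner field, so each owner has the whole kind space. */
typedef struct DemoStatSlot
{
    int16_t staowner;   /* hypothetical: allocated centrally, one per project */
    int16_t stakind;    /* allocated by the owning project itself */
} DemoStatSlot;

static inline bool
demo_slot_matches(const DemoStatSlot *slot, int16_t owner, int16_t kind)
{
    /* The same stakind under two different owners never collides. */
    return slot->staowner == owner && slot->stakind == kind;
}
```

With the separate field, two projects can both use stakind 1 for
unrelated statistics, and a lookup has to compare the pair rather than
decode one packed number.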

Many thanks,



Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446

