I am working on designing some new datatypes and could use some
guidance.

Along with each data item, I must keep additional information about
the scale of measurement.  Further, the relevant scales of measurement
fall into a few major families of related scales, so at a minimum a
separate type will be required for each of these major families.
Additionally, I wish to be able to convert data measured according to
one scale into other scales (both within the same family and between
different families), and these interconversions require relatively
large sets of parameters.
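
For concreteness, here is the sort of thing I have in mind, as a very
rough C sketch; every name below is hypothetical, and the real
parameter sets are larger:

    #include "postgres.h"       /* for int32 and friends */

    /* One measured value tagged with its scale of measurement. */
    typedef struct Measurement
    {
        double      value;      /* the datum itself */
        int32       scale_id;   /* identifies the scale of measurement */
    } Measurement;

    /* The full parameter set needed to convert between scales. */
    typedef struct ScaleParams
    {
        int32       family;     /* which family of scales */
        double      params[8];  /* family-specific conversion parameters */
    } ScaleParams;

    /* Converting a value requires the parameters of both scales. */
    extern double convert_measurement(double value,
                                      const ScaleParams *from,
                                      const ScaleParams *to);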

It seems that there are several alternative approaches, and I am
seeking some guidance from the wizards here who have some
understanding of the backend internals, performance tradeoffs, and
such issues.

Possible solutions:

1.  Store the data and all the scale parameters within the type.

    Advantages:  All information contained within each type.  Can be
    implemented with no backend changes.  No access to ancillary tables
    required, so processing might be fast.

    Disadvantages: Duplicate information about the scales is recorded
    in every field of these types, i.e., wasted space.  I/O is either
    cumbersome (if all parameters are required) or the type-handling
    code has built-in tables for supplying missing parameters, in
    which case the available types and families cannot be extended by
    users without recompiling the code.  (See the first sketch after
    this list.)

2.  Store only the data and a reference to a compiled-in data table
    holding the scale parameters.

    Advantages:  No duplicate information stored in the fields.
    Access to scale data compiled into backend, so processing might be
    fast.

    Disadvantages: Tables of scale data are fixed at compile time, so
    users cannot add scales or families of scales.  Requires backend
    changes to implement, but these changes are relatively minor since
    all the scale parameters are compiled into the code handling the
    type.  (See the second sketch after this list.)

3.  Store only the data and a reference to a new system table (or
    tables) holding the scale parameters.

    Advantages:  No duplicate information stored in the fields.
    Access to scale data _not_ compiled into backend, so users could
    add scales or families of scales by modifying the system tables.

    Disadvantages: Requires access to system tables to perform
    conversions, so processing might be slow.  Requires more complex
    backend changes to implement, including the ability to retrieve
    information from system tables.  (See the third sketch after this
    list.)
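
To make these alternatives concrete, here are rough C sketches of the
storage layout each one implies; all type, field, and table names are
hypothetical.  Option 1 simply embeds the complete parameter set in
every stored value:

    /* Option 1: each field carries its own full set of scale
     * parameters, so no external lookup is ever needed. */
    typedef struct MeasurementV1
    {
        double      value;          /* the datum */
        int32       family;         /* family of scales */
        double      params[8];      /* complete conversion parameters,
                                     * duplicated in every field */
    } MeasurementV1;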
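
Option 2 shrinks the stored value to a small identifier and moves the
parameters into a table compiled into the backend (reusing the
hypothetical ScaleParams struct from the earlier sketch; the entries
shown are made up):

    /* Option 2: values store only a scale id; the parameters live in
     * a static table fixed at compile time. */
    typedef struct MeasurementV2
    {
        double      value;
        int32       scale_id;       /* index into scale_table below */
    } MeasurementV2;

    static const ScaleParams scale_table[] =
    {
        /* family, params ...; users cannot extend this table
         * without recompiling the backend */
        { 1, { 1.0, 0.0 } },
        { 1, { 2.0, -5.0 } },
    };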
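
Option 3 replaces the compiled-in table with a new system table, so a
value stores only a reference to the relevant row, and the conversion
code must fetch the parameters at run time (the lookup itself is the
subject of my second question below):

    /* Option 3: values reference a row of a new system table
     * (hypothetically, pg_measurement_scale) by OID. */
    typedef struct MeasurementV3
    {
        double      value;
        Oid         scale;          /* OID of this scale's row in the
                                     * new system table */
    } MeasurementV3;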

Clearly, option 3 is optimal (more flexible, no data duplication)
unless backend access to the system tables introduces too much
overhead.  (Other suggestions are welcome, especially if I have
misjudged the relative merits of these ideas or missed one
altogether.)  The advice I need is the following:

- How much overhead is introduced by requiring the backend to query
  system tables during tuple processing?  Is this unacceptable from
  the outset, or is it reasonable to consider this option further?
  Note that these new tables will not be large (probably fewer than
  100 tuples), if that matters.

- How does one access system tables from the backend code?  I seem to
  recall that issuing straight queries via SPI is not necessarily the
  right way to go about this, but I'm not sure where to look for
  alternatives.
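
For reference, the straight SPI approach I was imagining looks roughly
like the following (pg_measurement_scale and its param0 column are
hypothetical); if there is a better mechanism for reaching system
tables from type-handling code, that is exactly what I am hoping to
learn about:

    #include "postgres.h"
    #include "executor/spi.h"

    /* Fetch one conversion parameter for a scale via SPI; a sketch
     * only, and possibly not the right mechanism for this. */
    static bool
    fetch_scale_param(Oid scale, double *param0)
    {
        char        query[128];
        bool        found = false;

        snprintf(query, sizeof(query),
                 "SELECT param0 FROM pg_measurement_scale WHERE oid = %u",
                 scale);

        if (SPI_connect() != SPI_OK_CONNECT)
            return false;

        if (SPI_exec(query, 1) == SPI_OK_SELECT && SPI_processed == 1)
        {
            bool        isnull;
            Datum       d = SPI_getbinval(SPI_tuptable->vals[0],
                                          SPI_tuptable->tupdesc, 1,
                                          &isnull);

            if (!isnull)
            {
                *param0 = DatumGetFloat8(d);
                found = true;
            }
        }

        SPI_finish();
        return found;
    }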

Thanks for your help.

Cheers,
Brook
