Re: [HACKERS] Bootstrap DATA is a pita

Caleb Welton Fri, 11 Dec 2015 11:17:17 -0800

I'm happy to take these ideas forward if there is interest.

The basic design proposal is:
  - Keep a minimal amount of bootstrap to avoid intrusive changes to core
components.
  - Add the capability to create objects with specific OIDs via DDL during
initdb.
  - Update the caching/resolution mechanism for builtin functions to be
more dynamic.
  - Move as much of bootstrap as possible into SQL files and create the
catalog via DDL.
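
To make the "more dynamic" caching/resolution point a bit more concrete,
here is a rough sketch in Python rather than C. It is not the prototype and
not PostgreSQL code: BuiltinCache, SYMBOLS, the OIDs and the two sample
functions are all made up for illustration. The idea it shows is only that
the OID -> function map is filled at runtime from catalog rows, by resolving
symbol names, instead of coming from a table fixed at build time.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical stand-in for symbols exported by the server executable.
SYMBOLS: Dict[str, Callable[..., object]] = {
    "add_ints": lambda a, b: a + b,
    "text_length": lambda s: len(s),
}


class BuiltinCache:
    """Maps function OIDs to callables; populated at runtime."""

    def __init__(self) -> None:
        self._by_oid: Dict[int, Callable[..., object]] = {}

    def load(self, catalog_rows: Iterable[Tuple[int, str]]) -> None:
        # Each row is (oid, symbol name), as it might be read from a catalog.
        for oid, symbol in catalog_rows:
            if symbol not in SYMBOLS:
                raise LookupError(f"no symbol {symbol!r} for OID {oid}")
            self._by_oid[oid] = SYMBOLS[symbol]

    def lookup(self, oid: int) -> Callable[..., object]:
        # Raises KeyError if the OID was never loaded.
        return self._by_oid[oid]


# Illustrative OIDs only; they do not correspond to real pg_proc entries.
cache = BuiltinCache()
cache.load([(90001, "add_ints"), (90002, "text_length")])
```

A call such as cache.lookup(90001)(2, 3) then goes through whatever was
loaded, so adding a function means adding a catalog row rather than
regenerating a static table.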


Feedback appreciated.

I can provide a sample patch if there is interest: roughly 500 lines of
combined diff for the infrastructure needed to support the above, not
including the modifications to pg_proc.h that would follow.

Thanks,
  Caleb

On Thu, Dec 10, 2015 at 11:47 AM, Caleb Welton wrote:
>
>
> Hello Hackers,
>
>   Reviving an old thread on simplifying the bootstrap process.
>
>   I'm a developer from the GPDB / HAWQ side of the world, where we did some
> work a while back to enable catalog definition via SQL files, and we have
> found it valuable from a dev perspective.  The mechanism currently in those
> products is a bit convoluted: SQL is processed in Perl to create the
> existing DATA statements, which are then processed as they are today in
> Postgres. I wouldn't suggest this route, but having worked with both the
> DATA mechanism and the SQL-based one, I've certainly found SQL to be a more
> convenient way of interacting with the catalog.
>
>   I'd propose:
>      - Keep enough of the existing bootstrap mechanism functional to get a
> small tidy core. Essentially you need enough of pg_type, pg_proc, pg_class,
> and pg_attribute to support the 25 types used by catalog tables; most
> everything else can be moved into SQL processing, like how system_views.sql
> is handled today.
>
>   The above was largely proposed back in March and rejected based on
> concerns that
>
>   1. initdb would be slower.
>   2. It would introduce too much special purpose bootstrap cruft into the
> code.
>   3. Editing SQL commands in bulk is not comfortable.
>
> On 1.
>
> I have a prototype that handles about 1000 functions (all the functions in
> pg_proc.h that are not used by other catalog tables, e.g. pg_type,
> pg_language, pg_range, pg_aggregate, window functions, pg_ts_parser, etc).
>
> All of initdb can be processed in 1.53s. This compares to 1.37s with the
> current bootstrap approach.  So yes, this is slower, but not 'noticeably
> slower' - I certainly didn't notice the 0.16s until I saw the concern and
> then timed it.
>
> On 2.
>
> So far the amount of cruft has been:
>   - Enabling the creation of functions with specific OIDs.
>     1-line changes in pg_aggregate.c, proclang.c, typecmds.c
>     about a dozen lines of code in functioncmds.c
>     3 lines changed in pg_proc.c
>   - Update the fmgr_internal_validator for builtin functions while the
> catalog is mutable
>     3 lines changed in pg_proc.c
>   - Update how the builtin function cache is built
>     Some significant work in fmgr.c that honestly still needs cleanup
> before it would be ready to propose as a committable patch.
>   - Update how builtin functions are resolved outside of bootstrap
>     Minor updates to dynloader for lookup of symbols within the current
> executable. So far I've only done darwin.c for my prototype; this would
> need to be extended to the other ports.
>   - Initialization of the builtin cache
>     2-line change in postinit.c
>   - Addition of a stage in initdb to process the SQL directives, similar in
> scope to the processing of system_views.sql.
>
> No changes needed in the parser, planner, etc.  My assessment is that this
> worry is not a major concern in practice with the right implementation.
>
> On 3.
>
> Having worked with both SQL and bki DATA directives, I have personally found
> that the convenience of SQL outweighs the pain.  In many cases, changes such
> as adding a new column to pg_proc have minimal impact on the SQL
> representation, and the changes that are needed are often simple to
> implement. E.g. accounting for COST only needs to be done for the functions
> that need something other than the default value.  This however is somewhat
> subjective.
>
> On the Pros side:
>
>   a. Debugging bootstrap is extremely painful; debugging once initdb has
> gotten to 'postgres --single' is way easier.
>
>   b. It is easier to introduce minor issues with DATA directives than it is
> when using the SQL processing used for all other user objects.
>
>    Example: currently in Postgres all builtin functions default to COST 1,
> and all SQL functions default to COST 100. However, the following SQL
> functions included in bootstrap are inexplicably initialized with a COST of
> 1:
>    age(timestamp with time zone)
>    age(timestamp without time zone)
>    bit_length(bytea)
>    bit_length(text)
>    bit_length(bit)
>    date_part(text, abstime)
>    date_part(text, reltime)
>    date_part(text, date)
>    ... and 26 other examples
>
>   c. SQL files are significantly less of a PITA (subjective opinion, but I
> can say this from a perspective of experience working with both DATA
> directives and SQL driven catalog definition).
>
> If people are interested, I can share my patch so far if that helps address
> concerns, but if there is no interest then I'll probably leave my
> prototype where it is rather than investing more effort in the proof of
> concept.
>
> Thanks,
>   Caleb
>
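
For the COST example in (b), the kind of consistency check that surfaces
those cases can be sketched as follows. This is only an illustration in
Python, not how the list above was produced: the row layout (signature,
language, cost) and the helper name are assumptions, and the sample rows are
hand-written. The two 'sql' rows mirror functions named above; the
'internal' row is a hypothetical control.

```python
from typing import List, Tuple

# Defaults as stated above: builtin (internal) functions COST 1, SQL COST 100.
DEFAULT_COST = {"internal": 1, "sql": 100}

Row = Tuple[str, str, int]  # (signature, language, cost)


def non_default_cost(rows: List[Row]) -> List[Row]:
    """Return rows whose cost differs from the default for their language."""
    return [
        (name, lang, cost)
        for name, lang, cost in rows
        if lang in DEFAULT_COST and cost != DEFAULT_COST[lang]
    ]


# Hand-written sample rows for illustration, not a dump of pg_proc.
sample_rows: List[Row] = [
    ("age(timestamp with time zone)", "sql", 1),
    ("bit_length(text)", "sql", 1),
    ("some_builtin(integer)", "internal", 1),  # hypothetical control row
]

suspicious = non_default_cost(sample_rows)
```

With SQL-driven definitions, the default would apply automatically unless a
function spells out a different COST, so this class of mismatch is harder to
introduce by accident.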
