I'm happy to carry these ideas forward if there is interest. The basic design proposal is:

- Keep a minimal amount of bootstrap to avoid intrusive changes to core components.
- Add the capability to create objects with specific OIDs via DDL during initdb.
- Update the caching/resolution mechanism for builtin functions to be more dynamic.
- Move as much of bootstrap as possible into SQL files and create the catalog via DDL.
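A minimal sketch of what a "more dynamic" builtin function cache could look like. It assumes the only compile-time artifact is a table mapping C symbol names to function pointers. The OID-to-function mapping is then built at runtime from (oid, prosrc) pairs that DDL has written into the catalog, rather than being generated from pg_proc.h. Everything here is hypothetical: `register_builtin`, `resolve_builtin`, `example_add`, the uniform two-argument signature, and the OIDs are illustrative and are not PostgreSQL's fmgr API. A static name table also stands in for the dynloader-based symbol lookup described later in the thread.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for PostgreSQL's Oid type (an unsigned 32-bit integer). */
typedef unsigned int Oid;

/* Hypothetical uniform signature; real builtins use PG_FUNCTION_ARGS. */
typedef long (*BuiltinFn)(long, long);

/* Hypothetical builtin implementations compiled into the server binary. */
static long example_add(long a, long b) { return a + b; }
static long example_sub(long a, long b) { return a - b; }

/* Name -> pointer table: the only part fixed at compile time.
 * Note that it carries no OIDs at all. */
typedef struct {
    const char *name;
    BuiltinFn fn;
} BuiltinSymbol;

static const BuiltinSymbol builtin_symbols[] = {
    {"example_add", example_add},
    {"example_sub", example_sub},
};

#define N_BUILTIN_SYMBOLS (sizeof(builtin_symbols) / sizeof(builtin_symbols[0]))

/* OID -> pointer cache, filled at runtime from catalog rows. */
typedef struct {
    Oid oid;
    BuiltinFn fn;
} BuiltinCacheEntry;

#define BUILTIN_CACHE_MAX 128
static BuiltinCacheEntry builtin_cache[BUILTIN_CACHE_MAX];
static size_t builtin_cache_len = 0;

/* Resolve a C symbol name (the prosrc of an internal function). */
static BuiltinFn lookup_builtin_by_name(const char *prosrc)
{
    for (size_t i = 0; i < N_BUILTIN_SYMBOLS; i++)
        if (strcmp(builtin_symbols[i].name, prosrc) == 0)
            return builtin_symbols[i].fn;
    return NULL;
}

/* Record one catalog row. Returns 0 on success, or -1 if the symbol is
 * unknown, the OID is already cached, or the cache is full. */
int register_builtin(Oid oid, const char *prosrc)
{
    BuiltinFn fn = lookup_builtin_by_name(prosrc);

    if (fn == NULL || builtin_cache_len >= BUILTIN_CACHE_MAX)
        return -1;
    for (size_t i = 0; i < builtin_cache_len; i++)
        if (builtin_cache[i].oid == oid)
            return -1;
    builtin_cache[builtin_cache_len].oid = oid;
    builtin_cache[builtin_cache_len].fn = fn;
    builtin_cache_len++;
    return 0;
}

/* Look up a function by OID; returns NULL if it was never registered. */
BuiltinFn resolve_builtin(Oid oid)
{
    for (size_t i = 0; i < builtin_cache_len; i++)
        if (builtin_cache[i].oid == oid)
            return builtin_cache[i].fn;
    return NULL;
}
```

For example, after `register_builtin(9001, "example_add")` succeeds, `resolve_builtin(9001)(2, 3)` returns 5. The linear scans keep the sketch short. A real cache would more likely be a hash table keyed by OID.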
Feedback appreciated. If there is interest, I can provide a sample patch: about 500 lines of combined diff for the infrastructure needed to support the above, not including the modifications to pg_proc.h that would follow.

Thanks,
Caleb

On Thu, Dec 10, 2015 at 11:47 AM, Caleb Welton wrote:
>
> Hello Hackers,
>
> Reviving an old thread on simplifying the bootstrap process.
>
> I'm a developer from the GPDB / HAWQ side of the world where we did some
> work a while back to enable catalog definition via SQL files and we have
> found it valuable from a dev perspective. The mechanism currently in those
> products is a bit.. convoluted where SQL is processed in Perl to create the
> existing DATA statements, which are then processed as they are today in
> Postgres... I wouldn't suggest this route, but having worked with both the
> DATA mechanism and the SQL-based one I've certainly found SQL to be a more
> convenient way of interacting with the catalog.
>
> I'd propose:
> - Keep enough of the existing bootstrap mechanism functional to get a
>   small, tidy core; essentially you need enough of pg_type, pg_proc, pg_class,
>   pg_attribute to support the 25 types used by catalog tables, and most
>   everything else can be moved into SQL processing like how system_views.sql
>   is handled today.
>
> The above was largely proposed back in March and rejected based on
> concerns that
>
> 1. initdb would be slower.
> 2. It would introduce too much special-purpose bootstrap cruft into the
>    code.
> 3. Editing SQL commands is not comfortable in bulk.
>
> On 1.
>
> I have a prototype that handles about 1000 functions (all the functions in
> pg_proc.h that are not used by other catalog tables, e.g. pg_type,
> pg_language, pg_range, pg_aggregate, window functions, pg_ts_parser, etc).
>
> All of initdb can be processed in 1.53s. This compares to 1.37s with the
> current bootstrap approach.
> So yes, this is slower, but not 'noticeably
> slower'; I certainly didn't notice the 0.16s until I saw the concern and
> then timed it.
>
> On 2.
>
> So far the amount of cruft has been:
> - Enabling adding functions with specific OIDs when creating functions.
>     1 line changes in pg_aggregate.c, proclang.c, typecmds.c
>     about a dozen lines of code in functioncmds.c
>     3 lines changed in pg_proc.c
> - Update the fmgr_internal_validator for builtin functions while the
>   catalog is mutable
>     3 lines changed in pg_proc.c
> - Update how the builtin function cache is built
>     Some significant work in fmgr.c that honestly still needs cleanup
>     before it would be ready to propose as a patch that would be worthy of
>     committing.
> - Update how builtin functions are resolved outside of bootstrap
>     Minor updates to dynloader for lookup of symbols within the current
>     executable; so far I've only done darwin.c for my prototype, so this
>     would need to be extended to the other ports.
> - Initialization of the builtin cache
>     2 line change in postinit.c
> - Addition of a stage in initdb to process the SQL directives, similar in
>   scope to the processing of system_views.sql.
>
> No changes needed in the parser, planner, etc. My assessment is that this
> worry is not a major concern in practice with the right implementation.
>
> On 3.
>
> Having worked with both SQL and BKI DATA directives, I have personally found
> the convenience of SQL outweighs the pain. In many cases changes, such as
> adding a new column to pg_proc, have minimal impact on the SQL
> representation, and the changes that are needed are often simple to implement.
> E.g. accounting for COST only needs to be done for the functions that need
> something other than the default value. This, however, is somewhat
> subjective.
>
> On the pros side:
>
> a. Debugging bootstrap is extremely painful; debugging once initdb has
>    gotten to 'postgres --single' is way easier.
>
> b. It is easier to introduce minor issues with DATA directives than it is
>    when using the SQL processing used for all other user objects.
>
>    Example: currently in Postgres all builtin functions default to COST 1,
>    and all SQL functions default to COST 100. However, the following SQL
>    functions included in bootstrap are inexplicably initialized with a COST
>    of 1:
>      age(timestamp with time zone)
>      age(timestamp without time zone)
>      bit_length(bytea)
>      bit_length(text)
>      bit_length(bit)
>      date_part(text, abstime)
>      date_part(text, reltime)
>      date_part(text, date)
>      ... and 26 other examples
>
> c. SQL files are significantly less of a PITA (subjective opinion, but I
>    can say this from a perspective of experience working with both DATA
>    directives and SQL-driven catalog definition).
>
> If people are interested, I can share my patch so far if that helps address
> concerns, but if there is no interest then I'll probably leave my
> prototype where it is rather than investing more effort in the proof of
> concept.
>
> Thanks,
> Caleb
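For scale, the initdb timings quoted above (1.37 s with the current bootstrap, 1.53 s with the SQL prototype) work out as follows:

```latex
\Delta t = 1.53\,\text{s} - 1.37\,\text{s} = 0.16\,\text{s},
\qquad
\frac{\Delta t}{1.37\,\text{s}} = \frac{0.16}{1.37} \approx 0.117 \approx 11.7\%
```

That is roughly a 12% relative slowdown, while the absolute cost is 0.16 s.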