Re: [HACKERS] autogenerating headers bki stuff

2009-08-11 Thread Peter Eisentraut
On Tuesday 30 June 2009 06:59:51 Robert Haas wrote:
 The attached patch merges all of the logic currently in genbki.sh and
 Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl

I can't really convince myself to like this change.  I think there is some 
merit that these scripts are separate.  I'm not sure what the combined script 
buys us except that it is a lot bigger and does everything at once instead of 
in two or three steps.

That together with the big makefile moving around makes me think that this 
would cause more confusion and not much advantage.

Btw., is this stamp-h business really necessary?  I guess you copied that from 
pg_config.h, but the reason there is that everything depends on pg_config.h, 
and rerunning configure would normally cause everything to be rebuilt.  The 
files we are looking at here should only change when something is edited.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-08-11 Thread Alvaro Herrera
Peter Eisentraut escribió:
 On Tuesday 30 June 2009 06:59:51 Robert Haas wrote:
  The attached patch merges all of the logic currently in genbki.sh and
  Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl
 
 I can't really convince myself to like this change.  I think there is some 
 merit that these scripts are separate.  I'm not sure what the combined script 
 buys us except that it is a lot bigger and does everything at once instead of 
 in two or three steps.

Maybe we can move forward in little steps.  For example it would be
excellent to have the schemapg.h file autogenerated instead of having to
edit two copies of those definitions.

I'm not sure that it buys us a lot to have all those things in a single
script.  Why not have a script to generate schemapg.h, another one to
generate the bki stuff, another one to generate the fmgrtab header?
They don't seem to share a lot of code anyway (and even if they do,
surely we can add a .pm module containing common code).

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-08-11 Thread Robert Haas
On Tue, Aug 11, 2009 at 4:52 PM, Peter Eisentrautpete...@gmx.net wrote:
 On Tuesday 30 June 2009 06:59:51 Robert Haas wrote:
 The attached patch merges all of the logic currently in genbki.sh and
 Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl

 I can't really convince myself to like this change.  I think there is some
 merit that these scripts are separate.  I'm not sure what the combined script
 buys us except that it is a lot bigger and does everything at once instead of
 in two or three steps.

 That together with the big makefile moving around makes me think that this
 would cause more confusion and not much advantage.

 Btw., is this stamp-h business really necessary?  I guess you copied that from
 pg_config.h, but the reason there is that everything depends on pg_config.h,
 and rerunning configure would normally cause everything to be rebuilt.  The
 files we are looking at here should only change when something is edited.

It definitely has less appeal without the anum.h stuff.

I think there is some benefit in having a single script because it
means that if we want to add additional syntax in the future (like
BKI_EXECUTE or DATA_DEFAULTS or some kind of more human-readable
notation for functions or opclasses, all of which were discussed
upthread) there is only one place to change.  But is that sufficient
reason to commit it at this point, given that we don't have a
fully-fleshed out design for any of those things?  Not sure.  Building
schemapg.h automatically seems definitely nice to me.

The stamp-h stuff is quite important for minimizing unnecessary
rebuilds.  Without that, any change to any include/catalog/*.h file
will trigger a massive rebuild.  With it, it only triggers a rebuild
if one of the outputs actually changes, and only for those portions
for which the output actually changed.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-27 Thread Peter Eisentraut
On Sunday 26 July 2009 01:40:09 Tom Lane wrote:
 And it is going to cost us in places like
 how do we generate the fmgr lookup table.

We rgrep the source tree for PG_FUNCTION_ARGS, extract the function name, and 
put them in a list.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-27 Thread Peter Eisentraut
On Sunday 26 July 2009 20:58:30 Tom Lane wrote:
 The argument about optional stuff doesn't impress me.  I would think
 that something that's going to be optionally loaded doesn't need to be
 considered during bootstrap mode at all.  We can just have initdb run
 some SQL scripts (or not) during its post-bootstrap phase.

Isn't that exactly what some people are proposing?  Remove the nonessential 
things from the DATA() lines and put them into SQL scripts?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-27 Thread Robert Haas
On Mon, Jul 27, 2009 at 4:17 AM, Peter Eisentrautpete...@gmx.net wrote:
 On Sunday 26 July 2009 01:40:09 Tom Lane wrote:
 And it is going to cost us in places like
 how do we generate the fmgr lookup table.

 We rgrep the source tree for PG_FUNCTION_ARGS, extract the function name, and
 put them in a list.

Probably parsing the SQL would be a better idea.  Otherwise, the
outputs would depend on every .c file in the entire source tree.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-27 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Sunday 26 July 2009 20:58:30 Tom Lane wrote:
 The argument about optional stuff doesn't impress me.  I would think
 that something that's going to be optionally loaded doesn't need to be
 considered during bootstrap mode at all.  We can just have initdb run
 some SQL scripts (or not) during its post-bootstrap phase.

 Isn't that exactly what some people are proposing?  Remove the nonessential 
 things from the DATA() lines and put them into SQL scripts?

I think what some people are proposing is to rip inessential stuff
(say, the geometry types) out of core completely.  Which I'd not be
against.  The discussion at the moment is about whether to have
first-class and second-class citizens within the core set of functions;
and that I'm against.  I think two notations implemented by two
different toolchains will be confusing and not help maintenance that much.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Jul 25, 2009 at 6:40 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 I'm not nearly as excited about migrating all or even most of, say,
 the pg_proc DATA lines into SQL.

 I think it would actually buy you quite a bit to migrate them to SQL,
 because in SQL, default properties can generally be omitted, which
 means that a patch which adds a new property to pg_proc that takes the
 same value for every row doesn't actually need to touch the SQL at
 all.

[ shrug... ]  If you think default values would buy something in
maintainability, we could revise the BKI notation to support them,
with a lot less work and risk than what you're proposing.  Perhaps
something like

DATA_DEFAULTS( pronamespace=PGNSP proowner=PGUID prolang=12 ... );

DATA( oid=1242 proname=boolin pronargs=2 ... );
DATA( oid=1243 proname=boolout pronargs=2 ... );

with the convention that any field not specified in either the
DATA macro or the current defaults would go to NULL, except OID
which would retain its current special treatment.  (Hmm, I wonder
if we'd even still need the _null_ hack anymore?)

I remain unexcited about inventing contraptions that solve limited
special cases.  It's just not that hard to maintain those cases
the way we're doing now, and every added processing step introduces
its own comprehension and maintenance overheads.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-26 Thread Andrew Dunstan



Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:
  

On Sat, Jul 25, 2009 at 6:40 PM, Tom Lanet...@sss.pgh.pa.us wrote:


I'm not nearly as excited about migrating all or even most of, say,
the pg_proc DATA lines into SQL.
  


  

I think it would actually buy you quite a bit to migrate them to SQL,
because in SQL, default properties can generally be omitted, which
means that a patch which adds a new property to pg_proc that takes the
same value for every row doesn't actually need to touch the SQL at
all.



[ shrug... ]  If you think default values would buy something in
maintainability, we could revise the BKI notation to support them,
with a lot less work and risk than what you're proposing.  Perhaps
something like

DATA_DEFAULTS( pronamespace=PGNSP proowner=PGUID prolang=12 ... );

DATA( oid=1242 proname=boolin pronargs=2 ... );
DATA( oid=1243 proname=boolout pronargs=2 ... );

with the convention that any field not specified in either the
DATA macro or the current defaults would go to NULL, except OID
which would retain its current special treatment.  (Hmm, I wonder
if we'd even still need the _null_ hack anymore?)
  


I kinda like this. It will make it easier not only to make catalog 
changes but to add entries to thinks like pg_proc (which is surely the 
biggest piece of the headache).



I remain unexcited about inventing contraptions that solve limited
special cases.  It's just not that hard to maintain those cases
the way we're doing now, and every added processing step introduces
its own comprehension and maintenance overheads.


  

Agreed.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-26 Thread Tom Lane
I wrote:
 ... So maybe we could split the current bootstrap phase
 into three phases:
   * create core catalogs and load DATA commands, using bki
   * create operator classes, using sql script
   * create indexes, using bki
   * proceed on as before

I experimented with that a little bit and found it doesn't seem to be
tremendously easy.  A non-bootstrap-mode backend will PANIC immediately
on startup if it doesn't find the critical system indexes, so the second
step has issues.  Also, there is no provision for resuming bootstrap
mode in an already-existing database, so the third step doesn't work
either.  We could hack up solutions to those things, but it's not clear
that it's worth it.  What seems more profitable is just to allow CREATE
OPERATOR CLASS/FAMILY to be executed while still in bootstrap mode.
There will still be some obstacles to be surmounted, no doubt (in
particular persuading these commands to run without system indexes
present) ... but we'd have to surmount those anyway.

In the spirit of not inventing single-purpose solutions, I suggest
that the BKI representation for this might be something like

BKI_EXECUTE( any old SQL command );

where the bootstrap.c code just passes the given string to the main SQL
parser, and whether it works or not is dependent on whether the
specified command has been bootstrap-mode-proofed.  For the moment we'd
only bother to fix CREATE OPERATOR CLASS/FAMILY to work that way, but
the door would be open for other things if it seemed worthwhile.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-26 Thread Greg Stark
On Sun, Jul 26, 2009 at 5:48 PM, Tom Lanet...@sss.pgh.pa.us wrote:

 In the spirit of not inventing single-purpose solutions, I suggest
 that the BKI representation for this might be something like

 BKI_EXECUTE( any old SQL command );

 where the bootstrap.c code just passes the given string to the main SQL
 parser, and whether it works or not is dependent on whether the
 specified command has been bootstrap-mode-proofed.  For the moment we'd
 only bother to fix CREATE OPERATOR CLASS/FAMILY to work that way, but
 the door would be open for other things if it seemed worthwhile.


I have nothing against a BKI_EXECUTE() like you propose.

But my instinct is still to go the other way. Of determining which
parts are actually necessary for bootstrapping and which parts really
aren't. I think it's valuable to have those two classes separated so
we understand when we're introducing new dependencies and when we're
varying from the well-trodden standard approaches.

It would also be valuable if we ever want to move some of these things
out to contrib modules or move other modules into the core. We might
even envision having optional components which the user could have the
optoin to decide at  at initdb-time whether to include them.

AFAICT the only opclasses that need to be in the bootstrap set are
int2_ops, int4_ops, name_ops, oid_ops, and oidvector_ops.

-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-26 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 AFAICT the only opclasses that need to be in the bootstrap set are
 int2_ops, int4_ops, name_ops, oid_ops, and oidvector_ops.

Maybe so, but the first two are part of the integer_ops family.  If
we have to continue implementing all of that through DATA statements
then we haven't done much towards making things more maintainable
or less fragile.  I think we need to try to get *all* of the operator
classes out of the hand-maintained-DATA-entries collection.

The argument about optional stuff doesn't impress me.  I would think
that something that's going to be optionally loaded doesn't need to be
considered during bootstrap mode at all.  We can just have initdb run
some SQL scripts (or not) during its post-bootstrap phase.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-26 Thread Robert Haas
On Sun, Jul 26, 2009 at 11:31 AM, Tom Lanet...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sat, Jul 25, 2009 at 6:40 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 I'm not nearly as excited about migrating all or even most of, say,
 the pg_proc DATA lines into SQL.

 I think it would actually buy you quite a bit to migrate them to SQL,
 because in SQL, default properties can generally be omitted, which
 means that a patch which adds a new property to pg_proc that takes the
 same value for every row doesn't actually need to touch the SQL at
 all.

 [ shrug... ]  If you think default values would buy something in
 maintainability, we could revise the BKI notation to support them,
 with a lot less work and risk than what you're proposing.

Really?  I thought about that too, but concluded that it would be
easier to verify that a change to the BKI-generation stuff was correct
(by just diffing the generated files).  I don't know how to verify
that two versions of initdb do the same thing - I assume the databases
won't be byte-for-byte identical.  But that was my only concern about
it: I like the idea of expanding what can be done in BKI mode, if we
can figure out how to do it.

 Perhaps
 something like

 DATA_DEFAULTS( pronamespace=PGNSP proowner=PGUID prolang=12 ... );

 DATA( oid=1242 proname=boolin pronargs=2 ... );
 DATA( oid=1243 proname=boolout pronargs=2 ... );

 with the convention that any field not specified in either the
 DATA macro or the current defaults would go to NULL, except OID
 which would retain its current special treatment.  (Hmm, I wonder
 if we'd even still need the _null_ hack anymore?)

 I remain unexcited about inventing contraptions that solve limited
 special cases.  It's just not that hard to maintain those cases
 the way we're doing now, and every added processing step introduces
 its own comprehension and maintenance overheads.

If you think that the current system is anywhere close to ideal, I
give up.  To do so much as add a single line to pg_proc requires all
sort of useless manual work, like translating type names to OIDs, and
making sure that pronargs contains the correct value when the same
information is already encapsulated in both proargtypes and
proargmodes.

Introducing defaults for DATA() would bring some benefits because it
would mostly avoid the need to change every row in the file when
adding a new column.  But a preprocessing script can do much more
sophisticated transformations, like computing a value for a column, or
looking up type names in another file and translating them into OIDs.
It's not even hard; it's probably a 100-line patch on top of what I
already submitted.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-26 Thread Robert Haas
On Sun, Jul 26, 2009 at 1:58 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 Greg Stark gsst...@mit.edu writes:
 AFAICT the only opclasses that need to be in the bootstrap set are
 int2_ops, int4_ops, name_ops, oid_ops, and oidvector_ops.

 Maybe so, but the first two are part of the integer_ops family.  If
 we have to continue implementing all of that through DATA statements
 then we haven't done much towards making things more maintainable
 or less fragile.  I think we need to try to get *all* of the operator
 classes out of the hand-maintained-DATA-entries collection.

Is this mostly a forward-reference problem?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Introducing defaults for DATA() would bring some benefits because it
 would mostly avoid the need to change every row in the file when
 adding a new column.  But a preprocessing script can do much more
 sophisticated transformations, like computing a value for a column, or
 looking up type names in another file and translating them into OIDs.

Hmm.  A preprocessing script that produces DATA commands might in fact
be a reasonable proposal, but it was not what I understood you to be
suggesting before.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Jul 26, 2009 at 1:58 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 I think we need to try to get *all* of the operator
 classes out of the hand-maintained-DATA-entries collection.

 Is this mostly a forward-reference problem?

No, I don't see that as particularly the issue.  What I'm concerned
about is the prospect of different parts of the same opfamily being
represented in different notations --- that sounds pretty error-prone
to me.  Greg is arguing that special-casing some minimum subset of the
opclasses is a good idea, but I disagree.  I think if we can make the
idea work at all, we can migrate *all* the built-in opclasses into the
higher-level notation, and that's how I want to approach it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-26 Thread Alvaro Herrera
Tom Lane escribió:

 I experimented with that a little bit and found it doesn't seem to be
 tremendously easy.  A non-bootstrap-mode backend will PANIC immediately
 on startup if it doesn't find the critical system indexes, so the second
 step has issues.  Also, there is no provision for resuming bootstrap
 mode in an already-existing database, so the third step doesn't work
 either.

FWIW we hacked up a sort-of-bootstrap mode in Mammoth Replicator to be
able to create our own catalogs and stuff.  It's not particularly
hard nor large:

 bootstrap.c |   31 ++!
 1 file changed, 6 insertions(+), 25 modifications(!)


(This is BSD code so feel free to use it if you find it useful)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
*** 83_rel/src/backend/bootstrap/bootstrap.c	2008-01-09 13:04:32.0 -0300
--- 23trunk/src/backend/bootstrap/bootstrap.c	2009-07-26 21:12:48.0 -0400
***
*** 27,36 
--- 27,39 
  #include catalog/index.h
  #include catalog/pg_type.h
  #include libpq/pqsignal.h
+ #include mammoth_r/mcp_queue.h
+ #include mammoth_r/txlog.h
  #include miscadmin.h
  #include nodes/makefuncs.h
  #include postmaster/bgwriter.h
  #include postmaster/walwriter.h
+ #include postmaster/replication.h
  #include storage/freespace.h
  #include storage/ipc.h
  #include storage/proc.h
***
*** 48,54 
  #define ALLOC(t, c)		((t *) calloc((unsigned)(c), sizeof(t)))
  
  static void CheckerModeMain(void);
! static void BootstrapModeMain(void);
  static void bootstrap_signals(void);
  static void ShutdownAuxiliaryProcess(int code, Datum arg);
  static hashnode *AddStr(char *str, int strlength, int mderef);
--- 51,57 
  #define ALLOC(t, c)		((t *) calloc((unsigned)(c), sizeof(t)))
  
  static void CheckerModeMain(void);
! static void BootstrapModeMain(char *dbname);
  static void bootstrap_signals(void);
  static void ShutdownAuxiliaryProcess(int code, Datum arg);
  static hashnode *AddStr(char *str, int strlength, int mderef);
***
*** 207,212 
--- 210,216 
  	int			flag;
  	AuxProcType auxType = CheckerProcess;
  	char	   *userDoption = NULL;
+ 	char   *dbname = NULL;
  
  	/*
  	 * initialize globals
***
*** 313,319 
  		}
  	}
  
! 	if (argc != optind)
  	{
  		write_stderr(%s: invalid command-line arguments\n, progname);
  		proc_exit(1);
--- 317,325 
  		}
  	}
  
! 	if (auxType == MammothBootstrapProcess  argc - optind + 1)
! 		dbname = argv[optind++]; 
! 	else if (argc != optind || auxType == MammothBootstrapProcess)
  	{
  		write_stderr(%s: invalid command-line arguments\n, progname);
  		proc_exit(1);
***
*** 337,342 
--- 343,350 
  			case WalWriterProcess:
  statmsg = wal writer process;
  break;
+ 			case MammothBootstrapProcess:
+ statmsg = mammoth bootstrap process;
  			default:
  statmsg = ??? process;
  break;
***
*** 410,416 
  			bootstrap_signals();
  			BootStrapXLOG();
  			StartupXLOG();
! 			BootstrapModeMain();
  			proc_exit(1);		/* should never return */
  
  		case StartupProcess:
--- 418,432 
  			bootstrap_signals();
  			BootStrapXLOG();
  			StartupXLOG();
! 			BootstrapModeMain(NULL);
! 			proc_exit(1);		/* should never return */
! 
! 		case MammothBootstrapProcess:
! 			bootstrap_signals();
! 			BootstrapTXLOG();
! 			BootStrapMCPQueue();
! 			StartupXLOG();
! 			BootstrapModeMain(dbname);
  			proc_exit(1);		/* should never return */
  
  		case StartupProcess:
***
*** 469,487 
   *	 commands in a special bootstrap language.
   */
  static void
! BootstrapModeMain(void)
  {
  	int			i;
  
  	Assert(!IsUnderPostmaster);
  
! 	SetProcessingMode(BootstrapProcessing);
  
  	/*
  	 * Do backend-like initialization for bootstrap mode
  	 */
  	InitProcess();
! 	InitPostgres(NULL, InvalidOid, NULL, NULL);
  
  	/* Initialize stuff for bootstrap-file processing */
  	for (i = 0; i  MAXATTR; i++)
--- 485,506 
   *	 commands in a special bootstrap language.
   */
  static void
! BootstrapModeMain(char *dbname)
  {
  	int			i;
  
  	Assert(!IsUnderPostmaster);
  
! 	if (dbname == NULL)
! 		SetProcessingMode(BootstrapProcessing);
! 	else
! 		SetProcessingMode(MammothBootstrapProcessing);
  
  	/*
  	 * Do backend-like initialization for bootstrap mode
  	 */
  	InitProcess();
! 	InitPostgres(dbname, InvalidOid, NULL, NULL);
  
  	/* Initialize stuff for bootstrap-file processing */
  	for (i = 0; i  MAXATTR; i++)
*** 83_rel/src/include/bootstrap/bootstrap.h	2008-01-09 13:04:49.0 -0300
--- 23trunk/src/include/bootstrap/bootstrap.h	2008-09-12 16:36:43.0 -0400
***
*** 70,76 
  	BootstrapProcess,
  	StartupProcess,
  	BgWriterProcess,
! 	WalWriterProcess
  } AuxProcType;
  
  #endif   /* BOOTSTRAP_H */
--- 70,77 
  	BootstrapProcess,
  	StartupProcess,
  	BgWriterProcess,
! 	

Re: [HACKERS] autogenerating headers bki stuff

2009-07-26 Thread Robert Haas
On Sun, Jul 26, 2009 at 8:46 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Introducing defaults for DATA() would bring some benefits because it
 would mostly avoid the need to change every row in the file when
 adding a new column.  But a preprocessing script can do much more
 sophisticated transformations, like computing a value for a column, or
 looking up type names in another file and translating them into OIDs.

 Hmm.  A preprocessing script that produces DATA commands might in fact
 be a reasonable proposal, but it was not what I understood you to be
 suggesting before.

OK, sorry if I was unclear.  I'm not sure exactly what you mean by
producing DATA() commands; I think the output should be BKI directly.
One of the things this patch does that I think is good (however flawed
it may be otherwise) is unifies all of the stuff that needs to parse
the DATA() statements into a single script.  I think this is something
we should pursue, because I think it will simplify the introduction of
any other notation we want to consider in this area (regardless of
whether it's DATA_DEFAULTS or EXEC_BKI or what have you).

Maybe I should rip out all the anum.h stuff (sniff, I'm sad, I liked
that design...) and resubmit.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-25 Thread Peter Eisentraut
On Tuesday 30 June 2009 06:59:51 Robert Haas wrote:
 The attached patch merges all of the logic currently in genbki.sh and
 Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl.  It
 then extends that logic to generate all of the Anum_* and Natts_*
 constants, as well as the Schema_pg_* declarations for the bootstrap
 tables.

I see a potential problem with the introduction of the catalog/anum.h header, 
to hold the Anum_... defines.  This looks like it could be a significant break 
in the backend API, as evidenced by the fact that plperl and dblink no longer 
compile with this change.

I think a less invasive change would be to include anum.h into all the 
catalog/pg_*.h headers, so that the external interface stays the same.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-25 Thread Robert Haas
On Sat, Jul 25, 2009 at 3:21 AM, Peter Eisentrautpete...@gmx.net wrote:
 On Tuesday 30 June 2009 06:59:51 Robert Haas wrote:
 The attached patch merges all of the logic currently in genbki.sh and
 Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl.  It
 then extends that logic to generate all of the Anum_* and Natts_*
 constants, as well as the Schema_pg_* declarations for the bootstrap
 tables.

 I see a potential problem with the introduction of the catalog/anum.h header,
 to hold the Anum_... defines.  This looks like it could be a significant break
 in the backend API, as evidenced by the fact that plperl and dblink no longer
 compile with this change.

 I think a less invasive change would be to include anum.h into all the
 catalog/pg_*.h headers, so that the external interface stays the same.

Gah.  I wish a toplevel make would build contrib.

Anyway, yeah, we could do that.  The downsides to that approach are:

1. Changing a catalog definition in a way that actually affects the
contents of anum.h will force more things to be recompiled (note that
there are guards against useless rebuilds of anum.h), and

2. If we do that, we'll probably be stuck with it forever, and it
seems like a bit of a hack.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-25 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Jul 25, 2009 at 3:21 AM, Peter Eisentrautpete...@gmx.net wrote:
 I think a less invasive change would be to include anum.h into all the
 catalog/pg_*.h headers, so that the external interface stays the same.

 Gah.  I wish a toplevel make would build contrib.

 Anyway, yeah, we could do that.  The downsides to that approach are:

I didn't realize this change was intending to throw all the Anum_
constants into a single header file.  I am strongly against that
on namespace pollution grounds, quite aside from the massive #include
restructuring it'd require.  And then there's the fact that any change
in such a file would force rebuild of just about the entire backend.

I do not see any virtue in autogenerating the Anum_ constants anyway.
Yeah, manually updating them is a bit of a pain, but it's only a tiny
part of the work that's normally involved in changing a system catalog.
In any case, practically all of the benefit involved could be gotten
by just not having to mess with the numerical values of the individual
constants.  Which we could do by setting them up as enums instead of
macros, along the lines of
http://archives.postgresql.org/pgsql-committers/2008-05/msg00080.php

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-25 Thread Robert Haas
On Sat, Jul 25, 2009 at 10:56 AM, Tom Lanet...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sat, Jul 25, 2009 at 3:21 AM, Peter Eisentrautpete...@gmx.net wrote:
 I think a less invasive change would be to include anum.h into all the
 catalog/pg_*.h headers, so that the external interface stays the same.

 Gah.  I wish a toplevel make would build contrib.

 Anyway, yeah, we could do that.  The downsides to that approach are:

 I didn't realize this change was intending to throw all the Anum_
 constants into a single header file.  I am strongly against that
 on namespace pollution grounds,

I don't really understand this objection.  The reason why namespace
pollution is bad is because there's a risk that someone might be using
one of the names used for some other purpose, but the chances that
someone who is using a Anum_pg_* or Natts_pg_* constant also needs a
similarly named constant for some purpose other than referencing the
PostgreSQL catalogs seems so as nearly zero as makes no difference.

The hypothetical scenario in which this is a problem goes something
like this: someone is counting on the fact that if they include
catalog/pg_foo.h, then Anum_pg_foo_* and Natts_pg_foo will be
defined appropriately for reference to PostgreSQL backend catalogs,
but they are also counting on the fact that Anum_pg_bar_* and
Natts_pg_bar (for some value of bar that collides with a system
catalog name) are not defined and that they can use those constants
for their own internal purposes.  When they port their code to PG 8.5,
they are forced into changing the naming of those constants, because
it's no longer possible to just get the pg_foo constants without the
pg_bar constants.  If anyone is really doing this, I submit that it's
a horribly bad idea and they ought to stop right away whether this
patch gets committed or not.

 quite aside from the massive #include
 restructuring it'd require.

This is all done in the patch (with the exception of a handful of
loose ends that Peter found in his review) and I don't think it's all
that massive.

 And then there's the fact that any change
 in such a file would force rebuild of just about the entire backend.

It requires a rebuild of 56 of 547 files '*.c' files in src/backend,
which is to say 10.2% of the backend.  Also, the system is set up in
such a way that the timestamp on catalog/anum.h changes only when its
contents actually change, and dependencies are not rebuilt otherwise.
So basically it'll happen when someone adds an attribute to, or
removes one from, a system catalog: the fact that the .h file was
updated in some other way is not sufficient.

 I do not see any virtue in autogenerating the Anum_ constants anyway.
 Yeah, manually updating them is a bit of a pain, but it's only a tiny
 part of the work that's normally involved in changing a system catalog.

Well, I'd like to work on fixing some of the other problems too, but
this seems like a good place to start.  Currently, if there are two
uncommitted patches that make changes to the system catalog, whichever
is committed first is 100% guaranteed to conflict with each other, and
the resolution is typically painful.  Of course, fixing the Anum and
Natts declarations does not come close to fixing this problem: for
catalogs that are initialized with any data at bootstrap time, the
DATA() lines are a much bigger issue, but fixing that is going to
require a bigger hammer than can be put in place with one patch.  I do
think this is a pretty good foundation on which to build, though.

 In any case, practically all of the benefit involved could be gotten
 by just not having to mess with the numerical values of the individual
 constants.  Which we could do by setting them up as enums instead of
 macros, along the lines of
 http://archives.postgresql.org/pgsql-committers/2008-05/msg00080.php

I'd certainly be willing to concede that some of the benefit could be
gotten that way, but I'm not sure I agree with practically all.  The
benefits of this patch as I see them are: (1) to reduce the number of
places where a catalog change creates a merge conflict, and (2) to
eliminate the possibility of human error in setting up the Anum and
Natts declarations.  The fact that I found a case where this had been
done inconsistently in pg_listener (and no one noticed for 10 years)
provides that this is not an entirely hypothetical possibility even
for committed code, and I've definitely screwed it up a few times in
my own tree, too.  Replacing the declarations with enums would make
the merge conflicts involve fewer lines and maybe slightly simplify
the manual updating process, but it won't completely solve either
problem.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-25 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Jul 25, 2009 at 10:56 AM, Tom Lanet...@sss.pgh.pa.us wrote:
 I didn't realize this change was intending to throw all the Anum_
 constants into a single header file.  I am strongly against that
 on namespace pollution grounds,

 I don't really understand this objection.

It's for the same reasons we don't put all of include/catalog/ into one
giant header file, or all of include/ for that matter.  It's bad for
modularity, it's bad for compilation time, it's bad for rebuild time
if you're using --enable-depend.

 The reason why namespace
 pollution is bad is because there's a risk that someone might be using
 one of the names used for some other purpose,

Uh, no, that's actually pretty much irrelevant for our purposes.  As a
general rule, any two PG header files should be non-conflicting since
some .c file might need to include both.  So we'd have to get rid of
conflicts anyhow.  That does not make compartmentalization useless.
As a for-instance, exposing names that a given .c file doesn't really
need opens the door to typos that the compiler won't catch for you
(ie, accidentally using the wrong Anum_ constant, in this context).

 [ other straw-man argumentation snipped ]

None of this impresses me at all.  We should not throw a pile of
unrelated declarations into one header just to simplify the life
of an automatic script.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-25 Thread Greg Stark
On Sat, Jul 25, 2009 at 9:17 PM, Robert Haasrobertmh...@gmail.com wrote:
 Of course, fixing the Anum and
 Natts declarations does not come close to fixing this problem: for
 catalogs that are initialized with any data at bootstrap time, the
 DATA() lines are a much bigger issue, but fixing that is going to
 require a bigger hammer than can be put in place with one patch.  I do
 think this is a pretty good foundation on which to build, though.


I think addressing that would actually be fairly simple in theory.
Move a lot of those DATA lines to SQL initdb scripts. Virtually all of
pg_proc, pg_operator, pg_opclass, pg_opfamily, pg_cast, etc can be
initialized using SQL. Hardly any of the records in there are needed
for bootstrapping.

That would reduce the pain of editing this files *enormously*. The
worst part of adding new operators is making sure all the opclass
entries line up properly. And when there's an OID conflict and they
all have to be renumbered and the opclasses fixed up that's when
they're a real headache.



-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-25 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 Move a lot of those DATA lines to SQL initdb scripts. Virtually all of
 pg_proc, pg_operator, pg_opclass, pg_opfamily, pg_cast, etc can be
 initialized using SQL. Hardly any of the records in there are needed
 for bootstrapping.

It's easy to make that claim, much less easy to actually do it.

The other issue is that there will be some fraction of the entries that
unavoidably *are* needed before you can use SQL to insert the rest.
What will we do with those?  Having two different representations for
essentially the same kind of data isn't much fun.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-25 Thread Tom Lane
[ dept. of second thoughts ]

I wrote:
 It's easy to make that claim, much less easy to actually do it.

Reflecting on this a bit more ... it seems to me that it's not the right
thing to set the goal as try to get rid of as many DATA statements as
possible.  The right way to think about this is to classify the stuff
we have got in DATA statements, and consider how to reduce the pain
associated with the harder-to-maintain categories.

In particular, I think Greg correctly identified the main pain point
as being the catalog entries associated with operator classes (ie,
pg_opclass, pg_opfamily, pg_amop, pg_amproc entries).  So what I'm
thinking is we should consider how to migrate *all* of those entries
into CREATE OPERATOR CLASS/FAMILY commands.  And I think that that might
be doable.  Those entries are not needed by the system until we create
catalog indexes.  So maybe we could split the current bootstrap phase
into three phases:
* create core catalogs and load DATA commands, using bki
* create operator classes, using sql script
* create indexes, using bki
* proceed on as before

I'm not nearly as excited about migrating all or even most of, say,
the pg_proc DATA lines into SQL.  That simply isn't going to buy very
much in maintainability --- a patch that wants to add a new property
to all the functions is going to conflict just as much with another
patch doing the same.  And it is going to cost us in places like
how do we generate the fmgr lookup table.

Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating headers bki stuff

2009-07-25 Thread Robert Haas
On Sat, Jul 25, 2009 at 6:40 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 I'm not nearly as excited about migrating all or even most of, say,
 the pg_proc DATA lines into SQL.  That simply isn't going to buy very
 much in maintainability --- a patch that wants to add a new property
 to all the functions is going to conflict just as much with another
 patch doing the same.  And it is going to cost us in places like
 how do we generate the fmgr lookup table.

 Thoughts?

I think it would actually buy you quite a bit to migrate them to SQL,
because in SQL, default properties can generally be omitted, which
means that a patch which adds a new property to pg_proc that takes the
same value for every row doesn't actually need to touch the SQL at
all.  I suspect that's a pretty common case, too: SE-PostgreSQL
modifies a whole bunch of system catalogs to add a security label
attribute, and ALTER TABLE ... ALTER COLUMN ... SET DISTINCT adds a
column to pg_attribute that defaults to 0.

I can hear you objecting that there's no possible way we can use SQL
to construct pg_attribute, and that's certainly true.  But I have
another idea.  What we could do is generate the BKI but using some
more sophisticated method than just writing it all out longhand in the
header files and copying it over into the bki file.  The pg_attribute
entries for the bootstrap tables, for example, are mostly inferrable
from the PG_CATALOG() declarations (I think storage class and maybe
one other property might be problematic).  And certainly you could
design a more human readable format for the pg_proc entries, maybe
something like:

DATA_PG_PROC(function-name, function-arg1-type-name
function-arg2-type-name,
function-return-type-name,language,definition)

To convert this into BKI, you make an initial pass through pg_type.h
and collect the OIDs of all the type names.  Then you zip through
pg_proc.h and now you have enough information to map all the type
names into OIDs and generate the BKI.  I'm waving my hands a little
bit here but I really don't think this is too hard, coding-wise, and
it seems like it would make it a LOT easier to edit this file...

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers