Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote:
> Thanks for the review.
>
> > The standard specifies that non-recursive WITH should be evaluated
> > once.
>
> What shall we do? I don't think there's an easy way to fix this. Maybe
> we should not allow a WITH clause without RECURSIVE?

My interpretation of 7.13: General Rules: 2.b is that it should be single evaluation, even if RECURSIVE is present.

The previous discussion was here:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php

The important arguments in the thread seemed to be:

1. People will generally expect single evaluation, so they might be disappointed if they can't use this feature for that purpose.

2. It's a spec violation in the case of volatile functions.

3. "I think this is a "must fix" because of the point about volatile functions --- changing it later will result in user-visible semantics changes, so we have to get it right the first time."

I don't entirely agree with #3. It is user-visible, but only in the sense that someone is depending on undocumented multiple-evaluation behavior.

Tom Lane said that multiple evaluation is grounds for rejection:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php

Is there hope of correcting this before November?

> I will try to fix this. However, detecting that the query is not a
> non-linear one is not so easy.

If we don't allow mutual recursion, the only kind of non-linear recursion that might exist would be multiple references to the same recursive query name in a recursive query, is that correct?

> > * DISTINCT should suppress duplicates:
> >
> > with recursive foo(i) as
> >   (select distinct * from (values(1),(2)) t
> >   union all
> >   select distinct i+1 from foo where i < 10)
> > select * from foo;
> >
> > This outputs a lot of duplicates, but they should be suppressed
> > according to the standard.
> > This query is essentially the same as
> > supporting UNION for recursive queries, so we should either fix both for
> > 8.4 or block both for consistency.
>
> I'm not sure if it's possible to fix this. Will look into it.

Can't we just reject queries with top-level DISTINCT, similar to how UNION is rejected?

> > * outer joins on a recursive reference should be blocked:
> >
> > with recursive foo(i) as
> >   (values(1)
> >   union all
> >   select i+1 from foo left join (values(1)) t on (i=column1))
> > select * from foo;
> >
> > Causes an infinite loop, but the standard says using an outer join
> > in this situation should be prohibited. This should be fixed for 8.4.
>
> Not an issue, I think.

Agreed, Andrew Gierth corrected me here.

Regards,
	Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
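[Editorial illustration] The single- vs. multiple-evaluation difference argued about above can be mimicked in plain Python; this is a model of the query semantics only, not PostgreSQL code, and all names are hypothetical:

```python
import random

def with_query():
    # Stands in for the volatile WITH query: foo(i) as (select random() as i)
    return random.random()

def select_multiple_eval(n_refs=2):
    # Multiple evaluation: the WITH query is re-run for each reference,
    # so a volatile function yields a different row each time.
    return [with_query() for _ in range(n_refs)]

def select_single_eval(n_refs=2):
    # Single evaluation (7.13 GR 2.b): the WITH query is materialized
    # once, and every reference sees the same row.
    row = with_query()
    return [row] * n_refs

print(select_single_eval())    # two identical values
print(select_multiple_eval())  # almost surely two different values
```

With single evaluation, `select * from foo union all select * from foo` would return the same random value twice, which is exactly the user-visible difference the thread is concerned about.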
Re: [HACKERS] Synchronous Log Shipping Replication
"Fujii Masao" <[EMAIL PROTECTED]> wrote:
> 3) Patch of introducing new background process which I've called
> WALSender. It takes charge of sending WAL to the slave.
>
> Now, I assume that WALSender also listens the connection from
> the slave, i.e. only one sender process manages multiple slaves.
> The relation between WALSender and backend is 1:1. So,
> the communication mechanism between them can be simple.

I assume he means that only one backend communicates with the WAL sender at a time. The communication is done while WALWriteLock is held, so other backends wait on WALWriteLock for the communicating backend. The WAL sender only needs to send one signal each time it sends WAL buffers to the slave.

We could split the LWLock into WALWriterLock and WALSenderLock, but the essential point is the same.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
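[Editorial illustration] The point about backends serializing on WALWriteLock, so that a single 1:1 channel to the sender suffices, can be sketched with threads; this is a toy model, not the proposed C implementation, and all names are stand-ins:

```python
import threading
import queue

wal_write_lock = threading.Lock()  # stands in for WALWriteLock
send_queue = queue.Queue()         # stands in for the backend->sender channel
acks = {}                          # per-backend ack events

def wal_sender():
    # Single sender process: handles one request at a time, "ships" the
    # WAL to the slave, then acknowledges the waiting backend.
    while True:
        backend_id, wal = send_queue.get()
        if backend_id is None:
            break
        # ... send `wal` to the slave here ...
        acks[backend_id].set()

def backend_commit(backend_id, wal):
    # Backends serialize on WALWriteLock, so the sender always talks to
    # exactly one backend at a time: a simple 1:1 channel is enough.
    with wal_write_lock:
        acks[backend_id] = threading.Event()
        send_queue.put((backend_id, wal))
        acks[backend_id].wait()  # synchronous commit: wait for the ack

sender = threading.Thread(target=wal_sender)
sender.start()
backends = [threading.Thread(target=backend_commit, args=(i, f"wal-{i}"))
            for i in range(4)]
for t in backends:
    t.start()
for t in backends:
    t.join()
send_queue.put((None, None))  # shut the sender down
sender.join()
print("all commits acknowledged")
```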
Re: [HACKERS] Verbosity of Function Return Type Checks
On Mon, 8 Sep 2008, Alvaro Herrera <[EMAIL PROTECTED]> writes:
>> Modified as you suggested. BTW, will there be a similar i18n scenario
>> for "dropped column" you mentioned below?
>
> Yes, you need _() around those too.

For this purpose, I introduced a dropped_column_type variable in the validate_tupdesc_compat() function:

  const char dropped_column_type[] = _("n/a (dropped column)");

And used it in the following fashion:

  OidIsValid(returned->attrs[i]->atttypid)
      ? format_type_be(returned->attrs[i]->atttypid)
      : dropped_column_type

>> Done with format_type_be() usage.
>
> BTW you forgot to remove the quotes around the type names (I know I told
> you to add them but Tom gave the reasons why it's not needed) :-)

Done.

> Those are minor problems that are easily fixed. However there's a
> larger issue that Tom mentioned earlier and I concur, which is that the
> caller is forming the primary error message and passing it down. It
> gets a bit silly if you consider the ways the messages end up worded:
>
>   errmsg("returned record type does not match expected record type"));
>   errdetail("Returned type \"%s\" does not match expected type \"%s\" in column \"%s\".",
>   ---> this is the case where it's OK
>
>   errmsg("wrong record type supplied in RETURN NEXT"));
>   errdetail("Returned type \"%s\" does not match expected type \"%s\" in column \"%s\".",
>   --> this is strange
>
>   errmsg("returned tuple structure does not match table of trigger event"));
>   errdetail("Returned type \"%s\" does not match expected type \"%s\" in column \"%s\".",
>   --> this is not OK

What we're trying to do is avoid code duplication while checking the returned tuple types against the expected tuple types. For this purpose we implemented a generic function (validate_tupdesc_compat) to handle all possible cases. But, IMHO, it's important to remember that there is no perfect generic function that satisfies every case equally well.
> I've been thinking how to pass down the context information without
> feeding the complete string, but I don't find a way without doing
> message construction. Construction is to be avoided because it's a
> pain for translators.
>
> Maybe we should just use something generic like errmsg("mismatching
> record type") and have the caller pass two strings specifying what's
> the "returned" tuple and what's the "expected", but I don't see how
> ... (BTW this is worth fixing, because every case seems to have
> appeared independently without much thought as to other callsites with
> the same pattern.)

I considered the subject under the same constraints but couldn't come up with a more rational solution. Nevertheless, I'm open to any suggestion.

Regards.

Index: src/pl/plpgsql/src/pl_exec.c
===
RCS file: /projects/cvsroot/pgsql/src/pl/plpgsql/src/pl_exec.c,v
retrieving revision 1.219
diff -c -r1.219 pl_exec.c
*** src/pl/plpgsql/src/pl_exec.c	1 Sep 2008 22:30:33 -	1.219
--- src/pl/plpgsql/src/pl_exec.c	9 Sep 2008 05:48:57 -
***
*** 188,194
  		Oid reqtype, int32 reqtypmod, bool isnull);
  static void exec_init_tuple_store(PLpgSQL_execstate *estate);
! static bool compatible_tupdesc(TupleDesc td1, TupleDesc td2);
  static void exec_set_found(PLpgSQL_execstate *estate, bool state);
  static void plpgsql_create_econtext(PLpgSQL_execstate *estate);
  static void free_var(PLpgSQL_var *var);
--- 188,195
  		Oid reqtype, int32 reqtypmod, bool isnull);
  static void exec_init_tuple_store(PLpgSQL_execstate *estate);
! static void validate_tupdesc_compat(TupleDesc expected, TupleDesc returned,
! 		const char *msg);
  static void exec_set_found(PLpgSQL_execstate *estate, bool state);
  static void plpgsql_create_econtext(PLpgSQL_execstate *estate);
  static void free_var(PLpgSQL_var *var);
***
*** 384,394
  {
  	case TYPEFUNC_COMPOSITE:
  		/* got the expected result rowtype, now check it */
! 		if (estate.rettupdesc == NULL ||
! 			!compatible_tupdesc(estate.rettupdesc, tupdesc))
! 			ereport(ERROR,
!
  			(errcode(ERRCODE_DATATYPE_MISMATCH),
! 			errmsg("returned record type does not match expected record type")));
  		break;
  	case TYPEFUNC_RECORD:
--- 385,392
  {
  	case TYPEFUNC_COMPOSITE:
  		/* got the expected result rowtype, now check it */
! 		validate_tupdesc_compat(tupdesc, estate.rettupdesc,
! 			gettext_noop("returned record type does not match expected record type"));
  		break;
  	case TYPEFUNC_RECORD:
***
*** 705,715
  	rettup = NULL;
  else
  {
! 	if (!compatible_tupdesc(estate.rettupdesc,
! 		trigdata->tg_relation->rd_att))
! 		ereport(ERROR,
! 		(errcode(ERRCODE_DATATYPE_MISMATCH),
! 		errmsg("returned tuple structure does not match table of trigger event")));
  	/* Copy tuple to upper executor memory */
  	rettup = SPI_co
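[Editorial illustration] The pattern under discussion, a generic validator where the caller passes an untranslated context message that is translated at raise time, can be sketched outside of C. This Python model is hypothetical (the stand-in `gettext_noop()`/`_()` mimic the gettext machinery; it is not the actual plpgsql code):

```python
# gettext_noop() marks a string for extraction without translating it;
# _() performs the translation when the error is actually raised.
def gettext_noop(msg):
    return msg

def _(msg):
    return msg  # a real implementation would look up the translation

class DatatypeMismatch(Exception):
    def __init__(self, msg, detail):
        super().__init__(f"{msg}: {detail}")

def validate_tupdesc_compat(expected, returned, msg):
    # `expected`/`returned` are lists of (column_name, type_name);
    # a type of None stands for a dropped column.
    if len(expected) != len(returned):
        raise DatatypeMismatch(_(msg), "number of columns differs")
    for (name, want), (name2, got) in zip(expected, returned):
        if want != got:
            want = want if want is not None else _("n/a (dropped column)")
            got = got if got is not None else _("n/a (dropped column)")
            raise DatatypeMismatch(
                _(msg),
                f'returned type {got} does not match expected type {want} '
                f'in column "{name}"')

# The caller supplies the primary message, untranslated, as in the patch:
validate_tupdesc_compat(
    [("i", "integer")], [("i", "integer")],
    gettext_noop("returned record type does not match expected record type"))
```

The design choice mirrors the patch: only the caller knows the right primary message, while the shared validator builds the uniform detail string, keeping both translatable without runtime message construction.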
Re: [HACKERS] Synchronous Log Shipping Replication
Hi,

I looked at some comments about synchronous replication and understood that the consensus of the community was that sync replication should be added using core patches, not hooks and plug-ins. If my understanding is right, I will change my development plan so that sync replication can be put into core.

But I don't think every feature should be put into core. Of course, the high-availability features (like clustering, automatic failover, etc.) are outside of postgres. A user who wants a whole HA solution using sync replication must integrate postgres with clustering software like heartbeat.

WAL sending should be put into core. But I'd like to separate WAL receiving from core and provide it as a new contrib tool, because there are some users who use sync replication only as WAL streaming. They don't want to start postgres on the slave. Of course, the slave can replay WAL by using pg_standby and the WAL receiver tool which I'd like to provide as a new contrib tool. I think the patch against the recovery code is not necessary.

I arrange the development items below:

1) Patch around XLogWrite. It enables a backend to wake up the WAL sender process at the timing of COMMIT.

2) Patch for the communication between a backend and the WAL sender process. There were some discussions about this topic. Now, I have decided to adopt the imessages approach proposed by Markus.

3) Patch introducing a new background process which I've called WALSender. It takes charge of sending WAL to the slave.

Now, I assume that WALSender also listens for connections from the slave, i.e. only one sender process manages multiple slaves. The relation between WALSender and a backend is 1:1, so the communication mechanism between them can be simple. As another idea, I could introduce a new listener process and fork a new WALSender for every slave. Which architecture is better? Or should the postmaster also listen for connections from the slave?

4) New contrib tool which I've called WALReceiver.
It takes charge of receiving WAL from the master and writing it to disk on the slave.

I will submit these patches and the tool by the November Commit Fest at the latest. Any comments welcome!

best regards

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [EMAIL PROTECTED]: Re: [HACKERS] [patch] GUC source file and line number]
On Mon, 8 Sep 2008, Alvaro Herrera wrote:

> (I dropped the "default" stuff for now, as it doesn't seem that a
> consensus has been reached on that topic.)

I have multiple GUC-related projects that are all stalled waiting for that capability to be added. The only thing there wasn't clear consensus on was exactly what the name for it should be, and there I really don't care. I made the argument for why I named it the way I did, but if it gets committed with a less friendly name (like boot_val) I won't complain.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
>> * Single Evaluation:
>>
>> with
>>   foo(i) as (select random() as i)
>> select * from foo union all select * from foo;
>>          i
>> -------------------
>>  0.233165248762816
>>   0.62126633618027
>> (2 rows)
>>
>> The standard specifies that non-recursive WITH should be evaluated
>> once.
>
> What shall we do? I don't think there's an easy way to fix this. Maybe
> we should not allow a WITH clause without RECURSIVE?

ISTM that kind of misses the point. Even if it's WITH RECURSIVE rather than simply WITH, one wouldn't expect multiple evaluations of any non-recursive portion of the query.

...Robert
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Thanks for the review.

[I dropped [EMAIL PROTECTED] from the Cc list since he has left our company and the email address is being deleted.]

I'm going to look into the issues which seem to be bugs (of course, if you know what to fix, patches are always welcome:-).

> These are my initial comments about the Common Table Expressions (CTE)
> patch, also known as WITH [RECURSIVE]. These comments are based on the
> patch here:
>
> http://archives.postgresql.org/pgsql-patches/2008-08/msg00021.php
>
> This is a semantically complex feature, and the standard is fairly
> complex as well. So I'm approaching this by making my own
> interpretations from the standard first (I included my interpretations
> and section references at the end of this email) and comparing to the
> behavior of the patch.
>
> The following examples may be inconsistent with the standard. Some
> have already been mentioned, and I don't think they all need to be
> fixed for 8.4, but I mention them here for completeness.
>
> * Mutual Recursion:
>
> with recursive
>   foo(i) as (values(1) union all select i+1 from bar where i < 10),
>   bar(i) as (values(1) union all select i+1 from foo where i < 10)
> select * from foo;
> ERROR: mutual recursive call is not supported
>
> The standard allows mutual recursion.

The discussion seems to agree to leave this for post-8.4.

> * Single Evaluation:
>
> with
>   foo(i) as (select random() as i)
> select * from foo union all select * from foo;
>          i
> -------------------
>  0.233165248762816
>   0.62126633618027
> (2 rows)
>
> The standard specifies that non-recursive WITH should be evaluated
> once.

What shall we do? I don't think there's an easy way to fix this, as Tom suggested. Maybe we should not allow a WITH clause without RECURSIVE for 8.4?

> * RHS only:
>
> with recursive
>   foo(i) as (select i+1 from foo where i < 10 union all values(1))
> select * from foo;
> ERROR: Left hand side of UNION ALL must be a non-recursive term in a
> recursive query
>
> The standard does not require that the recursive term be on the RHS.

The discussion seems to agree to leave this for post-8.4.

> * UNION ALL only:
>
> with recursive
>   foo(i) as (values(1) union select i+1 from foo where i < 10)
> select * from foo;
> ERROR: non-recursive term and recursive term must be combined with
> UNION ALL
>
> The standard seems to allow UNION ALL, UNION, INTERSECT, and EXCEPT
> (when the recursive term is not on the RHS of the EXCEPT).

The discussion seems to agree to leave this for post-8.4.

> * Binary recursion and subselect strangeness:
>
> with recursive foo(i) as
>   (values (1)
>   union all
>   select * from
>     (select i+1 from foo where i < 10
>     union all
>     select i+1 from foo where i < X) t)
> select * from foo;
>
> Produces 10 rows of output regardless of what "X" is. This should be
> fixed for 8.4.
> Also, this is non-linear recursion, which the standard seems to
> disallow.

I will try to fix this. However, detecting that the query is non-linear is not so easy.

> * Multiple recursive references:
>
> with recursive foo(i) as
>   (values (1)
>   union all
>   select i+1 from foo where i < 10
>   union all
>   select i+1 from foo where i < 20)
> select * from foo;
> ERROR: Left hand side of UNION ALL must be a non-recursive term in a
> recursive query
>
> If we're going to allow non-linear recursion (which the standard
> does not), this seems like it should be a valid case.

I will try to disallow this.

> * Strange result with except:
>
> with recursive foo(i) as
>   (values (1)
>   union all
>   select * from
>     (select i+1 from foo where i < 10
>     except
>     select i+1 from foo where i < 5) t)
> select * from foo;
> ERROR: table "foo" has 0 columns available but 1 columns specified
>
> This query works if you replace "except" with "union". This should be
> fixed for 8.4.

I will try to fix this.

> * Aggregates allowed:
>
> with recursive foo(i) as
>   (values(1)
>   union all
>   select max(i)+1 from foo where i < 10)
> select * from foo;
>
> Aggregates should be blocked according to the standard.
> Also, causes an infinite loop. This should be fixed for 8.4.

I will try to fix this.

> * DISTINCT should suppress duplicates:
>
> with recursive foo(i) as
>   (select distinct * from (values(1),(2)) t
>   union all
>   select distinct i+1 from foo where i < 10)
> select * from foo;
>
> This outputs a lot of duplicates, but they should be suppressed
> according to the standard. This query is essentially the same as
> supporting UNION for recursive queries, so we should either fix both for
> 8.4 or block both for consistency.

I'm not sure if it's possible to fix this. Will look into it.

> * outer joins on a recursive reference should be blocked:
>
> with recursive foo(i) as
>   (values(1)
>   union all
>   s
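[Editorial illustration] For reference, the intended evaluation of the simple linear case, `values(1) union all select i+1 from foo where i < 10`, is a working-table loop: seed with the non-recursive term, then repeatedly apply the recursive term to the previous iteration's rows only. This plain Python sketch models that semantics (it is not the patch's implementation):

```python
def eval_recursive_cte(base_rows, recursive_step):
    # UNION ALL semantics: append everything, no duplicate elimination.
    result = list(base_rows)    # the accumulated result table
    working = list(base_rows)   # the working table for this iteration
    while working:
        # The recursive term sees only the previous iteration's rows.
        working = [row for old in working for row in recursive_step(old)]
        result.extend(working)
    return result

# with recursive foo(i) as
#   (values(1) union all select i+1 from foo where i < 10)
# select * from foo;
rows = eval_recursive_cte([1], lambda i: [i + 1] if i < 10 else [])
print(rows)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The loop terminates exactly when the recursive term produces no new rows, which is why an unguarded recursive term (or an aggregate like `max(i)+1` that keeps producing a row) loops forever.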
Re: [HACKERS] SQL standard question about Common Table Expressions
On Tue, 2008-09-09 at 01:45 +0100, Andrew Gierth wrote:
> The "contains" language in the spec is tricky. And I think there's some
> issue here with the spec confusing <query expression> and <query
> expression body>; some of the language refers to "If a AQEk not marked
> as recursive is immediately contained in a <query expression body>"
> which is not possible as the syntax production for <query expression
> body> does not contain <query expression>.

Now that you point that out, I think that is a mistake in the spec. Your version makes a lot more sense.

Thank you for going into the gory details of the algorithm; there were a few other things tripping me up that you clarified. I was going crazy yesterday trying to piece together what is supposed to happen for, e.g., using INTERSECT to connect the recursive and the non-recursive parts, and this answers it. I'm not suggesting that we require support for INTERSECT, but my curiosity got the best of me.

Regards,
	Jeff Davis
Re: [HACKERS] [PATCH] Cleanup of GUC units code
On Mon, 8 Sep 2008 10:32:40 -0400, Alvaro Herrera <[EMAIL PROTECTED]> wrote:
> Gregory Stark wrote:
> > "Greg Sabino Mullane" <[EMAIL PROTECTED]> writes:
> >
> > > Tom Lane wrote:
> > >> My vote is to reject the patch and work on a config checker.
> > >
> > > +1

+1

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Re: [HACKERS] [PATCH] Cleanup of GUC units code
Marko Kreen wrote:
> Thirdly, please don't use "standard units" argument, unless you plan
> to propose use of "KiB, MiB, GiB" at the same moment.

In defense of standard units, if the postgres docs say "Postgres will round up to the nearest power of 2", kB and MB seem very clear to me. If we want to silently accept other abbreviations and acronyms too (mb, mIb, whatever), that doesn't bother me, tho.
Re: [EMAIL PROTECTED]: Re: [HACKERS] [patch] GUC source file and line number]
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> > Hmm, I didn't recheck after Greg's patch, but in mine, it doesn't,
> > because the location is saved as "reset location" and restored when the
> > variable is reset. It worked fine in all cases I tested.
>
> Hmm. Actually, why is there a need to save and restore at all? There
> can certainly never be more than one recorded config-file location per
> variable. What about saying that each variable has one and only one
> filename/linenumber, but whether these are relevant to the current value
> is determined by whether the current value's source is S_FILE?

Hmm, this does make the patch a lot simpler. Attached. (Magnus was visionary enough to put the correct test in the pg_settings definition.)

I also dropped the change to set_config_option, and added a new routine to set the source file/line, as you suggested elsewhere. The only problem is that we now have two bsearch calls for each option set in a config file ... I don't want to change set_config_option just to be able to return the struct config_generic for this routine's sake ... Better ideas? Is this OK as is?

I noticed some weird things when the config files contain errors, but I think it's outside this patch's scope.

(I dropped the "default" stuff for now, as it doesn't seem that a consensus has been reached on that topic.)
--
Alvaro Herrera            http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Index: doc/src/sgml/catalogs.sgml
===
RCS file: /home/alvherre/Code/cvs/pgsql/doc/src/sgml/catalogs.sgml,v
retrieving revision 2.172
diff -c -p -r2.172 catalogs.sgml
*** doc/src/sgml/catalogs.sgml	30 Jul 2008 17:05:04 -	2.172
--- doc/src/sgml/catalogs.sgml	9 Sep 2008 02:42:14 -
***
*** 6414,6419
--- 6414,6433
       Allowed values in enum parameters (NULL for non-enum values)
+
+      sourcefile
+      text
+      Input file the current value was set from (NULL for values set in
+      sources other than configuration files). Helpful when using
+      configuration include directives.
+
+
+      sourceline
+      text
+      Line number within the sourcefile the current value was set
+      from (NULL for values set in sources other than configuration files)
+
+

Index: src/backend/utils/misc/guc-file.l
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/utils/misc/guc-file.l,v
retrieving revision 1.56
diff -c -p -r1.56 guc-file.l
*** src/backend/utils/misc/guc-file.l	22 Aug 2008 00:20:40 -	1.56
--- src/backend/utils/misc/guc-file.l	9 Sep 2008 02:09:28 -
*** struct name_value_pair
*** 39,44
--- 39,46
  {
  	char *name;
  	char *value;
+ 	char *filename;
+ 	int sourceline;
  	struct name_value_pair *next;
  };
*** ProcessConfigFile(GucContext context)
*** 307,314
  	/* If we got here all the options checked out okay, so apply them. */
  	for (item = head; item; item = item->next)
  	{
! 		set_config_option(item->name, item->value, context,
! 			PGC_S_FILE, GUC_ACTION_SET, true);
  	}
  	/* Remember when we last successfully loaded the config file. */
--- 309,320
  	/* If we got here all the options checked out okay, so apply them. */
  	for (item = head; item; item = item->next)
  	{
! 		if (set_config_option(item->name, item->value, context,
! 			PGC_S_FILE, GUC_ACTION_SET, true))
! 		{
! 			set_config_sourcefile(item->name, item->filename,
! 				item->sourceline);
! 		}
  	}
  	/* Remember when we last successfully loaded the config file.
  	 */
*** ParseConfigFile(const char *config_file,
*** 483,488
--- 489,496
  			pfree(item->value);
  			item->name = opt_name;
  			item->value = opt_value;
+ 			item->filename = pstrdup(config_file);
+ 			item->sourceline = ConfigFileLineno-1;
  		}
  		else
  		{
*** ParseConfigFile(const char *config_file,
*** 490,495
--- 498,505
  			item = palloc(sizeof *item);
  			item->name = opt_name;
  			item->value = opt_value;
+ 			item->filename = pstrdup(config_file);
+ 			item->sourceline = ConfigFileLineno-1;
  			item->next = *head_p;
  			*head_p = item;
  			if (*tail_p == NULL)
*** ParseConfigFile(const char *config_file,
*** 502,507
--- 512,519
  			item = palloc(sizeof *item);
  			item->name = opt_name;
  			item->value = opt_value;
+ 			item->filename = pstrdup(config_file);
+ 			item->sourceline = ConfigFileLineno-1;
  			item->next = NULL;
  			if (*head_p == NULL)
  				*head_p = item;
*** free_name_value_list(struct name_value_p
*** 553,558
--- 565,571
  		pfree(item->name);
  		pfree(item->value);
+ 		pfree(
[HACKERS] [Patch Review] Copy column storage parameters on CREATE TABLE LIKE/INHERITS
Greetings,

I've reviewed the patch here:

http://archives.postgresql.org/message-id/[EMAIL PROTECTED]

and am happy to report that it looks to be in good order. It has documentation and regression updates, is in context diff format, patches against current CVS with only some minor fuzz, and appears to work as advertised. I looked over the patch and could easily follow what was going on, did some additional testing beyond the regression tests, and reviewed the documentation changes.

My only comment on this patch is that the documentation updates might include a link back to Section 53.2 for more information, and/or to the SET STORAGE portion of the ALTER TABLE / ALTER COLUMN command documentation, since the only other 'storage' reference in CREATE TABLE's documentation is about table storage parameters.

Thanks!

Stephen
Re: [HACKERS] sql2008 diff sql2003
> "Alvaro" == Alvaro Herrera <[EMAIL PROTECTED]> writes:

>> so it's like this:
>>
>> select * from foo order by bar offset 5 rows fetch first 10 rows only;

Alvaro> Oh, I see -- it's just a cumbersome way to have our LIMIT
Alvaro> clause. What's the "ONLY" for?

It seems to be just a mandatory noise word.

--
Andrew.
Re: [HACKERS] sql2008 diff sql2003
Andrew Gierth wrote:
> Alvaro> This means we have to support stuff like
>
> Alvaro> declare foo cursor for select * from lists;
> Alvaro> select * from (fetch first from foo) as bar;
>
> No, that's wrong. [...]
>
> so it's like this:
>
> select * from foo order by bar offset 5 rows fetch first 10 rows only;

Oh, I see -- it's just a cumbersome way to have our LIMIT clause. What's the "ONLY" for?

> (nothing that I can see assigns any semantics to FIRST vs NEXT, they seem
> to do the same thing)

I was actually thinking that you'd be able to open a cursor and then invoke the query repeatedly. With FETCH FIRST it wouldn't work, because every repetition would get the same rows, but with FETCH NEXT it'd give you a paginated query. It seems a lot less useful this way.

--
Alvaro Herrera    Valdivia, Chile    ICBM: S 39º 48' 55.3", W 73º 15' 24.7"
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)
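[Editorial illustration] The row arithmetic of the standard's `OFFSET n ROWS FETCH FIRST m ROWS ONLY` (equivalent, as noted above, to PostgreSQL's `OFFSET n LIMIT m`) reduces to a slice over the sorted result; a plain Python sketch with hypothetical names:

```python
def offset_fetch_first(rows, offset, fetch):
    # OFFSET n ROWS FETCH FIRST m ROWS ONLY == OFFSET n LIMIT m:
    # skip the first `offset` rows, then return at most `fetch` rows.
    return rows[offset:offset + fetch]

# Stand-in for "select * from foo order by bar": an already-sorted list.
rows = list(range(1, 101))
page = offset_fetch_first(rows, offset=5, fetch=10)
print(page)  # rows 6..15
```

This also makes Alvaro's point concrete: re-running the same `FETCH FIRST` query always yields the same slice; pagination comes only from advancing the offset (or, in the cursor reading of `FETCH NEXT`, from the cursor remembering its position).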
Re: [HACKERS] SQL standard question about Common Table Expressions
> "Jeff" == Jeff Davis <[EMAIL PROTECTED]> writes:

Jeff> I am looking into the SQL standard to try to determine
Jeff> precisely how the CTE feature should behave.

Jeff> Taking a simple case like:

Jeff> with recursive
Jeff>   foo(i) as
Jeff>     (values(1)
Jeff>     union all
Jeff>     select i+1 from foo where i < 5)
Jeff> select * from foo;

Jeff> And looking at the SQL standard 200n 7.13: General Rules: 2.c,
Jeff> it provides an algorithm for evaluating the recursive query.

Jeff> In this algorithm, AQEk is a <query expression>. Syntactically,
Jeff> I only see two <query expression>s, and one is the entire
Jeff> query. The other is: "(values(1) union all select i+1 from foo
Jeff> where i < 5)", so I'll assume that AQEk must be equal to that*.

The "contains" language in the spec is tricky. And I think there's some issue here with the spec confusing <query expression> and <query expression body>; some of the language refers to "If a AQEk not marked as recursive is immediately contained in a <query expression body>" which is not possible as the syntax production for <query expression body> does not contain <query expression>.

Working through the spec, we have here only one WLE and therefore only one partition. So WLE1 is the only element, and WQN1 is "foo" and WQE1 is "(values(1) union all select i+1 from foo where i < 5)".

The way I suspect it's _intended_ to work is like this (following the order of steps in the spec):

AQE1 = "values(1) union all select i+1 from foo where i < 5"
AQE2 = "values (1)"
AQE3 = "select i+1 from foo where i < 5"

AQE1 and AQE3 are recursive but AQE2 is not.

AQE1 is associated with WQE1 and therefore RT1 and WT1 are the result and working tables for WQN1.

AQE2 is not marked as recursive and is contained in a recursive union, therefore it is marked iteration-ignorable.

None of the AQEn suppress duplicates.
Let RT2 and WT2 be the result of AQE2, so both now contain (1).

AQE2 now becomes "TABLE RTN2" (and hence AQE1/WQE1 are now "TABLE RTN2 union all select i+1 from foo where i < 5").

Evaluate WQE1:
  RT1 and WT1 become (1)
  RT2/WT2 unchanged
  RT3 and WT3 remain empty

AQE2 is iteration-ignorable, so RT2 is emptied.

WT1 and WT2 are nonempty, so:
  associate "foo" with WT1
  evaluate WQE1:
    WT1 becomes (2)
    WT2 is empty
    WT3 becomes (2)
  RT1 becomes (RT1 UNION ALL WT1), i.e. (1),(2)
  RT3 becomes (RT3 UNION ALL WT3), i.e. (1),(2)

WT1 and WT3 are nonempty, so:
  associate "foo" with WT1
  evaluate WQE1:
    WT1 becomes (3)
    WT2 is empty
    WT3 becomes (3)
  RT1 becomes (RT1 UNION ALL WT1), i.e. (1),(2),(3)
  RT3 becomes (RT3 UNION ALL WT3), i.e. (1),(2),(3)

etc.

[The main difference between this and the logic used by the existing patch is that the spec's version keeps around many more working and result tables in order to do the duplicate-suppression logic, which requires knowing not just whether you've already produced a given row, but also _which branch of the query_ produced it. Within the constraints of the patch, it's not possible to trigger the duplicate elimination, and therefore RT2/WT2 and RT3/WT3 become redundant; also all non-recursive clauses are iteration-ignorable.]

-- Andrew (irc:RhodiumToad)

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Verbosity of Function Return Type Checks
Volkan YAZICI wrote:
> On Fri, 05 Sep 2008, Tom Lane <[EMAIL PROTECTED]> writes:
> > at the call sites, and then
> >
> > errmsg("%s", _(msg))
> >
> > when throwing the error.
>
> Modified as you suggested. BTW, will there be a similar i18n scenario
> for "dropped column" you mentioned below?

Yes, you need _() around those too.

> Done with format_type_be() usage.

BTW you forgot to remove the quotes around the type names (I know I told you to add them but Tom gave the reasons why it's not needed) :-)

Those are minor problems that are easily fixed. However there's a larger issue that Tom mentioned earlier and I concur, which is that the caller is forming the primary error message and passing it down. It gets a bit silly if you consider the ways the messages end up worded:

errmsg("returned record type does not match expected record type"));
errdetail("Returned type \"%s\" does not match expected type \"%s\" in column \"%s\".",
---> this is the case where it's OK

errmsg("wrong record type supplied in RETURN NEXT"));
errdetail("Returned type \"%s\" does not match expected type \"%s\" in column \"%s\".",
--> this is strange

errmsg("returned tuple structure does not match table of trigger event"));
errdetail("Returned type \"%s\" does not match expected type \"%s\" in column \"%s\".",
--> this is not OK

I've been thinking how to pass down the context information without feeding the complete string, but I don't find a way without doing message construction. Construction is to be avoided because it's a pain for translators. Maybe we should just use something generic like errmsg("mismatching record type") and have the caller pass two strings specifying what's the "returned" tuple and what's the "expected", but I don't see how ...

(BTW this is worth fixing, because every case seems to have appeared independently without much thought as to other callsites with the same pattern.)
-- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing statistics write overhead
Martin Pihlak wrote: > Tom Lane wrote: > > Hmm. With the timestamp in the file, ISTM that we could put all the > > intelligence on the reader side. Reader checks file, sends message if > Attached is a patch that implements the described signalling. Are we keeping the idea that the reader sends a stat message when it needs to read stats? What about the lossiness of the transport? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous Log Shipping Replication
Markus Wanner wrote: > Hi, > > Bruce Momjian wrote: > > Backends would > > update a shared memory variable specifying how far they want the wal > > streamer to advance and send a signal to the wal streamer if necessary. > > Backends would monitor another shared memory variable that specifies how > > far the wal streamer has advanced. > > That sounds like WAL needs to be written to disk, before it can be sent > to the standby. Except maybe with some sort of mmap'ing the WAL. Well, WAL is either on disk or in the wal_buffers in shared memory --- in either case, a WAL streamer can get to it. -- Bruce Momjian <[EMAIL PROTECTED]>http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous Log Shipping Replication
Hi,

Bruce Momjian wrote:
> Backends would update a shared memory variable specifying how far they
> want the wal streamer to advance and send a signal to the wal streamer
> if necessary. Backends would monitor another shared memory variable
> that specifies how far the wal streamer has advanced.

That sounds like WAL needs to be written to disk, before it can be sent to the standby. Except maybe with some sort of mmap'ing the WAL.

Regards

Markus Wanner

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Move src/tools/backend/ to wiki
Alvaro Herrera wrote:
> So I just noticed that we have a description of the Pg innards in the
> sourcecode, complete with a flowchart and all, at src/tools/backend.
> I had already seen this graph years ago; what shocked me the most was
> finding out that there's a pointer to it in the Developer's FAQ, in a
> question that's absolutely unrelated to it.
>
> So I went ahead and moved its mention to a separate question, where it
> has a lot more visibility. (I also added an URL to anoncvs, where
> people can see it more readily.)
>
> What I'm wondering right now is what's the value of having this stuff
> in the source code at all. Why don't we move it to the Wiki and remove
> it from the repo? Would anybody object to that idea? Having it on CVS
> means that nobody is able to improve the text, which is kind of sad
> because it's been neglected for the last 3 years.

I have no problem having it moved to the wiki, though it does have HTML imagemap elements. I don't think much changes at the flow chart level from release to release, so it would be fine if it was just CVS HEAD. I also don't think many people do back-branch development.

-- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Prototype: In-place upgrade v02
Zdenek Kotala wrote:
> You mentioned data types, but it is not a problem. You can easily extend the
> data type attributes with version information and call the correct in/out
> functions. Or use a different Oid for the new data type version. There are
> several other easy solutions possible for data types. And for conversion you
> can use the ALTER TABLE command. The main idea is to keep data in all
> formats in a relation. This approach could also be used for the
> integer/float datetime problem.

This kind of code structure scares me: I worry that our system will become so complex that it will hinder our ability to continue making improvements.

-- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
On Mon, 2008-09-08 at 22:53 +0100, Andrew Gierth wrote:
> Yes, you've misinterpreted. When the spec says that a query "shall
> not" do such-and-such, it means that a conforming client isn't allowed
> to do that, it does NOT mean that the server is required to produce an
> error when it sees it.

Interesting. Thanks for the clear reference. So, we either need to reject it or define some implementation-dependent behavior, right?

The patch currently rejects HAVING, which means that it's a little difficult to use an aggregate effectively. I lean toward rejecting aggregates if we reject HAVING, for consistency. If we allow it, we should allow HAVING as well.

> Jeff> I agree that's kind of a funny requirement. But that's pretty
> Jeff> typical of the SQL standard. If DB2 or SQL Server follow the
> Jeff> standard here, we should, too. If not, it's open for discussion.
>
> MSSQL does not:

Thanks again for a reference. If MSSQL rejects SELECT DISTINCT for a recursive query, then I think we should reject it. Right now the patch allows it but provides a result that is inconsistent with the standard. If we reject SELECT DISTINCT for recursive queries now, we can always meet the standard later, or decide that the standard behavior is too likely to cause confusion and just continue to block it.

> Ideally we should. DB2 appears to (other than OFFSET which it doesn't
> seem to have at all). But it's not at all clear that it would be either
> useful or easy to do so.

Ok. If it's easy to support, it should probably be done.

As an aside, you seem to be an expert with the standard; have you had a chance to look at the question I asked earlier?: http://archives.postgresql.org/pgsql-hackers/2008-09/msg00487.php

Regards, Jeff Davis

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
> "Jeff" == Jeff Davis <[EMAIL PROTECTED]> writes: Jeff> Aggregates should be blocked according to the standard. Jeff> Also, causes an infinite loop. This should be fixed for 8.4. >> Does the standard require anywhere that non-conforming statements >> must be diagnosed? (seems impractical, since it would forbid >> extensions) Jeff> 2.g.iii.4.B explicitly says aggregates should be rejected, Jeff> unless I have misinterpreted. Yes, you've misinterpreted. When the spec says that a query "shall not" do such-and-such, it means that a conforming client isn't allowed to do that, it does NOT mean that the server is required to produce an error when it sees it. Chapter and verse on this is given in the Framework doc, at 6.3.3.2: In the Syntax Rules, the term "shall" defines conditions that are required to be true of syntactically conforming SQL language. When such conditions depend on the contents of one or more schemas, they are required to be true just before the actions specified by the General Rules are performed. The treatment of language that does not conform to the SQL Formats and Syntax Rules is implementation-dependent. If any condition required by Syntax Rules is not satisfied when the evaluation of Access or General Rules is attempted and the implementation is neither processing non-conforming SQL language nor processing conforming SQL language in a non-conforming manner, then an exception condition is raised: "syntax error or access rule violation". Including an aggregate violates a "shall" in a syntax rule, therefore the query is non-conforming, therefore the server can either process it in an implementation-dependent manner or reject it with an exception. >> Yeah, though the standard's use of DISTINCT in this way is something >> of a violation of the POLA. Jeff> I agree that's kind of a funny requirement. But that's pretty Jeff> typical of the SQL standard. If DB2 or SQL Server follow the Jeff> standard here, we should, too. If not, it's open for discussion. 
MSSQL does not:

"The following items are not allowed in the CTE_query_definition of a recursive member:
* SELECT DISTINCT
* GROUP BY
* HAVING
* Scalar aggregation
* TOP
* LEFT, RIGHT, OUTER JOIN (INNER JOIN is allowed)
* Subqueries
* A hint applied to a recursive reference to a CTE inside a CTE_query_definition."

For DB2 the docs do not appear to specify either way; they don't seem to forbid the use of SELECT DISTINCT inside a recursive CTE, but neither do they seem to mention any unusual effect of including it.

Jeff> * ORDER BY, LIMIT, and OFFSET are rejected for recursive
Jeff> queries. The standard does not seem to say that these should be
Jeff> rejected.

>> Note that supporting those in subqueries (including CTEs) is a
>> separate optional feature of the standard.

Jeff> I don't feel strongly about this either way, but I prefer that we
Jeff> are consistent when possible. We do support these things in a
Jeff> subquery, so shouldn't we support them in all subqueries?

Ideally we should. DB2 appears to (other than OFFSET which it doesn't seem to have at all). But it's not at all clear that it would be either useful or easy to do so.

-- Andrew.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous Log Shipping Replication
Fujii Masao wrote: > On Mon, Sep 8, 2008 at 8:44 PM, Markus Wanner <[EMAIL PROTECTED]> wrote: > >>Merge into WAL writer? > > > > Uh.. that would mean you'd loose parallelism between WAL writing to disk and > > WAL shipping via network. That does not sound appealing to me. > > That depends on the order of WAL writing and WAL shipping. > How about the following order? > > 1. A backend writes WAL to disk. > 2. The backend wakes up WAL sender process and sleeps. > 3. WAL sender process does WAL shipping and wakes up the backend. > 4. The backend issues sync command. I am confused why this is considered so complicated. Having individual backends doing the wal transfer to the slave is never going to work well. I figured we would have a single WAL streamer that continues advancing forward in the WAL file, streaming to the standby. Backends would update a shared memory variable specifying how far they want the wal streamer to advance and send a signal to the wal streamer if necessary. Backends would monitor another shared memory variable that specifies how far the wal streamer has advanced. -- Bruce Momjian <[EMAIL PROTECTED]>http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing statistics write overhead
Tom Lane wrote:
> Hmm. With the timestamp in the file, ISTM that we could put all the
> intelligence on the reader side. Reader checks file, sends message if

... snip ...

> remember the file timestamp it last wrote out. There are various ways
> you could design it but what comes to mind here is for readers to send
> a timestamp defined as minimum acceptable file timestamp (ie, their
> current clock time less whatever slop they want to allow). Then,
> when the collector gets a request with that timestamp <= its last
> write timestamp, it knows it need not do anything.

Attached is a patch that implements the described signalling. Additionally, the following unrelated changes have been made:

1. fopen/fclose replaced with AllocateFile/FreeFile
2. pgstat_report_stat() now also checks if there are functions to report before returning. Previous behavior was to return immediately if no table stats were available for reporting.

> responding to a new live write request. It's sort of annoying that
> the readers have to sleep for an arbitrary interval though. If we could
> get rid of that part...

It is, but I guess it's unavoidable if we have to wait for the file to be written. I'll try to do some load testing tomorrow, to see if something nasty comes out.

regards, Martin

Index: src/backend/postmaster/pgstat.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.181
diff -c -r1.181 pgstat.c
*** src/backend/postmaster/pgstat.c	25 Aug 2008 18:55:43 -0000	1.181
--- src/backend/postmaster/pgstat.c	8 Sep 2008 21:00:17 -0000
***************
*** 85,90 ****
--- 85,97 ----
  #define PGSTAT_SELECT_TIMEOUT	2	/* How often to check for postmaster
  					 * death; in seconds. */
  
+ #define PGSTAT_POLL_RETRY_DELAY	5	/* How long to pause between statistics
+ 					 * update requests; in milliseconds. */
+ 
+ #define PGSTAT_POLL_TIME	5000	/* How long are we allowed to poll for
+ 					 * the stats file; in milliseconds. */
+ 
+ #define PGSTAT_POLL_LOOP_COUNT	(PGSTAT_POLL_TIME / PGSTAT_POLL_RETRY_DELAY)
  
  /* --
   * The initial size hints for the hash tables used in the collector.
***************
*** 159,164 ****
--- 166,177 ----
  static HTAB *pgStatFunctions = NULL;
  
  /*
+  * Indicates if backend has some function stats which it hasn't yet
+  * sent to the collector.
+  */
+ static bool have_function_stats = false;
+ 
+ /*
   * Tuple insertion/deletion counts for an open transaction can't be propagated
   * into PgStat_TableStatus counters until we know if it is going to commit
   * or abort. Hence, we keep these counts in per-subxact structs that live
***************
*** 201,208 ****
   */
  static PgStat_GlobalStats globalStats;
  
  static volatile bool need_exit = false;
- static volatile bool need_statwrite = false;
  static volatile bool got_SIGHUP = false;
  
  /*
--- 214,225 ----
   */
  static PgStat_GlobalStats globalStats;
  
+ /* Last time the collector wrote the stats file */
+ static TimestampTz last_statwrite;
+ /* Latest statistics request from backend */
+ static TimestampTz last_statrequest;
+ 
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
  
  /*
***************
*** 223,229 ****
  NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]);
  static void pgstat_exit(SIGNAL_ARGS);
- static void force_statwrite(SIGNAL_ARGS);
  static void pgstat_beshutdown_hook(int code, Datum arg);
  static void pgstat_sighup_handler(SIGNAL_ARGS);
--- 240,245 ----
***************
*** 254,259 ****
--- 270,276 ----
  static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
  static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
  static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
+ static void pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len);
  
  /*
***************
*** 655,663 ****
  	int		i;
  
  	/* Don't expend a clock check if nothing to do */
! 	/* Note: we ignore pending function stats in this test ... OK? */
! 	if (pgStatTabList == NULL ||
! 		pgStatTabList->tsa_used == 0)
  		return;
  
  	/*
--- 672,679 ----
  	int		i;
  
  	/* Don't expend a clock check if nothing to do */
! 	if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0)
! 		&& !have_function_stats)
  		return;
  
  	/*
***************
*** 823,828 ****
--- 839,846 ----
  	if (msg.m_nentries > 0)
  		pgstat_send(&msg, offsetof(PgStat_MsgFuncstat, m_entry[0]) +
  			msg.m_nentries * sizeof(PgStat_FunctionEntry));
+ 
+ 	have_function_stats = false;
  }
***************
*** 1341,1346 ****
--- 1359,1367 ----
  	fs->f_numcalls++;
  	fs->f_time = f_total;
  	INSTR_TIME_ADD(fs->f_time_self, f_self);
+ 
+ 	/* indicate that we have something to upload */
+ 	have_function_stats = true;
  }
***************
*** 2463,2468 ****
--- 2484,2505 ----
  	hdr->
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
On Mon, 2008-09-08 at 18:08 +0100, Andrew Gierth wrote: > Jeff> * Mutual Recursion: > > This limitation isn't at all uncommon in other implementations; DB2 > docs for example say: As with some other things in my list, this doesn't need to be supported in 8.4. I just wanted to lay out my interpretation of the standard, and places that we might (currently) fall short of it. The fact that other DBMSs don't support mutual recursion is a good indication that it's not important immediately. > Jeff> The standard does not require that the recursive term be on > Jeff> the RHS. > > Again, the standard may not, but existing implementations do: > Again, I don't think we need this for 8.4. However, I think it's probably more important than mutual recursion. > Jeff> * UNION ALL only: > > Jeff> with recursive > Jeff> foo(i) as (values(1) union select i+1 from foo where i < 10) > Jeff> select * from foo; > Jeff> ERROR: non-recursive term and recursive term must be combined with > Jeff> UNION ALL > > Jeff> The standard seems to allow UNION ALL, UNION, INTERSECT, and > Jeff> EXCEPT (when the recursive term is not on the RHS of the > Jeff> EXCEPT). > > Again, existing implementations disagree. See above for DB2, and for > MSSQL: > And again, I agree that it's not important for 8.4. At some point we need to determine what the goalposts are though. Are we copying existing implementations, or are we implementing the standard? > Jeff> Produces 10 rows of output regardless of what "X" is. This > Jeff> should be fixed for 8.4. Also, this is non-linear recursion, > Jeff> which the standard seems to disallow. > > That looks like it should be disallowed somehow. Agreed. I think it should just throw an error, probably. 
> [snip * Strange result with except: which looks like a bug]
>
> Jeff> * Aggregates allowed:
>
> Jeff> with recursive foo(i) as
> Jeff> (values(1)
> Jeff> union all
> Jeff> select max(i)+1 from foo where i < 10)
> Jeff> select * from foo;
>
> Jeff> Aggregates should be blocked according to the standard.
> Jeff> Also, causes an infinite loop. This should be fixed for 8.4.
>
> Does the standard require anywhere that non-conforming statements must
> be diagnosed? (seems impractical, since it would forbid extensions)

2.g.iii.4.B explicitly says aggregates should be rejected, unless I have misinterpreted.

> Yeah, though the standard's use of DISTINCT in this way is something
> of a violation of the POLA.

I agree that's kind of a funny requirement. But that's pretty typical of the SQL standard. If DB2 or SQL Server follow the standard here, we should, too. If not, it's open for discussion.

> No. This has already been discussed; it's neither possible nor desirable
> to diagnose all cases which can result in infinite loops, and there are
> important types of queries which would be unnecessarily forbidden.

I didn't say we should forbid all infinite loops. But we should forbid ones that the standard tells us to forbid.

> Besides, you've misread the spec here: it prohibits the recursive
> reference ONLY on the nullable side of the join. You cite:

Thank you for the correction. It does properly reject the outer joins that the standard says should be rejected.

> Jeff> * ORDER BY, LIMIT, and OFFSET are rejected for recursive
> Jeff> queries. The standard does not seem to say that these should be
> Jeff> rejected.
>
> Note that supporting those in subqueries (including CTEs) is a separate
> optional feature of the standard.

I don't feel strongly about this either way, but I prefer that we are consistent when possible. We do support these things in a subquery, so shouldn't we support them in all subqueries?
Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
On Mon, 2008-09-08 at 21:13 +0100, Gregory Stark wrote:
> Jeff Davis <[EMAIL PROTECTED]> writes:
>
> > * Mutual Recursion:
> >
> > with recursive
> > foo(i) as (values(1) union all select i+1 from bar where i < 10),
> > bar(i) as (values(1) union all select i+1 from foo where i < 10)
> > select * from foo;
> > ERROR: mutual recursive call is not supported
> >
> > The standard allows mutual recursion.
>
> This seems to be a point of confusion. I originally read the standard and
> concluded that mutual recursion was an optional feature. Itagaki-san showed me
> a copy of the spec where it seemed there was a clear blanket prohibition on
> mutually recursive queries and in fact anything but simple linearly expandable
> queries. I wonder if there are different versions of the spec floating around
> on this point.
>
> Take a second look at your spec and read on to where it defines "linear" and
> "expandable". If it doesn't define those terms then it's definitely different
> from what I read. If it does, read on to see what it does with them. The main
> reason to define them appeared to be to use them to say that supporting mutual
> recursion is not required.

I think we're reading the same version of the spec. I'm reading 200n. My interpretation (Syntax Rule 2.h) is that expandable is only used to determine whether it can contain a <search or cycle clause>.

That being said, I don't think it should be a requirement for 8.4. The CTE patch is an important feature, and we shouldn't hold it up over something comparatively obscure like mutual recursion. As Andrew Gierth cited, other systems don't support it anyway. It should be kept in mind for future work though.

Regards, Jeff Davis

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
Jeff Davis <[EMAIL PROTECTED]> writes: > * Mutual Recursion: > > with recursive > foo(i) as (values(1) union all select i+1 from bar where i < 10), > bar(i) as (values(1) union all select i+1 from foo where i < 10) > select * from foo; > ERROR: mutual recursive call is not supported > > The standard allows mutual recursion. This seems to be a point of confusion. I originally read the standard and concluded that mutual recursion was an optional feature. Itagaki-san showed me a copy of the spec where it seemed there was a clear blanket prohibition on mutually recursive queries and in fact anything but simple linearly expandable queries. I wonder if there are different versions of the spec floating around on this point. Take a second look at your spec and read on to where it defines "linear" and "expandable". If it doesn't define those terms then it's definitely different from what I read. If it does, read on to see what it does with them. The main reason to define them appeared to be to use them to say that supporting mutual recursion is not required. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] A few thoughts on the plan inval extension patch
I've been looking at http://archives.postgresql.org/message-id/[EMAIL PROTECTED] and am not very happy with it. One big problem is that it has the parser collecting the list of functions referenced by a plan, which is quite bogus --- consider functions introduced into the tree during rewrite, or by means of an inlining operation. The other thing I did not care for was adding an OID to SharedInvalCatcacheMsg. That means bloating the size of sinval messages from four words to five, which is not a good thing from a performance perspective. What would make a lot more sense from the perspective of the sinval code is to use the cache entry itempointer to identify the function (or other object) being invalidated. That's what the cache itself uses and what's already being shipped in inval messages. We would have to modify the call signature for syscache callback functions to pass them an ItemPointer, but that doesn't seem like a problem. The main downside of this approach is that some part of the planner would have to collect a list of TIDs of pg_proc entries for every function used in a plan, which means an additional syscache lookup (unless this could be piggybacked on some other existing lookup, but I see no very good candidate). The function is almost certainly in cache at this point, so it's not a huge penalty, but it's annoying. One idea that I had was to ameliorate that by not bothering to look up built-in functions (which for this purpose we could define as those with OIDs less than 1), on the assumption that they will never be dropped. Certainly the vast majority of functions referenced in a typical plan tree would be built-ins, and so this should get us down to the point where the extra lookups aren't a problem. The subsequent savings in list copying and comparisons in PlanCacheCallback are attractive too. Barring objections, I'll go see about whacking the patch into this form. 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast REVERSE() function?
[EMAIL PROTECTED] (hubert depesz lubaczewski) writes: > On Mon, Sep 08, 2008 at 11:20:18AM -0400, Chris Browne wrote: >> I've got a case where I need to reverse strings, and find that, oddly >> enough, there isn't a C-based reverse() function. >> A search turns up pl/pgsql and SQL implementations: > > just for completenes - there is also pl/perl and c versions freely > available: > http://www.depesz.com/index.php/2007/07/30/indexable-field-like-something/ > (pl/perl) > http://blog.frosties.org/post/2007/08/28/Fonction-reverse-C-avec-PostgreSQL > (c) I hadn't thought about the Unicode issue (mentioned elsewhere in the thread); that's a good reason why the method I mentioned *wouldn't* be a good one! I'm NOT interested in pl/perl as an option; building and deploying all of Perl is a mighty expensive way to get *ONE* function (and I don't think that fundamentally changes if it's 10 functions!). In the long run, I'd be keen on there being a REVERSE function available in pg_catalog, which is why I'm asking about the C version, as that would be the way to put it into the "core." -- "cbbrowne","@","linuxdatabases.info" http://www3.sympatico.ca/cbbrowne/sap.html DSK: STAN.K; ML EXIT -- FILE NOT FOUND -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)
Heikki Linnakangas wrote:
> David Rowley wrote:
> > Thanks for all the reviews and suggestions.
>
> David, could you re-run the performance tests you ran earlier? I'm
> curious to know what impact switching to the simpler loop for 1-byte
> pattern had.

Sure: http://www.unixbeast.com/~fat/8.3_test_v1.4.xls

I tested with 8.3.3, and applied the patch updated by Tom. The thing that surprised me was that string_to_array didn't perform as well. I expected single character searches to perform a little better. I can't think why it would be slower now.

The strpos() test for the single char needle is taking 1.2% longer. I guess that makes a little sense, as we're now doing a few more length tests in this case, but then we've eliminated the final strncmp() once it finds that match. The strncmp would have to check for nulls, check it's not exceeded the compare length, and check the chars actually match. I figured this would have made up for that 1.2%. Seems not.

David.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Cleanup of GUC units code
Settings in postgresql.conf are currently case-insensitive. Except for the units. And, of course, filenames when you are using a case-sensitive filesystem. Because these are things that are defined by some convention other than the ones the PGDG made up. Since units fall into that category, it seems to me that we're stuck with using external conventions. Just a thought... since there are very few (if any) places where a user would specify a variable in terms of bits (kilobits, megabits, gigabits), we could make the units case-insensitive and assume that kb, gb, and mb (and all case variants) refer to some number of bytes. If a user wants to specify a variable in terms of bits, he would have to spell out the units completely, as in "4 gigabits", "20 kilobits", or "16 megabits". -- Korry -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
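Korry's suggestion amounts to a small tweak in the unit-parsing table. A rough sketch of the rule (hypothetical code, not the actual GUC parser; the byte units use the usual binary convention and bit units the decimal one):

```python
import re

# Case-insensitive byte units; bit units must be spelled out in full.
BYTE_UNITS = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}
BIT_UNITS = {"kilobits": 1000, "megabits": 1000 ** 2, "gigabits": 1000 ** 3}

def parse_size(value: str) -> int:
    """Parse a setting like '4GB', '16 megabits', or '20 kilobits' into bytes."""
    m = re.fullmatch(r"(\d+)\s*([A-Za-z]+)", value.strip())
    if not m:
        raise ValueError(f"invalid value: {value!r}")
    num, unit = int(m.group(1)), m.group(2).lower()
    if unit in BYTE_UNITS:
        return num * BYTE_UNITS[unit]       # kb/KB/Kb/kB all mean kilobytes
    if unit in BIT_UNITS:
        return num * BIT_UNITS[unit] // 8   # bits -> bytes
    raise ValueError(f"unrecognized unit: {unit!r}")
```

Under this rule "4gb" and "4GB" are identical, and the kilobits/kilobytes ambiguity disappears because the bit form can never be abbreviated.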
Re: [HACKERS] Synchronous Log Shipping Replication
On Mon, 2008-09-08 at 19:19 +0900, ITAGAKI Takahiro wrote: > Bruce Momjian <[EMAIL PROTECTED]> wrote: > > > > > b) Use new background process as WALSender > > > > > > > >This idea needs background-process hook which enables users > > > >to define new background processes > > > I think starting/stopping a process for each WAL send is too much > > overhead. > > Yes, of course slow. But I guess it is the only way to share one socket > in all backends. Postgres is not a multi-threaded architecture, > so each backend should use dedicated connections to send WAL buffers. > 300 backends require 300 connections for each slave... it's not good at all. So... don't have individual backends do the sending. Have them wait while somebody else does it for them. > > It sounds like Fujii-san is basically saying they can only get the hooks > > done for 8.4, not the actual solution. > > No! He has an actual solution in his prototype ;-) The usual thing if you have a WIP patch you're not sure of is to post the patch for feedback. If you guys aren't going to post any code to the project then I'm not clear why it's being discussed here. Is this a community project or a private project? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] For what should pg_stop_backup wait?
Simon Riggs <[EMAIL PROTECTED]> writes: >> So thinking we should test XLogArchiveCheckDone() for both >> stopxlogfilename and history file and then stat for the stop WAL file: > This seems better. Somehow I missed the 5-Apr patch that introduced this bogosity. Next time you make a fundamental change in the behavior of a function, how about updating its comment to match? After studying it for a minute, I think that XLogArchiveCheckDone no longer even has a clearly explainable purpose; it needs to be split into two functions with two different behaviors. I plan to revert XLogArchiveCheckDone to its previous behavior and introduce XLogArchiveIsBusy (flipping the sense of the result) to use in pg_stop_backup. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Review] pgbench duration option
On Mon, Sep 8, 2008 at 6:59 PM, ITAGAKI Takahiro <[EMAIL PROTECTED]> wrote: > Here is a revised version of the pgbench duration patch. > I merged some comments from Brendan and gnari. > The changes look good. I tried out the new v3 patch and didn't encounter any problems. One last minor quibble - I think the wording in the documentation is still a little bit awkward: In the first place, never believe any test that runs for only a few seconds. Use the -t or -T setting enough to make the run last at least a few minutes, so as to average out noise. This reads better IMHO if you simply omit the word "enough". Cheers, BJ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCHES] to_date() validation
On Mon, Sep 8, 2008 at 6:24 PM, Brendan Jurd <[EMAIL PROTECTED]> wrote: > On Sun, Sep 7, 2008 at 5:58 AM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: >> Code review: one minor nit >> *** a/src/backend/utils/adt/formatting.c >> --- b/src/backend/utils/adt/formatting.c >> *** >> *** 781,787 static const KeyWord DCH_keywords[] = { >>{"y", 1, DCH_Y, TRUE, FROM_CHAR_DATE_GREGORIAN}, >> >>/* last */ >> ! {NULL, 0, 0, 0} >> }; >> >> /* -- >> --- 781,787 >>{"y", 1, DCH_Y, TRUE, FROM_CHAR_DATE_GREGORIAN}, >> >>/* last */ >> ! {NULL, 0, 0, 0, 0} >> }; > > Ah, thank you for picking that up. I will correct this and send in a > new patch version in the next 24 hours. > New version attached (and link added to wiki). Cheers, BJ to-date-validation-2.diff.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
> "Jeff" == Jeff Davis <[EMAIL PROTECTED]> writes: Jeff> * Mutual Recursion: This limitation isn't at all uncommon in other implementations; DB2 docs for example say: "If more than one common table expression is defined in the same statement, cyclic references between the common table expressions are not permitted. A cyclic reference occurs when two common table expressions dt1 and dt2 are created such that dt1 refers to dt2 and dt2 refers to dt1." http://publib.boulder.ibm.com/infocenter/iadthelp/v7r0/index.jsp?topic=/com.ibm.etools.iseries.langref2.doc/rbafzmst295.htm MSSQL's documentation is less clear, but it also appears not to allow mutual recursion (not allowing forward references to CTEs). "A CTE can reference itself and previously defined CTEs in the same WITH clause. Forward referencing is not allowed." http://msdn.microsoft.com/en-us/library/ms175972.aspx Jeff> * RHS only: Jeff> with recursive Jeff> foo(i) as (select i+1 from foo where i < 10 union all values(1)) Jeff> select * from foo; Jeff> ERROR: Left hand side of UNION ALL must be a non-recursive term in a Jeff> recursive query Jeff> The standard does not require that the recursive term be on Jeff> the RHS. Again, the standard may not, but existing implementations do: MSSQL: "The recursive CTE definition must contain at least two CTE query definitions, an anchor member and a recursive member. Multiple anchor members and recursive members can be defined; however, all anchor member query definitions must be put before the first recursive member definition. All CTE query definitions are anchor members unless they reference the CTE itself. " DB2: "The following restrictions apply to a recursive common-table-expression: * A list of column-names must be specified following the table-identifier. * The UNION ALL set operator must be specified. * The first fullselect of the first union (the initialization fullselect) must not include a reference to the common-table-expression itself in any FROM clause. 
" Jeff> * UNION ALL only: Jeff> with recursive Jeff> foo(i) as (values(1) union select i+1 from foo where i < 10) Jeff> select * from foo; Jeff> ERROR: non-recursive term and recursive term must be combined with Jeff> UNION ALL Jeff> The standard seems to allow UNION ALL, UNION, INTERSECT, and Jeff> EXCEPT (when the recursive term is not on the RHS of the Jeff> EXCEPT). Again, existing implementations disagree. See above for DB2, and for MSSQL: "Anchor members must be combined by one of these set operators: UNION ALL, UNION, INTERSECT, or EXCEPT. UNION ALL is the only set operator allowed between the last anchor member and first recursive member, and when combining multiple recursive members." Jeff> * Binary recursion and subselect strangeness: Jeff> with recursive foo(i) as Jeff> (values (1) Jeff> union all Jeff> select * from Jeff> (select i+1 from foo where i < 10 Jeff> union all Jeff> select i+1 from foo where i < X) t) Jeff> select * from foo; Jeff> Produces 10 rows of output regardless of what "X" is. This Jeff> should be fixed for 8.4. Also, this is non-linear recursion, Jeff> which the standard seems to disallow. That looks like it should be disallowed somehow. Jeff> * Multiple recursive references: [snip] Jeff> If we're going to allow non-linear recursion (which the Jeff> standard does not), this seems like it should be a valid Jeff> case. We probably shouldn't allow it (as above). [snip * Strange result with except: which looks like a bug] Jeff> * Aggregates allowed: which Jeff> with recursive foo(i) as Jeff> (values(1) Jeff> union all Jeff> select max(i)+1 from foo where i < 10) Jeff> select * from foo; Jeff> Aggregates should be blocked according to the standard. Jeff> Also, causes an infinite loop. This should be fixed for 8.4. Does the standard require anywhere that non-conforming statements must be diagnosed? 
(seems impractical, since it would forbid extensions) Jeff> * DISTINCT should suppress duplicates: Jeff> with recursive foo(i) as Jeff> (select distinct * from (values(1),(2)) t Jeff> union all Jeff> select distinct i+1 from foo where i < 10) Jeff> select * from foo; Jeff> This outputs a lot of duplicates, but they should be Jeff> suppressed according to the standard. This query is essentially Jeff> the same as supporting UNION for recursive queries, so we Jeff> should either fix both for 8.4 or block both for consistency. Yeah, though the standard's use of DISTINCT in this way is something of a violation of the POLA. Jeff> * outer joins on a recursive reference should be blocked: Jeff> with recursive foo(i) as Jeff> (values(1) Jeff> union all Jeff> select i+1 from foo left join (values(1)) t on (i=column1)) Jeff> select * from foo; Jeff> Causes an infinite loop, but the sta
Re: [HACKERS] sql2008 diff sql2003
> "Alvaro" == Alvaro Herrera <[EMAIL PROTECTED]> writes: Alvaro> Wow, this is really horrid: Alvaro> # F856 through F859: FETCH FIRST clause in subqueries, Alvaro> views, and query expressions. The SQL:2008 syntax for Alvaro> restricting the rows of a result set is FETCH FIRST, rather Alvaro> than Microsoft SQL Server’s SELECT TOP N equivalent which Alvaro> SQL Anywhere supports presently. Alvaro> This means we have to support stuff like Alvaro> declare foo cursor for select * from lists; Alvaro> select * from (fetch first from foo) as bar; No, that's wrong. The new syntax is:

<query expression> ::= [ <with clause> ] <query expression body>
    [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ]

<result offset clause> ::= OFFSET <offset row count> { ROW | ROWS }

<fetch first clause> ::= FETCH { FIRST | NEXT } [ <fetch first quantity> ] { ROW | ROWS } ONLY

so it's like this: select * from foo order by bar offset 5 rows fetch first 10 rows only; (nothing that I can see assigns any semantics to FIRST vs NEXT, they seem to do the same thing) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sql2008 diff sql2003
Pavel Stehule wrote: > Hello > > I found one useful article > http://iablog.sybase.com/paulley/2008/07/sql2008-now-an-approved-iso-international-standard/ Wow, this is really horrid: # F856 through F859: FETCH FIRST clause in subqueries, views, and query expressions. The SQL:2008 syntax for restricting the rows of a result set is FETCH FIRST, rather than Microsoft SQL Server’s SELECT TOP N equivalent which SQL Anywhere supports presently. This means we have to support stuff like declare foo cursor for select * from lists; select * from (fetch first from foo) as bar; (I wonder why they didn't use FETCH NEXT instead. As is, it seems a bit cumbersome to use.) -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] For what should pg_stop_backup wait?
Simon Riggs <[EMAIL PROTECTED]> writes: > You sound like you're in the middle of doing this yourself. Or would you > like me to do that? Yeah, done and committed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] sql2008 diff sql2003
Hello I found one useful article http://iablog.sybase.com/paulley/2008/07/sql2008-now-an-approved-iso-international-standard/ Regards Pavel Stehule -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast REVERSE() function?
2008/9/8 Andrew Dunstan <[EMAIL PROTECTED]>: > > > Mario Weilguni wrote: >>> >>> (Aside: presumably we could walk thru the string destructively, >>> in-place, swapping bytes; I think that would be theoretically >>> quickest...) >>> >> >> Hmmm... I guess it will not work for UTF-8 or any other multibyte charset >> > > Yes, quite. orafce contains a multibyte (UTF8) reverse function. Pavel > > Perl's reverse might work with UTF8 - I've never tried. > > cheers > > andrew > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] For what should pg_stop_backup wait?
On Mon, 2008-09-08 at 11:57 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > >> So thinking we should test XLogArchiveCheckDone() for both > >> stopxlogfilename and history file and then stat for the stop WAL file: > > > This seems better. > > Somehow I missed the 5-Apr patch that introduced this bogosity. > Next time you make a fundamental change in the behavior of a function, > how about updating its comment to match? It felt OK when I did it, because of the clearly named boolean. But I accept your point and will look to improve my code comments in future. > After studying it for a minute, I think that XLogArchiveCheckDone no > longer even has a clearly explainable purpose; it needs to be split > into two functions with two different behaviors. I plan to revert > XLogArchiveCheckDone to its previous behavior and introduce > XLogArchiveIsBusy (flipping the sense of the result) to use in > pg_stop_backup. You sound like you're in the middle of doing this yourself. Or would you like me to do that? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast REVERSE() function?
Mario Weilguni wrote: (Aside: presumably we could walk thru the string destructively, in-place, swapping bytes; I think that would be theoretically quickest...) Hmmm... I guess it will not work for UTF-8 or any other multibyte charset Yes, quite. Perl's reverse might work with UTF8 - I've never tried. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast REVERSE() function?
Hello 2008/9/8 Mario Weilguni <[EMAIL PROTECTED]>: >> (Aside: presumably we could walk thru the string destructively, >> in-place, swapping bytes; I think that would be theoretically >> quickest...) > > Hmmm... I guess it will not work for UTF-8 or any other multibyte charset > it isn't a problem, but I am not sure if ANSI SQL has this function? Regards Pavel Stehule > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast REVERSE() function?
On Mon, Sep 08, 2008 at 11:20:18AM -0400, Chris Browne wrote: > I've got a case where I need to reverse strings, and find that, oddly > enough, there isn't a C-based reverse() function. > A search turns up pl/pgsql and SQL implementations: just for completeness - there is also pl/perl and C versions freely available: http://www.depesz.com/index.php/2007/07/30/indexable-field-like-something/ (pl/perl) http://blog.frosties.org/post/2007/08/28/Fonction-reverse-C-avec-PostgreSQL (c) Best regards, depesz -- Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/ jid/gtalk: [EMAIL PROTECTED] / aim:depeszhdl / skype:depesz_hdl / gg:6749007 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast REVERSE() function?
> (Aside: presumably we could walk thru the string destructively, > in-place, swapping bytes; I think that would be theoretically > quickest...) Hmmm... I guess it will not work for UTF-8 or any other multibyte charset -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fast REVERSE() function?
I've got a case where I need to reverse strings, and find that, oddly enough, there isn't a C-based reverse() function. A search turns up pl/pgsql and SQL implementations:

create or replace function reverse_string(text) returns text as $$
DECLARE
    reversed_string text;
    incoming alias for $1;
BEGIN
    reversed_string = '';
    for i in reverse char_length(incoming)..1 loop
        reversed_string = reversed_string || substring(incoming from i for 1);
    end loop;
    return reversed_string;
END
$$ language plpgsql;

CREATE OR REPLACE FUNCTION reverse(TEXT) RETURNS TEXT AS $$
    SELECT array_to_string(
        ARRAY(
            SELECT substring($1, s.i, 1)
            FROM generate_series(length($1), 1, -1) AS s(i)
        ), '');
$$ LANGUAGE SQL IMMUTABLE;

Unfortunately, neither is particularly fast. This should be "blinding-quick" in C, in comparison; reversing a set of bytes should be able to be done mighty quick! (Aside: presumably we could walk thru the string destructively, in-place, swapping bytes; I think that would be theoretically quickest...) I could probably add this in as an SPI() function; is there a good reason to try to avoid doing so? -- output = reverse("ofni.sesabatadxunil" "@" "enworbbc") http://www3.sympatico.ca/cbbrowne/sgml.html "Consistency is the single most important aspect of *ideology.* Reality is not nearly so consistent." - <[EMAIL PROTECTED]> -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
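A note on the in-place byte swap idea: it is only safe for single-byte encodings. With UTF-8, reversing bytes scrambles multibyte characters, so a reverse() has to work character by character. A quick sketch of the difference (Python used purely to illustrate the encoding issue, not as an implementation proposal):

```python
def reverse_bytes(s: str) -> bytes:
    # What a naive in-place byte swap would produce.
    return s.encode("utf-8")[::-1]

def reverse_chars(s: str) -> str:
    # Character-aware reversal -- what a multibyte-safe reverse() must do.
    return s[::-1]

# Pure ASCII: both agree.
assert reverse_bytes("abc") == b"cba"
assert reverse_chars("abc") == "cba"

# Multibyte: byte reversal yields invalid UTF-8 ('é' is 0xC3 0xA9, and a
# reversed continuation byte no longer follows its lead byte).
try:
    reverse_bytes("héllo").decode("utf-8")
    byte_reverse_ok = True
except UnicodeDecodeError:
    byte_reverse_ok = False
assert not byte_reverse_ok
assert reverse_chars("héllo") == "olléh"
```

A C implementation in the backend would therefore need to consult pg_mblen()-style character-length information rather than swap raw bytes, except in single-byte server encodings.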
Re: [HACKERS] [PATCH] Cleanup of GUC units code
Tom Lane <[EMAIL PROTECTED]> writes: > Gregory Stark <[EMAIL PROTECTED]> writes: >> Alvaro Herrera <[EMAIL PROTECTED]> writes: >>> It's good as a joke, but what if the user says '1024b'? Does it mean >>> 1024 blocks or one kilobyte? If blocks, what size are we talking, the >>> usual 512 bytes, or our BLCKSZ? > >> For what guc would you find a unit of posix-style "blocks" relevant?! > > The point isn't whether it's relevant, the point is that there's a > fairly serious doubt as to what the user thought it meant. My point was that we don't accept any form of "b" today and nobody has suggested we should. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Common Table Expressions (WITH RECURSIVE) patch
These are my initial comments about the Common Table Expressions (CTE) patch, also known as WITH [RECURSIVE]. These comments are based on the patch here: http://archives.postgresql.org/pgsql-patches/2008-08/msg00021.php This is a semantically complex feature, and the standard is fairly complex as well. So I'm approaching this by making my own interpretations from the standard first (I included my interpretations and section references at the end of this email) and comparing to the behavior of the patch. The following examples may be inconsistent with the standard. Some have already been mentioned, and I don't think they all need to be fixed for 8.4, but I mention them here for completeness. * Mutual Recursion: with recursive foo(i) as (values(1) union all select i+1 from bar where i < 10), bar(i) as (values(1) union all select i+1 from foo where i < 10) select * from foo; ERROR: mutual recursive call is not supported The standard allows mutual recursion. * Single Evaluation: with foo(i) as (select random() as i) select * from foo union all select * from foo; i --- 0.233165248762816 0.62126633618027 (2 rows) The standard specifies that non-recursive WITH should be evaluated once. * RHS only: with recursive foo(i) as (select i+1 from foo where i < 10 union all values(1)) select * from foo; ERROR: Left hand side of UNION ALL must be a non-recursive term in a recursive query The standard does not require that the recursive term be on the RHS. * UNION ALL only: with recursive foo(i) as (values(1) union select i+1 from foo where i < 10) select * from foo; ERROR: non-recursive term and recursive term must be combined with UNION ALL The standard seems to allow UNION ALL, UNION, INTERSECT, and EXCEPT (when the recursive term is not on the RHS of the EXCEPT). 
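To make the UNION-vs-UNION-ALL point concrete, the standard's iterative model (seed the result with the non-recursive term, then repeatedly apply the recursive term to the rows produced by the previous step) can be sketched as a toy evaluator. This is only an illustration of the semantics, not the patch's executor; `step` stands in for the recursive term.

```python
def eval_recursive_cte(base_rows, step, distinct=False):
    """Evaluate WITH RECURSIVE foo AS (base UNION [ALL] step(foo)).

    `step` maps the working table (rows produced by the previous
    iteration) to the next batch of rows; iteration stops when the
    working table comes up empty.
    """
    result = list(base_rows)
    seen = set(result)
    working = list(base_rows)
    while working:
        new = step(working)
        if distinct:
            # UNION semantics: discard rows already present in the result.
            new = [r for r in new if r not in seen]
            seen.update(new)
        result.extend(new)
        working = new
    return result

# with recursive foo(i) as (values(1) union all select i+1 from foo where i < 10)
rows = eval_recursive_cte([1], lambda w: [i + 1 for i in w if i < 10])
assert rows == list(range(1, 11))
```

Note how UNION's duplicate suppression is also what guarantees termination for steps that keep regenerating the same rows; with plain UNION ALL such a query loops forever, which is exactly the failure mode of the aggregate and outer-join examples below.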
* Binary recursion and subselect strangeness: with recursive foo(i) as (values (1) union all select * from (select i+1 from foo where i < 10 union all select i+1 from foo where i < X) t) select * from foo; Produces 10 rows of output regardless of what "X" is. This should be fixed for 8.4. Also, this is non-linear recursion, which the standard seems to disallow. * Multiple recursive references: with recursive foo(i) as (values (1) union all select i+1 from foo where i < 10 union all select i+1 from foo where i < 20) select * from foo; ERROR: Left hand side of UNION ALL must be a non-recursive term in a recursive query If we're going to allow non-linear recursion (which the standard does not), this seems like it should be a valid case. * Strange result with except: with recursive foo(i) as (values (1) union all select * from (select i+1 from foo where i < 10 except select i+1 from foo where i < 5) t) select * from foo; ERROR: table "foo" has 0 columns available but 1 columns specified This query works if you replace "except" with "union". This should be fixed for 8.4. * Aggregates allowed: with recursive foo(i) as (values(1) union all select max(i)+1 from foo where i < 10) select * from foo; Aggregates should be blocked according to the standard. Also, causes an infinite loop. This should be fixed for 8.4. * DISTINCT should suppress duplicates: with recursive foo(i) as (select distinct * from (values(1),(2)) t union all select distinct i+1 from foo where i < 10) select * from foo; This outputs a lot of duplicates, but they should be suppressed according to the standard. This query is essentially the same as supporting UNION for recursive queries, so we should either fix both for 8.4 or block both for consistency. 
* outer joins on a recursive reference should be blocked: with recursive foo(i) as (values(1) union all select i+1 from foo left join (values(1)) t on (i=column1)) select * from foo; Causes an infinite loop, but the standard says using an outer join in this situation should be prohibited. This should be fixed for 8.4. * ORDER BY, LIMIT, and OFFSET are rejected for recursive queries. The standard does not seem to say that these should be rejected. The following are my interpretations of relevant parts of the SQL standard (200n), and the associated sections. These are only my interpretation, so let me know if you interpret the standard differently. Non-linear recursion forbidden: 7.13: Syntax Rules: 2.g.ii My interpretation of 2.g.ii.2 is that WQN[k] and WQN[l] may be the same . 7.13: Syntax Rules: 2.g.iv EXCEPT can't be used for recursive queries if a recursive reference appears on the RHS: 7.13: Syntax Rules: 2.g.iii.1 INTERSECT ALL/EXCEPT ALL can't be used for recursive queries: 7.13: Syntax Rules: 2.g.iii.5 recursive references must appear in FROM clause: 7.13: Syntax Rules: 2.g.iii.3 My interpretation of this rule is that it does not allow a
Re: [HACKERS] Synchronous Log Shipping Replication
Hi,

Fujii Masao wrote:
> 1. A backend writes WAL to disk.
> 2. The backend wakes up WAL sender process and sleeps.
> 3. WAL sender process does WAL shipping and wakes up the backend.
> 4. The backend issues sync command.

Right, that would work. But still, the WAL writer process would block during writing WAL blocks. Are there compelling reasons for using the existing WAL writer process, as opposed to introducing a new process?

> The timing of the process's receiving a signal is dependent on the kernel's scheduler.

Sure, so are pipes or shmem queues.

> The scheduler does not always handle a signal immediately.

What exactly are you proposing to use instead of signals? Semaphores are pretty inconvenient when trying to wake up arbitrary processes or in conjunction with listening on sockets via select(), for example. See src/backend/replication/manager.c from Postgres-R for a working implementation of such a process using select() and signaling.

Regards

Markus Wanner

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Cleanup of GUC units code
Gregory Stark <[EMAIL PROTECTED]> writes: > Alvaro Herrera <[EMAIL PROTECTED]> writes: >> It's good as a joke, but what if the user says '1024b'? Does it mean >> 1024 blocks or one kilobyte? If blocks, what size are we talking, the >> usual 512 bytes, or our BLCKSZ? > For what guc would you find a unit of posix-style "blocks" relevant?! The point isn't whether it's relevant, the point is that there's a fairly serious doubt as to what the user thought it meant. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Cleanup of GUC units code
Alvaro Herrera <[EMAIL PROTECTED]> writes: > It's good as a joke, but what if the user says '1024b'? Does it mean > 1024 blocks or one kilobyte? If blocks, what size are we talking, the > usual 512 bytes, or our BLCKSZ? For what guc would you find a unit of posix-style "blocks" relevant?! -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Cleanup of GUC units code
Gregory Stark wrote: > "Greg Sabino Mullane" <[EMAIL PROTECTED]> writes: > > > Tom Lane wrote: > >> My vote is to reject the patch and work on a config checker. > > > > +1 > > > >> postgres=# set work_mem = '1g'; > >> ERROR: invalid value for parameter "work_mem": "1g" > > > > Perhaps this would be a great place for a HINT listing all > > valid inputs, if not there already? > > It is, I paraphrased it on my original message as: > > HINT: It's perfectly clear what you want but I'm going to refuse to do >it until you type it exactly as I say: "GB" It's good as a joke, but what if the user says '1024b'? Does it mean 1024 blocks or one kilobyte? If blocks, what size are we talking, the usual 512 bytes, or our BLCKSZ? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Cleanup of GUC units code
"Greg Sabino Mullane" <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> My vote is to reject the patch and work on a config checker. > > +1 > >> postgres=# set work_mem = '1g'; >> ERROR: invalid value for parameter "work_mem": "1g" > > Perhaps this would be a great place for a HINT listing all > valid inputs, if not there already? It is, I paraphrased it on my original message as: HINT: It's perfectly clear what you want but I'm going to refuse to do it until you type it exactly as I say: "GB" -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Cleanup of GUC units code
Greg Sabino Mullane wrote: > Tom Lane wrote: > > postgres=# set work_mem = '1g'; > > ERROR: invalid value for parameter "work_mem": "1g" > > Perhaps this would be a great place for a HINT listing all > valid inputs, if not there already? alvherre=# set work_mem = '1g'; ERROR: invalid value for parameter "work_mem": "1g" HINT: Valid units for this parameter are "kB", "MB", and "GB". -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Cleanup of GUC units code
Tom Lane wrote: > My vote is to reject the patch and work on a config checker. +1 > postgres=# set work_mem = '1g'; > ERROR: invalid value for parameter "work_mem": "1g" Perhaps this would be a great place for a HINT listing all valid inputs, if not there already? -- Greg Sabino Mullane [EMAIL PROTECTED] End Point Corporation PGP Key: 0x14964AC8 200809081014 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Cleanup of GUC units code
"Greg Stark" <[EMAIL PROTECTED]> writes: > On Mon, Sep 8, 2008 at 2:11 PM, Tom Lane <[EMAIL PROTECTED]> wrote: >> But of course case insensitivity isn't going to fix that example for you. >> So we're right back at the question of where we should draw the line in >> trying to accept variant input. > Well it's not a perfect precedent but for example, dd accepts: > G(2^30) > M(2^20) > k (2^10) > K(2^10) > Kb (10^3) > MB (10^6) > GB (10^9) > b(512) Hmm. I could get behind a proposal to allow single-letter abbreviations if it could be made to work across the board, but what about the time units? "ms" vs "min" is the sticky part there... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing statistics write overhead
Martin Pihlak <[EMAIL PROTECTED]> writes: > Attached is a patch which adds a timestamp to pgstat.stat file header, > backend_read_statsfile uses this to determine if the file is fresh. > During the wait loop, the stats request message is retransmitted to > compensate for possible loss of message(s). > The collector only writes the file mostly at PGSTAT_STAT_INTERVAL frequency, > currently no extra custom timeouts can be passed in the message. This can > of course be added if need arises. Hmm. With the timestamp in the file, ISTM that we could put all the intelligence on the reader side. Reader checks file, sends message if it's too stale. The collector just writes when it's told to, no filtering. In this design, rate limiting relies on the readers to not be unreasonable in how old a file they'll accept; and there's no problem with different readers having different requirements. A possible problem with this idea is that two readers might send request messages at about the same time, resulting in two writes where there need have been only one. However I think that could be fixed if we add a little more information to the request messages and have the collector remember the file timestamp it last wrote out. There are various ways you could design it but what comes to mind here is for readers to send a timestamp defined as minimum acceptable file timestamp (ie, their current clock time less whatever slop they want to allow). Then, when the collector gets a request with that timestamp <= its last write timestamp, it knows it need not do anything. With signaling like that, there's no excess writes *and* no delay in responding to a new live write request. It's sort of annoying that the readers have to sleep for an arbitrary interval though. If we could get rid of that part... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
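The request/response scheme Tom describes is easy to see in miniature. The names below are invented for illustration; the real messages travel over the pgstat socket and the real "write" produces pgstat.stat.

```python
import time

class StatsCollector:
    """Toy model of the collector side of the proposal."""

    def __init__(self):
        self.last_write_ts = 0.0    # timestamp written into the stats file header

    def handle_request(self, min_acceptable_ts: float) -> bool:
        # Readers send the oldest file timestamp they will accept.  If the
        # file we last wrote is already new enough, do nothing -- this is
        # where near-simultaneous requests from two readers collapse into
        # a single write.
        if min_acceptable_ts <= self.last_write_ts:
            return False
        self.last_write_ts = time.time()    # stands in for writing the stats file
        return True

def request_fresh_file(collector: StatsCollector, max_age: float) -> bool:
    """Reader side: ask for a stats file no older than max_age seconds."""
    return collector.handle_request(time.time() - max_age)
```

Because the rate limiting lives entirely in the timestamp comparison, readers with different freshness requirements coexist naturally: a reader demanding a very fresh file forces a write, while one content with a stale file costs nothing.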
Re: [HACKERS] Synchronous Log Shipping Replication
On Mon, Sep 8, 2008 at 8:44 PM, Markus Wanner <[EMAIL PROTECTED]> wrote:
>> Merge into WAL writer?
>
> Uh.. that would mean you'd lose parallelism between WAL writing to disk and WAL shipping via network. That does not sound appealing to me.

That depends on the order of WAL writing and WAL shipping. How about the following order?

1. A backend writes WAL to disk.
2. The backend wakes up the WAL sender process and sleeps.
3. The WAL sender process does the WAL shipping and wakes up the backend.
4. The backend issues the sync command.

>> I guess we could invent a new semaphore-like primitive at the same layer as LWLocks using spinlock and PGPROC directly...
>
> Sure, but in what way would that differ from what I do with imessages?

Performance ;) The timing of a process receiving a signal depends on the kernel's scheduler, and the scheduler does not always handle a signal immediately.

Regards -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] [PATCH] Cleanup of GUC units code
On Mon, Sep 08, 2008 at 02:18:55PM +0100, Greg Stark wrote:
> On Mon, Sep 8, 2008 at 2:11 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> But of course case insensitivity isn't going to fix that example for you. So we're right back at the question of where we should draw the line in trying to accept variant input.
>
> Well it's not a perfect precedent but for example, dd accepts:
>
> G (2^30)
> M (2^20)
> k (2^10)
> K (2^10)
> Kb (10^3)
> MB (10^6)
> GB (10^9)
> b (512)

Heh. But it doesn't take Gb or gB or gb, which was one of your express wishes. So -- hm.

Regards - -- tomás
Re: [HACKERS] Prototype: In-place upgrade v02
Tom Lane napsal(a):
> Zdenek Kotala <[EMAIL PROTECTED]> writes:
>> Another idea is to create backward compatible AM and put them into separate library. If these AM will work also with old page structure then there should not be reason for reindexing or index page conversion after upgrade.
>
> I don't think that'd be real workable. It would require duplicating all the entries for that AM in pg_opfamily, pg_amop, etc. Which we could do for the built-in entries, I suppose, but what happens to user-defined operator classes?

When the catalog upgrade is performed directly, user-defined operator classes should stay in the catalog. But the question is what happens with regproc records, and whether all functions will be compatible with a new server ... It suggests that we need a stable API for operator and data type implementations. Any datatype which uses only this API could then be used on new PostgreSQL versions without recompilation.

> At least for the index changes proposed so far for 8.4, it seems to me that the best solution is to mark affected indexes as not "indisvalid" and require a post-conversion REINDEX to fix 'em. Obviously a better solution would be nice later, but we have to avoid putting huge amounts of work into noncritical problems, else the whole feature is just not going to get finished.

Agree.

Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
Re: [HACKERS] [PATCH] Cleanup of GUC units code
On Mon, Sep 8, 2008 at 2:11 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> But of course case insensitivity isn't going to fix that example for you. So we're right back at the question of where we should draw the line in trying to accept variant input.

Well it's not a perfect precedent but for example, dd accepts:

G (2^30)
M (2^20)
k (2^10)
K (2^10)
Kb (10^3)
MB (10^6)
GB (10^9)
b (512)

I think we're all agreed we want to ignore the KiB crap and make all our units base 2. And I don't think using "b" for block makes sense for us. But the point is that yes, people expect to type "100M" or "1G" and have that work. Plenty of us do it all the time with dd or other tools already.

-- greg
Re: [HACKERS] reducing statistics write overhead
Tom Lane wrote: > Timestamp within the file is certainly better than trying to rely on > filesystem timestamps. I doubt 1 sec resolution is good enough, and > I'd also be worried about issues like clock skew between the > postmaster's time and the filesystem's time. > Attached is a patch which adds a timestamp to pgstat.stat file header, backend_read_statsfile uses this to determine if the file is fresh. During the wait loop, the stats request message is retransmitted to compensate for possible loss of message(s). The collector only writes the file mostly at PGSTAT_STAT_INTERVAL frequency, currently no extra custom timeouts can be passed in the message. This can of course be added if need arises. regards, Martin Index: src/backend/postmaster/pgstat.c === RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/pgstat.c,v retrieving revision 1.181 diff -c -r1.181 pgstat.c *** src/backend/postmaster/pgstat.c 25 Aug 2008 18:55:43 - 1.181 --- src/backend/postmaster/pgstat.c 8 Sep 2008 12:17:21 - *** *** 85,90 --- 85,94 #define PGSTAT_SELECT_TIMEOUT 2 /* How often to check for postmaster * death; in seconds. */ + #define PGSTAT_POLL_RETRY_DELAY 10 /* How long to pause between statistics + * update requests; in milliseconds. */ + + #define PGSTAT_POLL_RETRIES 500 /* How many times to repeat stats inquiry */ /* -- * The initial size hints for the hash tables used in the collector. 
*** *** 201,206 --- 205,215 */ static PgStat_GlobalStats globalStats; + /* Last time the collector wrote the stats file */ + static TimestampTz last_statwrite; + /* Last time a backend requested a new file */ + static TimestampTz last_statrequest; + static volatile bool need_exit = false; static volatile bool need_statwrite = false; static volatile bool got_SIGHUP = false; *** *** 223,229 NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]); static void pgstat_exit(SIGNAL_ARGS); - static void force_statwrite(SIGNAL_ARGS); static void pgstat_beshutdown_hook(int code, Datum arg); static void pgstat_sighup_handler(SIGNAL_ARGS); --- 232,237 *** *** 254,259 --- 262,268 static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len); static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len); static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len); + static void pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len); /* *** *** 2463,2468 --- 2472,2491 hdr->m_type = mtype; } + /* -- + * pgstat_send_inquiry() - + * + * Notify collector that we need fresh data. + * -- + */ + static void + pgstat_send_inquiry(void) + { + PgStat_MsgInquiry msg; + + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY); + pgstat_send(&msg, sizeof(msg)); + } /* -- * pgstat_send() - *** *** 2538,2545 NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]) { - struct itimerval write_timeout; - bool need_timer = false; int len; PgStat_Msg msg; --- 2561,2566 *** *** 2571,2583 /* * Ignore all signals usually bound to some action in the postmaster, ! * except SIGQUIT and SIGALRM. */ pqsignal(SIGHUP, pgstat_sighup_handler); pqsignal(SIGINT, SIG_IGN); pqsignal(SIGTERM, SIG_IGN); pqsignal(SIGQUIT, pgstat_exit); - pqsignal(SIGALRM, force_statwrite); pqsignal(SIGPIPE, SIG_IGN); pqsignal(SIGUSR1, SIG_IGN); pqsignal(SIGUSR2, SIG_IGN); --- 2592,2603 /* * Ignore all signals usually bound to some action in the postmaster, ! 
* except SIGQUIT */ pqsignal(SIGHUP, pgstat_sighup_handler); pqsignal(SIGINT, SIG_IGN); pqsignal(SIGTERM, SIG_IGN); pqsignal(SIGQUIT, pgstat_exit); pqsignal(SIGPIPE, SIG_IGN); pqsignal(SIGUSR1, SIG_IGN); pqsignal(SIGUSR2, SIG_IGN); *** *** 2598,2608 */ need_statwrite = true; - /* Preset the delay between status file writes */ - MemSet(&write_timeout, 0, sizeof(struct itimerval)); - write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000; - write_timeout.it_value.tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000; - /* * Read in an existing statistics stats file or initialize the stats to * zero. --- 2618,2623 *** *** 2649,2668 } /* ! * If time to write the stats file, do so. Note that the alarm ! * interrupt isn't re-enabled immediately, but only after we next ! * receive a stats message; so no cycles are wasted when there is ! * nothing going on. */ if (need_statwrite) { /* Check for postmaster death; if so we'll write file below */ if (!PostmasterIsAlive(true)) break; ! pgstat_
Re: [HACKERS] Some newbie questions
M2Y escribió:
> On Sep 7, 11:52 pm, [EMAIL PROTECTED] (Shane Ambler) wrote:
>>> What is a good way to start understanding backend (postgres) code? Is there any documentation available especially for developers? What is commit log and why it is needed?
>>
>> To achieve ACID (Atomic, Consistent, Isolatable, Durable), the changes needed to complete a transaction are saved to the commit log and flushed to disk, then the data files are changed. If the power goes out during the data file modifications the commit log can be used to complete the changes without losing any data.
>
> This, I think, is the transaction log, or XLog. My question is about the CLog, in which there are two bits for each transaction which denote the status of the transaction. Since there is the XLog, from which we can determine what changes we have to redo and undo, what is the need for this CLog?

That's correct -- what Shane is describing is the transaction log (usually known here as WAL). However, this xlog is write-only (except in the case of a crash); clog is read-write, and must be fast to query since it's used very frequently to determine the visibility of each tuple. Perhaps what you need to read is the chapter on our MVCC implementation, which relies heavily on clog.

-- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] [PATCH] Cleanup of GUC units code
Gregory Stark <[EMAIL PROTECTED]> writes: > I'm all for using the correct acronyms in all messages and documentation. What > I find annoying is the: > postgres=# set work_mem = '1g'; > ERROR: invalid value for parameter "work_mem": "1g" But of course case insensitivity isn't going to fix that example for you. So we're right back at the question of where we should draw the line in trying to accept variant input. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Prototype: In-place upgrade v02
Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> Heikki Linnakangas napsal(a):
>>> Relation forks didn't change anything inside relation files, so no scanning of relations is required because of that. Neither will the FSM rewrite. Not sure about DSM yet.
>>
>> Does it mean that if you "inject" an old data file after catalog upgrade, then FSM will work without any problem?
>
> Yes. You'll need to construct an FSM, but it doesn't necessarily need to reflect the reality. You could just fill it with zeros, meaning that there's no free space anywhere, and let the next vacuum fill it with real information. Or you could read the old pg_fsm.cache file and fill the new FSM accordingly.

I think a zeroed FSM is good, because new items should not be added onto old pages.

Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
Re: [HACKERS] reducing statistics write overhead
Tom Lane wrote: > Magnus Hagander <[EMAIL PROTECTED]> writes: >> Tom Lane wrote: >>> I'd also be worried about issues like clock skew between the >>> postmaster's time and the filesystem's time. > >> Can that even happen on a local filesystem? I guess you could put the >> file on NFS though, but that seems to be.. eh. sub-optimal.. in more >> than one way.. > > Lots of people use NAS/SAN type setups. For NAS it could happen, but certainly not for SAN, no? SANs have all the filesystem processing in the kernel of the server... I still agree that it's not a good thing to rely on, though :-) //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Prototype: In-place upgrade v02
Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> Tom Lane napsal(a):
>>> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>>>> In fact, I don't think there's any low-level data format changes yet between 8.3 and 8.4, so this would be a comparatively easy release to implement upgrade-in-place. There's just the catalog changes, but AFAICS nothing that would require scanning through relations.
>>>
>>> After a quick scan of the catversion.h changelog (which hopefully covers any such changes): we changed sequences incompatibly, we changed hash indexes incompatibly (even without the pending patch that would change their contents beyond recognition), and Teodor did some stuff to GIN indexes that might or might not represent an on-disk format change, you'd have to ask him. We also whacked around the sort order of bpchar_pattern_ops btree indexes.
>>
>> Hmm, it seems that reindex is the only good answer to all these changes.
>
> Isn't that exactly what we want to avoid with upgrade-in-place? As long as the conversion can be done page-at-a-time, without consulting other pages, we can do it when the page is read in.

Yes, but I meant what we can do for 8.4.

Zdenek
Re: [HACKERS] [PATCH] Cleanup of GUC units code
Simon Riggs <[EMAIL PROTECTED]> writes: > Peter's objection is reasonable, as far as most people have replied. > Marko's proposal is also reasonable to most people, since they do not > wish fat fingers to cause any amount of downtime. ISTM that if you've > done this, you appreciate the feature, if not it seems less important. My objection isn't down-time at all, it's the insultingly user-hostile attitude. I normally am setting work_mem by hand in a psql session and *every* time I do it I swear at Postgres for being annoyingly pedantic here. I'm all for using the correct acronyms in all messages and documentation. What I find annoying is the: postgres=# set work_mem = '1g'; ERROR: invalid value for parameter "work_mem": "1g" HINT: It's perfectly clear what you want but I'm going to refuse to do it until you type it exactly as I say: "GB" > * Marko should change patch to put WARNINGs in place so people know they > got it wrong That's only slightly less insulting than an error. > * we make sure the case is always shown correctly in all other aspects > of Postgres server and docs (no relaxation at all there) I believe we're already fairly stringent about this as we should be. > * in the longer term, we look for the solution to be a config checker I don't think a config checker directly addresses the same problem. I never set work_mem in a config and it still annoys the hell out of me. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Our CLUSTER implementation is pessimal
Gregory Stark <[EMAIL PROTECTED]> writes: > Yeah, I've been thinking about how to use the planner to do this. I thought the answer to that was going to be more or less "call cost_sort() and cost_index() and compare the answers". > To do that it seems to me what we would need to do is add a function > _pg_get_rawtuple_header() which returns the visibility information that HTSV > needs. You seem to be confusing "use the planner" with "use the executor". All that we need here is a decision about which code path to take within CLUSTER. We don't need to bring in boatloads of irrelevant infrastructure --- especially not infrastructure that's going to be fighting us every step of the way. The executor isn't designed to return raw tuples and no magic function is going to change that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Cleanup of GUC units code
Simon Riggs <[EMAIL PROTECTED]> writes: > Peter's objection is reasonable, as far as most people have replied. > Marko's proposal is also reasonable to most people, since they do not > wish fat fingers to cause any amount of downtime. ISTM that if you've > done this, you appreciate the feature, if not it seems less important. I really think that the claim that this will "save downtime" is a ridiculous argument. On that basis we should, for example, be looking for a nearest match for any misspelled variable name. The fact of the matter is that a configuration validator is a far better answer to any such worries than trying to accept bad/questionable input. > So my recommendation to everybody is > * we allow case insensitive matches of units in postgresql.conf > * Marko should change patch to put WARNINGs in place so people know they > got it wrong > * we make sure the case is always shown correctly in all other aspects > of Postgres server and docs (no relaxation at all there) > * in the longer term, we look for the solution to be a config checker My vote is to reject the patch and work on a config checker. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Prototype: In-place upgrade v02
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > The bpchar_pattern_ops change you mentioned must be this one: >> A not-immediately-obvious incompatibility is that the sort order within >> bpchar_pattern_ops indexes changes --- it had been identical to plain >> strcmp, but is now trailing-blank-insensitive. This will impact >> in-place upgrades, if those ever happen. Yup. > The way I read that, bpchar_pattern_ops just became less sensitive. Some > values are now considered equal that weren't before, and thus can now be > stored in any order. That's not an incompatible change, right? No, consider 'abc^I' vs 'abc ' (^I denoting a tab character). These are unequal in either case, but the sort order has flipped. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Prototype: In-place upgrade v02
Zdenek Kotala <[EMAIL PROTECTED]> writes: > Another idea is to create backward compatible AM and put them into separate > library. If these AM will work also with old page structure then there should > not be reason for reindexing or index page conversion after upgrade. I don't think that'd be real workable. It would require duplicating all the entries for that AM in pg_opfamily, pg_amop, etc. Which we could do for the built-in entries, I suppose, but what happens to user-defined operator classes? At least for the index changes proposed so far for 8.4, it seems to me that the best solution is to mark affected indexes as not "indisvalid" and require a post-conversion REINDEX to fix 'em. Obviously a better solution would be nice later, but we have to avoid putting huge amounts of work into noncritical problems, else the whole feature is just not going to get finished. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Some newbie questions
M2Y <[EMAIL PROTECTED]> writes:
> On Sep 7, 11:52 pm, [EMAIL PROTECTED] (Shane Ambler) wrote:
>> Most of the developer info is within comments in the code itself. Another place to start is http://www.postgresql.org/developer/coding
>
> I have seen this link. But I am looking (or hoping) for any design doc or technical doc which details what is happening under the hood, as it will save a lot of time to catch up with the main stream.

Well, you should certainly not neglect http://developer.postgresql.org/pgdocs/postgres/internals.html

Also note that many subtrees of the source code contain README files with assorted overview material.

regards, tom lane
Re: [HACKERS] reducing statistics write overhead
Magnus Hagander <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> I'd also be worried about issues like clock skew between the >> postmaster's time and the filesystem's time. > Can that even happen on a local filesystem? I guess you could put the > file on NFS though, but that seems to be.. eh. sub-optimal.. in more > than one way.. Lots of people use NAS/SAN type setups. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous Log Shipping Replication
Hi,

ITAGAKI Takahiro wrote:
> 1. Is process-switching approach the best way to share one socket? Both Postgres-R and the log-shipping prototype use the approach now. Can I think there is no objection here?

I don't see any appealing alternative. The postmaster certainly shouldn't need to worry with any such socket for replication. Threading falls pretty flat for Postgres. So the socket must be held by one of the child processes of the postmaster.

> 2. If 1 is reasonable, how should we add a new WAL sender process? Just add a new process using a core-patch?

Seems feasible to me, yes.

> Merge into WAL writer?

Uh.. that would mean you'd lose parallelism between WAL writing to disk and WAL shipping via network. That does not sound appealing to me.

> Consider framework to add any of user-defined auxiliary process?

What for? What do you miss in the existing framework?

> 3. If 1 is reasonable, what should we use for the process-switching primitive? Postgres-R uses signals and locking and the log-shipping prototype uses multi-threads and POSIX message queues now.

AFAIK message queues are problematic WRT portability. At least Postgres doesn't currently use them, and introducing dependencies on those might lead to problems, but I'm not sure. Others certainly know more about the issues involved. A multi-threaded approach is certainly out of bounds, at least within the Postgres core code.

> Signals and locking is possible choice for 3, but I want to use better approach if any. Faster is always better.

I think the approach can reach better throughput than POSIX message queues or unix pipes, because of the mentioned savings in copying around between system and application memory. However, that hasn't been proved, yet.

> I guess we could invent a new semaphore-like primitive at the same layer as LWLocks using spinlock and PGPROC directly...

Sure, but in what way would that differ from what I do with imessages?

Regards

Markus Wanner
Re: [GENERAL] [HACKERS] New shapshot RPMs (Sep 7 2008) are ready for testing
Hi,

On Sun, 2008-09-07 at 13:39 -0400, Andrew Dunstan wrote:
> The point I was making is that for 8.4, unless you specifically configure with --disable-integer-datetimes, it is enabled by default on any platform that can support it. We committed that change on 30 March here: http://archives.postgresql.org/pgsql-committers/2008-03/msg00550.php

You are right, and I overlooked the actual macro. I now "fixed" the macro by changing its behavior to use the --disable-integer-datetimes mode if not defined.

Cheers, -- Devrim GÜNDÜZ, RHCE devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.gunduz.org
Re: [HACKERS] Our CLUSTER implementation is pessimal
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > Simon Riggs wrote: >> I think there needs to be an option to force this to do either sorts or >> indexscans. > > If we use the planner, "set enable_indexscan =off" or "set enable_sort=off" > ought to work. Yeah, I've been thinking about how to use the planner to do this. It seems to me it would be a much better solution because it would allow us to take advantage of other access paths as well (such as other indexes) and even new features that don't currently exist (index-only scans...). To do that it seems to me what we would need to do is add a function _pg_get_rawtuple_header() which returns the visibility information that HTSV needs. Then we need to construct an SPI query like SELECT _pg_get_rawtuple_header(), * FROM tab ORDER BY col1, col2, col3, ... For each tuple we'll have to deform it, and reform it using the new tuple descriptor and just the columns excluding the header and pass that to the heap rewrite module. Passing the header separately. Heap rewrite would have to call HTSV on just the header (with the same hack I put in for this patch to allow passing InvalidBuffer to HTSV). -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous Log Shipping Replication
Markus Wanner <[EMAIL PROTECTED]> wrote:
>> ITAGAKI Takahiro wrote:
>>> Are there any better idea to share one socket connection between backends (and bgwriter)?
>
> I fear I'm repeating myself, but I've had the same problem for Postgres-R and solved it with an internal message passing infrastructure which I've simply called imessages. It requires only standard Postgres shared memory, signals and locking and should thus be pretty portable.

Imessages serve as a useful reference, but they are only one detail of the issue. I can break the issue down into three parts:

1. Is the process-switching approach the best way to share one socket? Both Postgres-R and the log-shipping prototype use the approach now. Can I think there is no objection here?

2. If 1 is reasonable, how should we add a new WAL sender process? Just add a new process using a core-patch? Merge into WAL writer? Consider a framework to add any user-defined auxiliary process?

3. If 1 is reasonable, what should we use for the process-switching primitive? Postgres-R uses signals and locking and the log-shipping prototype uses multi-threads and POSIX message queues now.

Signals and locking is a possible choice for 3, but I want to use a better approach if there is one. Faster is always better. I guess we could invent a new semaphore-like primitive at the same layer as LWLocks using spinlock and PGPROC directly...

Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Re: [HACKERS] Our CLUSTER implementation is pessimal
On Mon, 2008-09-08 at 13:52 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > I think there needs to be an option to force this to do either sorts or > > indexscans. > > If we use the planner, "set enable_indexscan =off" or "set > enable_sort=off" ought to work. Agreed - as long as that is explicitly in the docs. I'm wondering whether we should put a limit on size of each temp tablespace. This change will cause old admin jobs to break disks that aren't big enough for the new way of doing it. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Cleanup of GUC units code
On Wed, 2008-09-03 at 11:58 +0300, Asko Oja wrote:
> On Wed, Sep 3, 2008 at 11:20 AM, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>> Marko Kreen wrote:
>>> On 9/2/08, Peter Eisentraut <[EMAIL PROTECTED]> wrote:
>>>> Marko Kreen wrote:
>>>>> In the meantime, here is simple patch for case-insensivity.
>>>>
>>>> You might be able to talk me into accepting various unambiguous, common alternative spellings of various units. But for instance allowing MB and Mb to mean the same thing is insane.
>>>
>>> How would the docs for that look like? And anyway, what is wrong with Mb for megabytes? From infamous wikipedia: A megabit is a unit of information or computer storage, abbreviated Mbit (or Mb).
>
> To me playing with case of acronyms and even depending on it seems more insane. It would make much more sense to have case insensitive set of acronyms and (thanks Tom for pointing out) some sanity checks when configuration is loaded to notify user when wrong ones are used for some context.

There is a patch on the CommitFest, so we must either accept it or reject it (with/without comments).

Peter's objection is reasonable, as far as most people have replied. Marko's proposal is also reasonable to most people, since they do not wish fat fingers to cause any amount of downtime. ISTM that if you've done this, you appreciate the feature; if not, it seems less important.

I would point out that Marko's patch is about what values get accepted in postgresql.conf, not how the units are represented when you perform SHOW, look at pg_settings, or read the docs. So Marko's proposal does not ignore the important distinction between units in *all* places, just at the time of input. With that in mind, the proposal seems to be acceptable to the majority, based upon my assessment of the comments.

In terms of the patch, ISTM that relaxing the comparison logic also appears to relax the warnings given, and that is not acceptable, given the concerns raised.

So my recommendation to everybody is:
* we allow case insensitive matches of units in postgresql.conf
* Marko should change the patch to put WARNINGs in place so people know they got it wrong
* we make sure the case is always shown correctly in all other aspects of the Postgres server and docs (no relaxation at all there)
* in the longer term, we look for the solution to be a config checker

-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Our CLUSTER implementation is pessimal
Simon Riggs wrote: I think there needs to be an option to force this to do either sorts or indexscans. If we use the planner, "set enable_indexscan =off" or "set enable_sort=off" ought to work. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous Log Shipping Replication
Hi,

ITAGAKI Takahiro wrote:
> Are there any better ideas to share one socket connection between backends (and bgwriter)? The connections could be established after fork() from postmaster, and the number of them could be two or more. This is one of the most complicated parts of synchronous log shipping. A process-switching approach like b) is just one idea for it.

I fear I'm repeating myself, but I've had the same problem for Postgres-R and solved it with an internal message passing infrastructure which I've simply called imessages. It requires only standard Postgres shared memory, signals and locking, and should thus be pretty portable. In simple benchmarks it's not quite as efficient as Unix pipes, but it doesn't require as many file descriptors, is independent of the parent-child relations of processes, maintains message borders, and is more portable (I hope). It could certainly be improved WRT efficiency and could theoretically even beat Unix pipes, because it involves less copying of data and fewer syscalls.

It has not been reviewed or commented on much. I'd still appreciate that.

Regards

Markus Wanner
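For readers unfamiliar with the imessages idea, a much-simplified, in-process sketch of a queue that maintains message borders might look like this. The real implementation lives in Postgres shared memory and uses locks and signals; the names and framing here are purely illustrative:

```c
#include <stdint.h>
#include <string.h>

/* Length-prefixed ring buffer preserving message borders: each message is
 * stored as a 4-byte length followed by its payload, so a receiver always
 * gets back whole messages, never a byte stream. */
#define QUEUE_SIZE 1024

typedef struct {
    uint8_t buf[QUEUE_SIZE];
    size_t  head, tail, used;     /* write pos, read pos, fill level */
} MsgQueue;

static int msg_send(MsgQueue *q, const void *data, uint32_t len)
{
    if (q->used + sizeof(len) + len > QUEUE_SIZE)
        return -1;                         /* queue full: caller must retry */
    const uint8_t *p = (const uint8_t *) &len;
    for (size_t i = 0; i < sizeof(len); i++)   /* write length prefix */
        q->buf[(q->head + i) % QUEUE_SIZE] = p[i];
    const uint8_t *d = data;
    for (uint32_t i = 0; i < len; i++)         /* write payload */
        q->buf[(q->head + sizeof(len) + i) % QUEUE_SIZE] = d[i];
    q->head = (q->head + sizeof(len) + len) % QUEUE_SIZE;
    q->used += sizeof(len) + len;
    return 0;
}

static int msg_recv(MsgQueue *q, void *out, uint32_t *len)
{
    if (q->used == 0)
        return -1;                         /* empty: receiver would sleep */
    uint8_t *p = (uint8_t *) len;
    for (size_t i = 0; i < sizeof(*len); i++)  /* read length prefix */
        p[i] = q->buf[(q->tail + i) % QUEUE_SIZE];
    uint8_t *d = out;
    for (uint32_t i = 0; i < *len; i++)        /* read payload */
        d[i] = q->buf[(q->tail + sizeof(*len) + i) % QUEUE_SIZE];
    q->tail = (q->tail + sizeof(*len) + *len) % QUEUE_SIZE;
    q->used -= sizeof(*len) + *len;
    return 0;
}
```

Because the buffer is just bytes plus two positions, the same layout can sit in a shared memory segment, with a lock around each operation and a signal to wake the receiver — which is roughly what makes it cheaper in file descriptors than one pipe per process pair.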
Re: [HACKERS] Synchronous Log Shipping Replication
Bruce Momjian <[EMAIL PROTECTED]> wrote:
> > > b) Use new background process as WALSender
> > > This idea needs a background-process hook which enables users to define new background processes
> I think starting/stopping a process for each WAL send is too much overhead.

Yes, of course that is slow. But I guess it is the only way to share one socket among all backends. Postgres is not a multi-threaded architecture, so each backend would have to use dedicated connections to send WAL buffers. 300 backends would require 300 connections for each slave... that's not good at all.

> It sounds like Fujii-san is basically saying they can only get the hooks done for 8.4, not the actual solution.

No! He has an actual solution in his prototype ;-) It is very similar to b) and the overhead was not so bad. It's not clean enough to be a part of Postgres, though.

Are there any better ideas to share one socket connection between backends (and bgwriter)? The connections could be established after fork() from postmaster, and the number of them could be two or more. This is one of the most complicated parts of synchronous log shipping. A process-switching approach like b) is just one idea for it.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Re: [HACKERS] NDirectFileRead and Write
"Hitoshi Harada" <[EMAIL PROTECTED]> wrote: > so I guess all of these variables should be defined together but > actually you put the two in buffile.h while the others in > buf_internals.h. Is there clear reason for that? That's because buffile.c includes buffile.h, but not buf_internals.h . NDirectFileRead/Writes are counters for BufFile in the patch and don't depend on bufmgr. It might be cleaner if we had something like storage/bufstats.h and declared all counter variables in it, but it seems to be an overkill. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Prototype: In-place upgrade v02
Zdenek Kotala wrote:
> Tom Lane napsal(a):
> > Heikki Linnakangas <[EMAIL PROTECTED]> writes: In fact, I don't think there's any low-level data format changes yet between 8.3 and 8.4, so this would be a comparatively easy release to implement upgrade-in-place. There's just the catalog changes, but AFAICS nothing that would require scanning through relations.
> > After a quick scan of the catversion.h changelog (which hopefully covers any such changes): we changed sequences incompatibly, we changed hash indexes incompatibly (even without the pending patch that would change their contents beyond recognition), and Teodor did some stuff to GIN indexes that might or might not represent an on-disk format change, you'd have to ask him. We also whacked around the sort order of bpchar_pattern_ops btree indexes.
> Hmm, it seems that reindex is the only good answer to all these changes.

Isn't that exactly what we want to avoid with upgrade-in-place? As long as the conversion can be done page-at-a-time, without consulting other pages, we can do it when the page is read in.

I'm not sure what the GIN changes were, but I didn't see any changes to the page layout at a quick glance. The bpchar_pattern_ops change you mentioned must be this one:

    A not-immediately-obvious incompatibility is that the sort order within bpchar_pattern_ops indexes changes --- it had been identical to plain strcmp, but is now trailing-blank-insensitive. This will impact in-place upgrades, if those ever happen.

The way I read that, bpchar_pattern_ops just became less sensitive. Some values are now considered equal that weren't before, and thus can now be stored in any order. That's not an incompatible change, right?

> Sequence should be converted during catalog conversion.

Agreed.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
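A page-at-a-time conversion of the kind Heikki describes could be sketched as follows. The struct layouts and version numbers here are purely illustrative, not the real PageHeaderData; a real conversion would also have to shift the line pointers and tuple data that follow the header, which is exactly where the "converted page no longer fits" problem comes from:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical old and new page header layouts: the new one grew a field. */
typedef struct {
    uint16_t pd_version;    /* page layout version */
    uint16_t pd_lower;
    uint16_t pd_upper;
} OldPageHeader;

typedef struct {
    uint16_t pd_version;
    uint16_t pd_lower;
    uint16_t pd_upper;
    uint32_t pd_prune_xid;  /* field added in the newer layout */
} NewPageHeader;

/* Convert an old-format header in place when the page is read in.
 * Returns 0 on success, -1 if the page is already in the current format.
 * (A full converter would also relocate the item pointers/tuples that the
 * larger header now overlaps.) */
static int convert_page_header(void *page)
{
    OldPageHeader old;
    memcpy(&old, page, sizeof(old));
    if (old.pd_version >= 2)
        return -1;                    /* nothing to do */
    NewPageHeader newhdr = {
        .pd_version   = 2,
        .pd_lower     = old.pd_lower,
        .pd_upper     = old.pd_upper,
        .pd_prune_xid = 0,            /* no prune hint known for old pages */
    };
    memcpy(page, &newhdr, sizeof(newhdr));
    return 0;
}
```

The key property Heikki relies on is visible here: the function needs only the one page, so the conversion can happen transparently in the buffer manager's read path.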
Re: [HACKERS] Prototype: In-place upgrade v02
Zdenek Kotala wrote:
> Heikki Linnakangas napsal(a):
> > Relation forks didn't change anything inside relation files, so no scanning of relations is required because of that. Neither will the FSM rewrite. Not sure about DSM yet.
> Does it mean that if you "inject" an old data file after the catalog upgrade, then FSM will work without any problem?

Yes. You'll need to construct an FSM, but it doesn't necessarily need to reflect reality. You could just fill it with zeros, meaning that there's no free space anywhere, and let the next vacuum fill it with real information. Or you could read the old pg_fsm.cache file and fill the new FSM accordingly.

> PS: I plan to review FSM this week.

Thanks!

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] [Review] pgbench duration option
Here is a revised version of the pgbench duration patch. I merged some comments from Brendan and gnari.

"Brendan Jurd" <[EMAIL PROTECTED]> wrote:
> >> Wouldn't this be better written as:
> >> if ((duration > 0 && timer_exceeded) || st->cnt >= nxacts)
> > sorry, but these do not look like the same thing to me.
> > gnari

I used this condition expression here:

    if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

pgbench-duration_v3.patch
Description: Binary data
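To see why gnari objected that the two spellings are not equivalent, the two candidate termination tests can be put side by side (variable names follow pgbench; this is an illustration, not the patch itself):

```c
#include <stdbool.h>

/* Brendan's suggested spelling: stop when the timer fires (if a duration
 * was requested) OR when the transaction count is reached. */
static bool stop_brendan(int cnt, int nxacts, int duration, bool timer_exceeded)
{
    return (duration > 0 && timer_exceeded) || cnt >= nxacts;
}

/* The spelling used in the patch: the transaction count only terminates
 * the run when no duration was requested (duration <= 0 means "not set");
 * otherwise only the timer can stop it. */
static bool stop_itagaki(int cnt, int nxacts, int duration, bool timer_exceeded)
{
    return (cnt >= nxacts && duration <= 0) || timer_exceeded;
}
```

The two disagree exactly when a duration is set and the transaction count has been reached but the timer has not yet fired: Brendan's version stops, while the patch's version keeps running until the timer expires — i.e. the duration option takes precedence over nxacts, which is why gnari said they are not the same thing.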
Re: [HACKERS] Prototype: In-place upgrade v02
Heikki Linnakangas napsal(a):
> Tom Lane wrote:
> > I didn't see anything that looked like an immediate change in user table contents, unless they used the "name" type; but what of relation forks?
> Relation forks didn't change anything inside relation files, so no scanning of relations is required because of that. Neither will the FSM rewrite. Not sure about DSM yet.

Does it mean that if you "inject" an old data file after the catalog upgrade, then FSM will work without any problem?

Zdenek

PS: I plan to review FSM this week.

-- 
Zdenek Kotala
Sun Microsystems Prague, Czech Republic
http://sun.com/postgresql
Re: [HACKERS] Prototype: In-place upgrade v02
Tom Lane napsal(a):
> Heikki Linnakangas <[EMAIL PROTECTED]> writes: In fact, I don't think there's any low-level data format changes yet between 8.3 and 8.4, so this would be a comparatively easy release to implement upgrade-in-place. There's just the catalog changes, but AFAICS nothing that would require scanning through relations.
> After a quick scan of the catversion.h changelog (which hopefully covers any such changes): we changed sequences incompatibly, we changed hash indexes incompatibly (even without the pending patch that would change their contents beyond recognition), and Teodor did some stuff to GIN indexes that might or might not represent an on-disk format change, you'd have to ask him. We also whacked around the sort order of bpchar_pattern_ops btree indexes.

Hmm, it seems that reindex is the only good answer to all these changes. Sequences should be converted during catalog conversion.

Another idea is to create backward-compatible AMs and put them into a separate library. If these AMs also work with the old page structure, then there should be no reason for reindexing or index page conversion after the upgrade.

Zdenek

-- 
Zdenek Kotala
Sun Microsystems Prague, Czech Republic
http://sun.com/postgresql
Re: [HACKERS] [PATCHES] to_date() validation
On Sun, Sep 7, 2008 at 5:58 AM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
> I'm just following this: http://wiki.postgresql.org/wiki/Reviewing_a_Patch so let's get started.

Hi Alex. Thanks for taking the time to review my patch.

> Feature test: Everything seems to work as advertised. However, before, via sscanf() most formats limited the max length of the input. Extra characters were previously just silently ignored; now they get added to the result:
>
> patched:
> # SELECT to_timestamp('11111', 'HHMM');
>  to_timestamp
> ------------------------------
>  0009-03-19 11:00:00-06:59:56
>
> 8.3.3:
> # SELECT to_timestamp('11111', 'HHMM');
>  to_timestamp
> ------------------------------
>  0001-11-01 11:00:00-07 BC

It's an interesting point. I had considered what would happen if the input string was too short, but hadn't given much thought to what would happen if it was too long. In your example case, it isn't clear whether the user wanted to specify 11 hours and 11 months (and the final '1' is just junk), or if they really wanted to specify 11 hours and 111 months. But consider a case like the following:

# SELECT to_date('07-09-20008', 'DD-MM-YYYY');

The documentation says that 'YYYY' is "4 and more digits", so you have to assume that the user meant to say the year 20,008. That's why the code in the patch parses all the remaining digits in the input string on the final node. HEAD actually gets this one wrong; in defiance of the documentation it returns 2000-09-07.

So, it seems to me that the patch shifts the behaviour in the right direction. Barring actually teaching the code that some nodes (like YYYY) can take an open-ended number of characters, while others (like MM) must take an exact number of characters, I'm not sure what can be done to improve this. Perhaps something for a later patch?

> Code review: one minor nit
>
> *** a/src/backend/utils/adt/formatting.c
> --- b/src/backend/utils/adt/formatting.c
> *** 781,787 **** static const KeyWord DCH_keywords[] = {
>     {"y", 1, DCH_Y, TRUE, FROM_CHAR_DATE_GREGORIAN},
>
>     /* last */
> !   {NULL, 0, 0, 0}
>  };
> --- 781,787 ----
>     {"y", 1, DCH_Y, TRUE, FROM_CHAR_DATE_GREGORIAN},
>
>     /* last */
> !   {NULL, 0, 0, 0, 0}
>  };

Ah, thank you for picking that up. I will correct this and send in a new patch version in the next 24 hours.

Cheers,
BJ
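The distinction drawn above — open-ended fields like YYYY consuming all remaining digits, versus fixed-width fields like MM consuming exactly their width — can be illustrated with a toy parser (not the formatting.c code):

```c
#include <ctype.h>

/* Consume up to max_chars decimal digits from *s and return their value.
 * max_chars < 0 means "open-ended": eat every remaining digit, as the
 * patch does for the final YYYY node ("4 and more digits"). */
static int parse_digits(const char **s, int max_chars)
{
    int val = 0;
    int n = 0;
    while (**s && isdigit((unsigned char) **s) &&
           (max_chars < 0 || n < max_chars))
    {
        val = val * 10 + (**s - '0');
        (*s)++;
        n++;
    }
    return val;
}
```

Walking '07-09-20008' against DD-MM-YYYY this way yields day 7, month 9, year 20008 — the behaviour the patch argues for, rather than HEAD's 2000-09-07.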
Re: [HACKERS] Prototype: In-place upgrade v02
Bruce Momjian napsal(a):
> Heikki Linnakangas wrote:
> > Bruce Momjian wrote:
> > > As far as the page not fitting after conversion, what about some user command that will convert an entire table to the new format if page expansion fails.
> > VACUUM? Having to run a manual command defeats the purpose somewhat, though. Especially if you have no way of knowing which tables it needs to be run on.
> My assumption is that the page not fitting would be a rare case, so requiring something like vacuum to fix it would be OK.

It is 1-2% of records per heap. I assume it is more for BTree.

> What I don't want to do is add lots of complexity to the code just to handle the page expansion case, when such a case is rare and perhaps can be fixed by a vacuum.

Unfortunately, it is not so rare. Only the heap on the 32bit x86 platform (4-byte max alignment) is no problem; all index pages are affected.

> > In fact, I don't think there's any low-level data format changes yet between 8.3 and 8.4, so this would be a comparatively easy release to implement upgrade-in-place. There's just the catalog changes, but AFAICS nothing that would require scanning through relations.

Yep. I have not tested it lately, but the pg_upgrade.sh script worked fine in May without any modification for the 8.3->8.4 conversion.

Zdenek

-- 
Zdenek Kotala
Sun Microsystems Prague, Czech Republic
http://sun.com/postgresql
Re: [HACKERS] Prototype: In-place upgrade v02
Bruce Momjian napsal(a):
> As far as the page not fitting after conversion, what about some user command that will convert an entire table to the new format if page expansion fails.

Keep in mind that there are more kinds of pages. Heap is easy, but each index AM has its own specifics :(. A better approach is to move the tuple to a new page and invalidate all related table indexes. A following reindex then automatically converts the whole table.

After putting all those together, how large a patch are we talking about, and what's the performance penalty then? How much of all that needs to be in core, and how much can live in a pgfoundry project or an extra binary in src/bin or contrib? I realize that none of us have a crystal ball, and one has to start somewhere, but I feel uneasy committing to an approach until we have a full plan.

Yes, another very good point. I am ready to focus on these issues for 8.4; all this needs to be fleshed out, perhaps on a wiki. As a starting point, what would be really nice is a wiki page that lists all data format changes for every major release. As Greg mentioned in his mail, the wiki page is already there. Unfortunately, I have not had time to put actual information there. I'm going to do so soon.

Zdenek

-- 
Zdenek Kotala
Sun Microsystems Prague, Czech Republic
http://sun.com/postgresql
Re: [HACKERS] Prototype: In-place upgrade v02
Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
> > Heikki Linnakangas napsal(a):
> > > The patch seems to be missing the new htup.c file.
> > I'm sorry. I attached a new version which is synchronized with the current head.

I would like to add some more comments as well.

1) The patch also contains changes which were discussed during the July commit fest:
- the PageGetTempPage modification suggested by Tom
- another hash.h backward-compatibility cleanup

> It might be a good idea to split that into a separate patch. The sheer size of this patch is quite daunting, even though the bulk of it is straightforward search&replace.

Yes, I will do that.

2) I added a tuplimits.h header file which contains tuple limits for the different access methods. It is not finished yet, but the idea is to keep all limits in one file and easily add limits for different page layout versions - for example, replacing the static computation with a dynamic one based on the relation (maxtuplesize could be stored in pg_class for each relation). I also need this header because I fell into a cycle in the header dependencies.

3) I already sent Page API performance results in http://archives.postgresql.org/pgsql-hackers/2008-08/msg00398.php

I replaced the call sequence PageGetItemId, PageGetItem with the PageGetIndexTuple and PageGetHeapTuple functions. That is the main difference in this patch. PageGetHeapTuple fills t_ver in the HeapTuple to identify the correct tuple header version. It would be good to mention that the Page API (and tuple API) implementation is only a prototype, without any performance optimization.

> You mentioned a 5% performance degradation in that thread. What test case was that? What would be a worst-case scenario, and how bad is it?

Paul van den Bogaart tested a long-run OLTP workload on it. He used the iGen test.

> 5% is a pretty hefty price, especially when it's paid not only by upgraded installations, but also by freshly initialized clusters. I think you'll need to pursue those performance optimizations.

5% is the worst-case scenario. The current version is not optimized.
It is written for easy debugging and (D)tracing. The page header structures are very similar, and we can easily remove the switches for most of the attributes and replace functions with macros or inline functions.

4) This patch contains more topics for decision. The first is, in general, whether this approach is acceptable.

> I don't like the invasiveness of this approach. It's pretty invasive already, and ISTM you'll need similar switch-case handling of all data types that have changed their internal representation as well.

I agree in general. But, for example, the new Page API is not so invasive, and in my opinion it should be implemented (with or without multi-version support), because it cleans up the code. HeapTuple processing is easy too, but unfortunately it requires a lot of modifications in many places. I have wondered how many pieces of code access HeapTupleHeader directly and do not use the HeapTuple data structure. I think we should reach a conclusion on the recommended usage of HeapTupleHeader versus HeapTuple. Most of the changes in the code amount to replacing HeapTupleHeaderGetXmax(tuple->t_data) with HeapTupleGetXmax(tuple), and so on. I think that should be cleaned up anyway.

You mentioned data types, but they are not a problem. You can easily extend the data type attributes with version information and call the correct in/out functions, or use a different Oid for the new data type version. There are several possible easy solutions for data types. And for conversion you can use the ALTER TABLE command. The main idea is to keep data of all format versions in a relation. This approach could also be used for the integer/float datetime problem.

> We've talked about this before, so you'll remember that I favor the approach of converting the page format, a page at a time, when the pages are read in. I grant you that there are non-trivial issues with that as well, like if the converted data takes more space and doesn't fit in the page anymore.

I like conversion on read too, because it is easy, but there are more problems. The non-fitting page is one of them.
Other problems are with the indexes. For example, the hash index stores a bitmap in a page, and this is not mentioned anywhere on the page itself; only the hash AM knows which pages contain this kind of data. It is probably impossible to convert such a page during reading. :(

> I wonder if we could go with some sort of a hybrid approach? Convert the whole page when it's read in, but if it doesn't fit, fall back to tricks like loosening the alignment requirements on platforms that can handle non-aligned data, or supporting a special truncated page header, without the pd_tli and pd_prune_xid fields. Just a thought, not sure how feasible those particular tricks are, but something along those lines..

OK, I have a backup idea :-). Stay tuned :-)

> All in all, though, I find it a bit hard to see the big picture. For upgrade-in-place, what are all the pieces that we need? To keep this concrete, let's focus on PG 8.2 -> PG 8.3 (or are you focusing on PG 8.3 -> 8.4? That's fine with me as well, but le
Re: [HACKERS] Some newbie questions
Thanks Shane for your response...

On Sep 7, 11:52 pm, [EMAIL PROTECTED] (Shane Ambler) wrote:
> > What is a good way to start understanding the backend (postgres) code? Is there any documentation available especially for developers?
> Most of the developer info is within comments in the code itself. Another place to start is http://www.postgresql.org/developer/coding

I have seen this link. But I am looking (or hoping) for a design doc or technical doc which details what is happening under the hood, as it would save a lot of time in catching up with the main stream.

> > What is the commit log and why is it needed?
> To achieve ACID (Atomic, Consistent, Isolatable, Durable). The changes needed to complete a transaction are saved to the commit log and flushed to disk, then the data files are changed. If the power goes out during the data file modifications, the commit log can be used to complete the changes without losing any data.

This, I think, is the transaction log, or XLog. My question is about the CLog, in which there are two bits for each transaction denoting the status of that transaction. Since there is the XLog, from which we can determine what changes we have to redo and undo, what is the need for this CLog?

> > Why does a replication solution need log shipping? Why can't we just ship the transaction statements to a standby node?
> Depends on what you wish to achieve. They are two ways to a similar solution. Log shipping is part of the core code, with plans to make the duplicate server able to satisfy select queries. Statement-based replication is offered by other options such as Slony.
>
> Each has advantages and disadvantages. Transaction logs are part of normal operation and can be copied to another server in the background without adding load or delays to the master server.
> Statement-based replication has the added complexity of waiting for the slaves to duplicate the transaction and handling errors from a slave applying the transaction. They also tend to have restrictions when it comes to replicating DDL changes - implemented as triggers run from INSERT/UPDATE, not from CREATE/ALTER TABLE.

I agree. Assuming that both master and backup are running the same version of the server and both are in sync, why can't we just send the command statements to the standby in the main backend loop (before parsing) and let the standby ignore the SELECT kind of statements?

I am a beginner ... please forgive my ignorance and please provide some clarity so that I can understand the system better.

Thanks,
Srinivas
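On the CLog question above: the layout being described — two status bits per transaction, so four transactions per byte — can be sketched like this (the constants mirror the idea, not the exact values in clog.c). The short answer to "why CLog in addition to XLog" is that visibility checks need to look up any transaction's commit status cheaply and at random, which an array of packed status bits allows and sequential WAL replay does not:

```c
#include <stdint.h>

/* Two status bits per transaction, four statuses per byte. */
#define CLOG_BITS_PER_XACT   2
#define CLOG_XACTS_PER_BYTE  4
#define CLOG_PAGE_SIZE       8192
#define CLOG_XACTS_PER_PAGE  (CLOG_PAGE_SIZE * CLOG_XACTS_PER_BYTE)

/* Illustrative two-bit status codes. */
#define XACT_IN_PROGRESS 0x00
#define XACT_COMMITTED   0x01
#define XACT_ABORTED     0x02

/* Locate where a given transaction id's status bits live. */
static void clog_locate(uint32_t xid, uint32_t *page,
                        uint32_t *byte_in_page, uint32_t *bit_shift)
{
    *page = xid / CLOG_XACTS_PER_PAGE;
    *byte_in_page = (xid % CLOG_XACTS_PER_PAGE) / CLOG_XACTS_PER_BYTE;
    *bit_shift = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;
}

/* Extract one transaction's two-bit status from a clog byte. */
static uint8_t clog_get_status(uint8_t byte, uint32_t bit_shift)
{
    return (byte >> bit_shift) & 0x03;
}
```

So answering "did transaction N commit?" is a single byte fetch plus a shift, whereas answering it from the WAL would mean scanning the log for N's commit or abort record.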
Re: [HACKERS] reducing statistics write overhead
Tom Lane wrote:
> Martin Pihlak <[EMAIL PROTECTED]> writes:
> > I had also previously experimented with stat() based polling but ran into the same issues - no portable high-resolution timestamp on files. I guess stat() is unusable unless we can live with a 1 second update interval for the stats (eg. the backend reads the file if it is within 1 second of the request).
> > One alternative is to include a timestamp in the stats file header - the backend can then wait on that: check the timestamp, sleep, resend the request, loop. Not particularly elegant, but easy to implement. Would this be acceptable?
> Timestamp within the file is certainly better than trying to rely on filesystem timestamps. I doubt 1 sec resolution is good enough, and

We'd need half-a-second resolution just to keep up with the level we have *now*, wouldn't we?

> I'd also be worried about issues like clock skew between the postmaster's time and the filesystem's time.

Can that even happen on a local filesystem? I guess you could put the file on NFS, though, but that seems to be... eh... sub-optimal... in more than one way.

//Magnus
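The "check the timestamp, sleep, resend the request, loop" protocol Martin proposes can be sketched with the file I/O abstracted away behind callbacks (hypothetical helper names, not pgstat.c code):

```c
#include <time.h>

/* Callbacks standing in for "read the timestamp from the stats file
 * header" and "send a write-stats request to the collector". */
typedef time_t (*read_header_ts_fn)(void *ctx);
typedef void   (*send_request_fn)(void *ctx);

/* Poll until the stats file header carries a timestamp at least as new
 * as our request. Returns 0 when a fresh file appeared, -1 on giving up
 * (caller then falls back to the stale file). */
static int wait_for_fresh_stats(time_t request_time, int max_tries,
                                read_header_ts_fn read_ts,
                                send_request_fn resend, void *ctx)
{
    for (int i = 0; i < max_tries; i++)
    {
        if (read_ts(ctx) >= request_time)
            return 0;        /* collector wrote a fresh-enough file */
        resend(ctx);         /* maybe our request got lost; ask again */
        /* a real backend would sleep briefly here between attempts */
    }
    return -1;
}
```

Because the timestamp lives inside the file itself, the comparison is between two clocks read by the same process, which sidesteps Tom's concern about skew between the postmaster's clock and the filesystem's.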
Re: [HACKERS] reducing statistics write overhead
Martin Pihlak wrote:
> Magnus Hagander wrote:
> > I wrote a patch for this some time back, that was actually applied. Turns out it didn't work, and I ran out of time to fix it, so it was backed out again. And then I forgot about it :-) If you look through the cvs history of pgstat you should be able to find it - maybe it can give you some further ideas.
> Got it - this was 1.126. Looks very familiar indeed :)

:-)

> I had also previously experimented with stat() based polling but ran into the same issues - no portable high-resolution timestamp on files. I guess stat() is unusable unless we can live with a 1 second update interval for the stats (eg. the backend reads the file if it is within 1 second of the request).

If the filesystem has inodes, noticing that the inode changed should usually be enough, no? Since we always create a new file and rename it into place, there is no way that the inode can be reused right away. But it does require that the stat info is not cached somewhere in between...

//Magnus
Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)
David Rowley wrote:
> Thanks for all the reviews and suggestions.

David, could you re-run the performance tests you ran earlier? I'm curious to know what impact switching to the simpler loop for the 1-byte pattern had.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
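For context on that "simpler loop" question: Boyer-Moore(-Horspool) pays to build a 256-entry skip table up front, which can never pay off for a single-byte pattern, so a plain byte scan is used for that case. A hedged sketch of the idea, not the patch's actual code:

```c
#include <stddef.h>
#include <string.h>

/* Find the first occurrence of pat (length plen) in text (length tlen).
 * Returns a pointer to the match, or NULL. */
static const char *search(const char *text, size_t tlen,
                          const char *pat, size_t plen)
{
    if (plen == 0 || plen > tlen)
        return NULL;
    if (plen == 1)                  /* simpler loop for the 1-byte pattern */
    {
        for (size_t i = 0; i < tlen; i++)
            if (text[i] == pat[0])
                return text + i;
        return NULL;
    }
    /* Boyer-Moore-Horspool: how far we may shift when the byte aligned
     * with the pattern's last position is a given value. */
    size_t skip[256];
    for (int i = 0; i < 256; i++)
        skip[i] = plen;                        /* byte absent: skip whole pattern */
    for (size_t i = 0; i < plen - 1; i++)
        skip[(unsigned char) pat[i]] = plen - 1 - i;
    for (size_t pos = 0; pos + plen <= tlen; )
    {
        if (memcmp(text + pos, pat, plen) == 0)
            return text + pos;
        pos += skip[(unsigned char) text[pos + plen - 1]];
    }
    return NULL;
}
```

The table-build cost is why the performance comparison Heikki asks about matters most for very short patterns and short haystacks, where the setup can dominate the search itself.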