[PATCHES] Caveat Caveat
I noticed that the Partitioning section of the docs has *two* sections of
caveats in different places, but close together.  One is called Caveats,
the other is not.  It looks like that just led to somebody not reading the
appropriate caveats in the second group of caveats (on -admin).

Doc patch only.  It combines both sets of caveats into one section, still
called the same thing as the previous first Caveats section.  Passes SGML
make.

--
 Simon Riggs
 EnterpriseDB   http://www.enterprisedb.com

Index: doc/src/sgml/ddl.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/ddl.sgml,v
retrieving revision 1.67
diff -c -r1.67 ddl.sgml
*** doc/src/sgml/ddl.sgml	23 Oct 2006 18:10:30 -0000	1.67
--- doc/src/sgml/ddl.sgml	16 Nov 2006 17:43:51 -0000
***************
*** 2657,2692 ****
     </para>
    </sect2>
  
- <sect2 id="ddl-partitioning-caveats">
-  <title>Caveats</title>
- 
- <para>
-  The following caveats apply to partitioned tables:
- <itemizedlist>
-  <listitem>
-   <para>
-    There is currently no way to verify that all of the
-    <literal>CHECK</literal> constraints are mutually
-    exclusive.  Care is required by the database designer.
-   </para>
-  </listitem>
- 
-  <listitem>
-   <para>
-    There is currently no simple way to specify that rows must not be
-    inserted into the master table.  A <literal>CHECK (false)</literal>
-    constraint on the master table would be inherited by all child
-    tables, so that cannot be used for this purpose.  One possibility is
-    to set up an <literal>ON INSERT</> trigger on the master table that
-    always raises an error.  (Alternatively, such a trigger could be
-    used to redirect the data into the proper child table, instead of
-    using a set of rules as suggested above.)
-   </para>
-  </listitem>
- </itemizedlist>
- </para>
- </sect2>
- 
  <sect2 id="ddl-partitioning-constraint-exclusion">
   <title>Partitioning and Constraint Exclusion</title>
--- 2657,2662 ----
***************
*** 2768,2776 ****
     a large part of the partition or just a small part.  An index will
     be helpful in the latter case but not the former.
   </para>
  
   <para>
!   The following caveats apply:
  <itemizedlist>
   <listitem>
--- 2738,2776 ----
     a large part of the partition or just a small part.  An index will
     be helpful in the latter case but not the former.
   </para>
+ </sect2>
+ 
+ <sect2 id="ddl-partitioning-caveats">
+  <title>Caveats</title>
+ 
+ <para>
+  The following caveats apply to partitioned tables:
+ <itemizedlist>
+  <listitem>
+   <para>
+    There is currently no way to verify that all of the
+    <literal>CHECK</literal> constraints are mutually
+    exclusive.  Care is required by the database designer.
+   </para>
+  </listitem>
+ 
+  <listitem>
+   <para>
+    There is currently no simple way to specify that rows must not be
+    inserted into the master table.  A <literal>CHECK (false)</literal>
+    constraint on the master table would be inherited by all child
+    tables, so that cannot be used for this purpose.  One possibility is
+    to set up an <literal>ON INSERT</> trigger on the master table that
+    always raises an error.  (Alternatively, such a trigger could be
+    used to redirect the data into the proper child table, instead of
+    using a set of rules as suggested above.)
+   </para>
+  </listitem>
+ </itemizedlist>
+ </para>
  
   <para>
!   The following caveats apply to constraint exclusion:
  <itemizedlist>
   <listitem>

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
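The ON INSERT trigger workaround described in the caveats could look
roughly like this (an illustrative sketch only; the table, function, and
trigger names are invented here, not taken from the patch):

```sql
-- Master table and one child, partitioned by a CHECK constraint:
CREATE TABLE measurement (
    logdate   date,
    peaktemp  int
);
CREATE TABLE measurement_y2006 (
    CHECK (logdate >= DATE '2006-01-01' AND logdate < DATE '2007-01-01')
) INHERITS (measurement);

-- Reject inserts into the master table outright:
CREATE FUNCTION measurement_insert_error() RETURNS trigger AS $$
BEGIN
    RAISE EXCEPTION 'insert into a child table, not the master table';
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurement_insert_trigger
    BEFORE INSERT ON measurement
    FOR EACH ROW EXECUTE PROCEDURE measurement_insert_error();
```

Alternatively, the trigger body could INSERT the row into the proper child
table and RETURN NULL, redirecting the data instead of rejecting it, as the
caveat text suggests.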
Re: [PATCHES] replication docs: split single vs. multi-master
Markus Schiltknecht wrote:
> Hi, as promised on -docs, here comes my proposal on how to improve the
> replication documentation.  The patches are split as follows and have
> to be applied in order:
>
> replication_doku_1.diff: Smallest possible one-word change to warm-up...

Done.

> replication_doku_2.diff: Moves down "Clustering For Parallel Query
> Execution", because it's not a replication type, but a feature, see
> explanation below.

Actually the patch moves down data partitioning.  I am confused.

> replication_doku_3.diff: This is the most important part, splitting all
> replication types into single- and multi-master replication.  I'm new
> to SGML, so please bear with me if this is not the right way to do it...
>
> Shared-Disk-Failover does IMO not fall into a replication category.
> Should we mention there that 'sharing' a disk using NFS or some such is
> not recommended?  (And, more importantly, does not work as a
> multi-master replication solution.)
>
> I've added a general paragraph describing Single-Master Replication.
> I'm stating that 'Single-Master Replication is always asynchronous'.
> Can anybody think of a counter-example?  Or a use case for synchronous
> Single-Master Replication?  The argument to put down is: if you go
> sync, why don't you do Multi-Master right away?
>
> Most of the "Clustering for Load Balancing" text applies to all
> synchronous Multi-Master Replication algorithms, even to Query
> Broadcasting.  Thus it became the general description of Multi-Master
> Replication.  The section "Clustering for Load Balancing" has been
> removed.

I thought a long time about this.  I have always liked splitting the
solutions up into single- and multi-master, but in doing this
documentation section, I realized that the split isn't all that helpful,
and can be confusing.  For example, Slony is clearly single-master, but
what about data partitioning?  That is multi-master, in that there is
more than one master, but only one master per data set.

And for multi-master, Oracle RAC is clearly multi-master, and I can see
pgpool as multi-master, or as several single-master systems, in that
they operate independently.

After much thought, it seems that putting things into single/multi-master
categories just adds more confusion, because several solutions just
aren't clear, or fall into neither, e.g. Shared Disk Failover.  Another
issue is that you mentioned heavy locking for multi-master, when in fact
pgpool doesn't do any special inter-server locking, so it just doesn't
apply.

In summary, it just seemed clearer to talk about each item and how it
works, rather than try to categorize them.  The categorization just
seems to do more harm than good.  Of course, I might be totally wrong,
and am still looking for feedback, but these are my current thoughts.
Feedback?

I didn't mention distributed shared memory as a separate item because I
felt it was an implementation detail of clustering, rather than
something separate.  I kept two-phase commit in the cluster item for the
same reason.

Current version at:

	http://momjian.us/main/writings/pgsql/sgml/failover.html

--
  Bruce Momjian  [EMAIL PROTECTED]
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Re: [PATCHES] replication docs: split single vs. multi-master
Hello Bruce,

Bruce Momjian wrote:
> Actually the patch moves down data partitioning.  I am confused.

Uh.. yeah, sorry, that's what I meant.

> I thought a long time about this.  I have always liked splitting the
> solutions up into single and multi-master, but in doing this
> documentation section, I realized that the split isn't all that
> helpful, and can be confusing.

Not mentioning that categorization doesn't help in clearing the
confusion.  Just look around: most people use these terms.  They're used
by MySQL and Oracle.  Even Microsoft's Active Directory seems to have a
multi-master operation mode.

> For example, Slony is clearly single-master,

Agreed.

> but what about data partitioning?  That is multi-master, in that there
> is more than one master, but only one master per data set.

Data partitioning is a way to work around the trouble of database
replication in the application layer.  Instead of trying to categorize it
like a replication algorithm, we should explain that working around the
trouble may be worthwhile in many cases.

> And for multi-master, Oracle RAC is clearly multi-master,

Yes.

> and I can see pgpool as multi-master, or as several single-master
> systems, in that they operate independently.

Several single-master systems?  C'mon!  Pgpool simply implements the most
simplistic form of multi-master replication.  Just because you can access
the single databases inside the cluster doesn't make it less multi-master,
does it?

> After much thought, it seems that putting things into
> single/multi-master categories just adds more confusion, because
> several solutions just aren't clear

Agreed; I'm not saying you must categorize all the solutions you
describe.  But please do categorize the ones which can be (and have so
often been) categorized.

> or fall into neither, e.g. Shared Disk Failover.

Oh, yes, this reminds me of Brad Nicholson's suggestion in [1] to add a
warning about the risk of having two postmasters come up.  What about
other means of sharing disks or filesystems?  NBDs or, even worse, NFS?

> Another issue is that you mentioned heavy locking for multi-master,
> when in fact pgpool doesn't do any special inter-server locking, so it
> just doesn't apply.

Sure it does apply, in the sense that *every* single lock is granted and
released on *every* node.  The total number of locks scales linearly with
the number of nodes in the cluster.

> In summary, it just seemed clearer to talk about each item and how it
> works, rather than try to categorize them.  The categorization just
> seems to do more harm than good.  Of course, I might be totally wrong,
> and am still looking for feedback, but these are my current thoughts.
> Feedback?

AFAICT, the categorization into single- and multi-master replication is
very common.  I think that's partly because it's focused on the solution.
One can ask: do I want to write on all nodes, or is a failover solution
sufficient?  Or can I perhaps get away with a read-only slave?  It's a
categorization the user does, often before having a glimpse of how
complicated database replication really is.  Thus, IMO, it would make
sense to help the user and allow him to quickly find answers.  (And we
can still tell them that it's not easy, or even possible, to categorize
all the solutions.)

> I didn't mention distributed shared memory as a separate item because I
> felt it was an implementation detail of clustering, rather than
> something separate.  I kept two-phase in the cluster item for the same
> reason.

Why is pgpool not an implementation detail of clustering, then?

> Current version at:
>	http://momjian.us/main/writings/pgsql/sgml/failover.html

That somehow doesn't work for me:

--- momjian.us ping statistics ---
15 packets transmitted, 0 received, 100% packet loss, time 14011ms

Just my 2 cents, in the hope of being of help.

Regards

Markus

[1]: Brad Nicholson's suggestion:
http://archives.postgresql.org/pgsql-admin/2006-11/msg00154.php
Re: [PATCHES] replication docs: split single vs. multi-master
Bruce Momjian wrote:
> I didn't mention distributed shared memory as a separate item because I
> felt it was an implementation detail of clustering, rather than
> something separate.  I kept two-phase in the cluster item for the same
> reason.
>
> Current version at:
>	http://momjian.us/main/writings/pgsql/sgml/failover.html

I am now attaching the additional text I added based on your comments.
I have also changed the markup so all the solutions appear on the same
web page.  I think seeing it all together might give us new ideas for
improvement.

--
  Bruce Momjian  [EMAIL PROTECTED]
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Index: doc/src/sgml/failover.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/failover.sgml,v
retrieving revision 1.6
diff -c -c -r1.6 failover.sgml
*** doc/src/sgml/failover.sgml	15 Nov 2006 01:09:08 -0000	1.6
--- doc/src/sgml/failover.sgml	16 Nov 2006 17:12:49 -0000
***************
*** 33,38 ****
--- 33,50 ----
   </para>
  
   <para>
+   Some solutions deal with synchronization by allowing only one
+   server to modify the data.  Servers that can modify data are
+   called read/write or master servers.  Servers with read-only
+   data are called backup or slave servers.  As you will see below,
+   these terms cover a variety of implementations.  Some servers
+   are masters of some data sets, and slaves of others.  Some slaves
+   cannot be accessed until they are changed to master servers,
+   while other slaves can reply to read-only queries while they are
+   slaves.
+  </para>
+ 
+  <para>
    Some failover and load balancing solutions are synchronous, meaning
    that a data-modifying transaction is not considered committed until
    all servers have committed the transaction.  This guarantees that a
    failover
***************
*** 118,132 ****
   <title>Data Partitioning</title>
  
   <para>
!   Data partitioning splits tables into data sets.  Each set can only be
!   modified by one server.  For example, data can be partitioned by
!   offices, e.g. London and Paris.  While London and Paris servers have
!   all data records, only London can modify London records, and Paris
!   can only modify Paris records.
   </para>
  
   <para>
!   Such partitioning implements both failover and load balancing.  Failover
    is achieved because the data resides on both servers, and this is an
    ideal way to enable failover if the servers share a slow communication
    channel.  Load balancing is possible because read requests can go to any
--- 130,149 ----
   <title>Data Partitioning</title>
  
   <para>
!   Data partitioning splits tables into data sets.  Each set can
!   be modified by only one server.  For example, data can be
!   partitioned by offices, e.g. London and Paris.  While London
!   and Paris servers have all data records, only London can modify
!   London records, and Paris can only modify Paris records.  This
!   is similar to section <xref
!   linkend="continuously-running-replication-server"> above, except
!   that instead of having a read/write server and a read-only server,
!   each server has a read/write data set and a read-only data
!   set.
   </para>
  
   <para>
!   Such partitioning provides both failover and load balancing.  Failover
    is achieved because the data resides on both servers, and this is an
    ideal way to enable failover if the servers share a slow communication
    channel.  Load balancing is possible because read requests can go to any
[PATCHES] Cast null to int4 upgrading from Version 7.2
My company is currently using version 7.2 and would like to convert to
the latest version.  Unfortunately, starting with version 7.3, a null
text value is no longer implicitly cast to an int4.  For example:

	Create table employee_table (
		employee_id      integer,
		employee_name    text,
		employee_address text);

	Select * from employee_table where employee_id = '';

When executing this select statement in version 7.2, the null will be
converted to an int zero and not fail.  In version 8.2 it fails.  We have
over 20,000 lines of code and do not want to modify and test all of it.
Has anyone come across this problem?  (I am not interested in debating
the theory of nulls versus zero.  I am just trying to avoid unnecessary
costs.)

I am not a DBA, and am looking for explicit instructions to solve this
problem.  Is it possible to create a CAST after upgrading to version 8.2?
My research tells me the following cast was no longer implemented after
version 7.2.  Will executing the following CAST solve my problem?

	CREATE CAST (text AS int4) WITH FUNCTION int4(text);

If so, can someone give me instructions as to executing this statement?
Any help is appreciated.

Dwight
Re: [PATCHES] replication docs: split single vs. multi-master
Bruce Momjian wrote:
> I am now attaching the additional text I added based on your comments.
> I have also changed the markup so all the solutions appear on the same
> web page.  I think seeing it all together might give us new ideas for
> improvement.

Good, it's definitely better to have it all on one page.

I just thought about the words 'master' and 'slave', which are admittedly
quite unfortunate.  I remember reading about efforts to remove them from
geek speech, proposing better names instead.  At least with the old IDE
drives, master and slave drives seem to be disappearing now...

Regards

Markus
Re: [PATCHES] Cast null to int4 upgrading from Version 7.2
Dwight Emmons wrote:

[Why did you post this to pgsql-patches of all places?  It should
properly have gone to pgsql-general, I think.]

> My company is currently using version 7.2 and would like to convert to
> the latest version.  Unfortunately, version 7.3 no longer implicitly
> casts a null text to an int4.  For example:
>
>	Create table employee_table (
>		employee_id      integer,
>		employee_name    text,
>		employee_address text);
>
>	Select * from employee_table where employee_id = '';

That's not a NULL at all, it's an empty string.  You really need to
understand the difference between the two.  Old editions of postgres did
take an empty string literal as a 0 for ints; modern versions quite
rightly reject it as invalid.  Use NULL if you mean NULL and 0 if you
mean 0.

> When executing this select statement in version 7.2 the null will be
> converted to an int zero and not fail.  In version 8.2 it fails.  We
> have over 20,000 lines of code and do not want to modify and test all
> of it.  Has anyone come across this problem?  (I am not interested in
> debating the theory of nulls versus zero.  I am just trying to avoid
> unnecessary costs.)
>
> I am not a DBA, and am looking for explicit instructions to solve this
> problem.  Is it possible to create a CAST after upgrading to version
> 8.2?  My research tells me the following cast was no longer implemented
> after version 7.2.  Will executing the following CAST solve my problem?
>
>	CREATE CAST (text AS int4) WITH FUNCTION int4(text);
>
> If so, can someone give me instructions as to executing this statement?
> Any help is appreciated.

This has nothing to do with casts, I believe - it has to do with what the
input routines accept.  I strongly suspect that renovating your code is
your best choice, much as that might pain you.

cheers

andrew
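The distinction Andrew draws can be made concrete with a small
illustrative session (using the employee_table from the original post;
the exact error wording may vary by version):

```sql
-- Fails on 8.x with an invalid-input-syntax error: '' is an empty
-- string, not a NULL, and it is not a valid integer literal either.
SELECT * FROM employee_table WHERE employee_id = '';

-- Say explicitly what you mean instead:
SELECT * FROM employee_table WHERE employee_id = 0;       -- the number zero
SELECT * FROM employee_table WHERE employee_id IS NULL;   -- a missing value
```

Note that `= NULL` would not work for the last case either: NULL compares
as unknown to everything, so IS NULL is required.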
Re: [PATCHES] [HACKERS] Extended protocol logging
Simon Riggs wrote:
> On Wed, 2006-11-01 at 10:06 -0500, Tom Lane wrote:
> > Simon Riggs [EMAIL PROTECTED] writes:
> > > On Tue, 2006-10-31 at 23:51 -0500, Tom Lane wrote:
> > > > With what logging settings?  log_duration has rather different
> > > > behavior from what it used to do.
> > >
> > > I think it would be useful to have the log results from a test
> > > program in the protocol section,
> >
> > The contents of the postmaster log are surely not part of the FE
> > protocol.  Clients can't even see the log without resorting to
> > nonstandard hacks.
>
> OK, can we please put the example from -hackers into the docs,
> somewhere, with particular note of which protocol messages result in
> which logging output?

If people want to know the output, run a program and look at the
postmaster logs.  If we document it, we have to keep it current, even if
we improve it later.

I will say I had trouble testing this logging because it requires a C
program to use that protocol.  Here is the test program I used.

--
  Bruce Momjian  [EMAIL PROTECTED]
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

/*
 * prepare test program
 *
 * Test the C version of libpq, the PostgreSQL frontend library.
 */
#include <stdio.h>
#include <stdlib.h>
#include "libpq-fe.h"

static void
exit_nicely(PGconn *conn)
{
	PQfinish(conn);
	exit(1);
}

int
main(int argc, char **argv)
{
	const char *conninfo;
	PGconn	   *conn;
	PGresult   *res;
	int			nFields;
	int			i, j;
	const char *val[2];
	int			types[2];

	/*
	 * If the user supplies a parameter on the command line, use it as the
	 * conninfo string; otherwise default to setting dbname=postgres and
	 * using environment variables or defaults for all other connection
	 * parameters.
	 */
	if (argc > 1)
		conninfo = argv[1];
	else
		conninfo = "dbname = postgres";

	/* Make a connection to the database */
	conn = PQconnectdb(conninfo);

	/* Check to see that the backend connection was successfully made */
	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "Connection to database failed: %s",
				PQerrorMessage(conn));
		exit_nicely(conn);
	}

	res = PQexec(conn, "SET log_min_duration_statement = 0;");
	/* res = PQexec(conn, "SET log_statement = 'all';"); */
	if (PQresultStatus(res) != PGRES_COMMAND_OK)
	{
		fprintf(stderr, "SET command failed: %s", PQerrorMessage(conn));
		PQclear(res);
		exit_nicely(conn);
	}

	/*
	 * Should PQclear PGresult whenever it is no longer needed to avoid
	 * memory leaks
	 */
	PQclear(res);

	/*
	 * Prepare a one-parameter statement and execute it, so the log output
	 * for the extended-protocol Parse/Bind/Execute messages can be
	 * inspected.
	 */
	types[0] = 25;				/* text */
	/* types[1] = 23; */

	res = PQprepare(conn, "sel1", "SELECT $1;", 1, types);
	if (PQresultStatus(res) != PGRES_COMMAND_OK)
	{
		fprintf(stderr, "prepare failed: %s", PQerrorMessage(conn));
		PQclear(res);
		exit_nicely(conn);
	}
	PQclear(res);

	val[0] = "a'b";
	val[1] = "9";

	res = PQexecPrepared(conn, "sel1", 1, val, NULL, NULL, 0);
	if (PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		fprintf(stderr, "exec failed: %s", PQerrorMessage(conn));
		PQclear(res);
		exit_nicely(conn);
	}

	/* first, print out the attribute names */
	nFields = PQnfields(res);
	for (i = 0; i < nFields; i++)
		printf("%-15s", PQfname(res, i));
	printf("\n\n");

	/* next, print out the rows */
	for (i = 0; i < PQntuples(res); i++)
	{
		for (j = 0; j < nFields; j++)
			printf("%-15s", PQgetvalue(res, i, j));
		printf("\n");
	}

	PQclear(res);

	/* close the connection to the database and cleanup */
	PQfinish(conn);

	return 0;
}
[PATCHES] Proposed patch for xact-vs-multixact bugs
The attached patch fixes the problem discussed here
	http://archives.postgresql.org/pgsql-hackers/2006-11/msg00357.php
as well as a related problem that I discovered while working on it: the
sequence

	begin;
	savepoint x;
	select * from foo for update;
	release savepoint x;
	select * from foo for share;

leaves us holding only share lock, not exclusive lock, on the selected
tuples.  That's because heap_lock_tuple() considered only the
exact-XID-equality case when checking to see if we were requesting share
lock while already holding exclusive lock.  We should treat exclusive
lock held under any of the current backend's subtransactions as not to
be overridden.

In addition, this formulation avoids useless buffer-dirtying and WAL
reporting in all cases where the desired lock is already effectively
held, whereas the old code would go through the full pushups anyway.

I've only tested it against HEAD, but it will need to be applied to 8.1
as well.  Anyone see any problems?

			regards, tom lane

*** src/backend/access/heap/heapam.c.orig	Sun Nov  5 17:42:07 2006
--- src/backend/access/heap/heapam.c	Thu Nov 16 20:10:37 2006
***************
*** 2359,2364 ****
--- 2359,2366 ----
  	ItemId		lp;
  	PageHeader	dp;
  	TransactionId xid;
+ 	TransactionId xmax;
+ 	uint16		old_infomask;
  	uint16		new_infomask;
  	LOCKMODE	tuple_lock_type;
  	bool		have_tuple_lock = false;
***************
*** 2396,2401 ****
--- 2398,2422 ----
  		LockBuffer(*buffer, BUFFER_LOCK_UNLOCK);
  
  		/*
+ 		 * If we wish to acquire share lock, and the tuple is already
+ 		 * share-locked by a multixact that includes any subtransaction of
+ 		 * the current top transaction, then we effectively hold the desired
+ 		 * lock already.  We *must* succeed without trying to take the tuple
+ 		 * lock, else we will deadlock against anyone waiting to acquire
+ 		 * exclusive lock.  We don't need to make any state changes in this
+ 		 * case.
+ 		 */
+ 		if (mode == LockTupleShared &&
+ 			(infomask & HEAP_XMAX_IS_MULTI) &&
+ 			MultiXactIdIsCurrent((MultiXactId) xwait))
+ 		{
+ 			Assert(infomask & HEAP_XMAX_SHARED_LOCK);
+ 			/* Probably can't hold tuple lock here, but may as well check */
+ 			if (have_tuple_lock)
+ 				UnlockTuple(relation, tid, tuple_lock_type);
+ 			return HeapTupleMayBeUpdated;
+ 		}
+ 
+ 		/*
  		 * Acquire tuple lock to establish our priority for the tuple.
  		 * LockTuple will release us when we are next-in-line for the tuple.
  		 * We must do this even if we are share-locking.
***************
*** 2533,2557 ****
  	}
  
  	/*
  	 * Compute the new xmax and infomask to store into the tuple.  Note we
  	 * do not modify the tuple just yet, because that would leave it in the
  	 * wrong state if multixact.c elogs.
  	 */
  	xid = GetCurrentTransactionId();
  
! 	new_infomask = tuple->t_data->t_infomask;
! 
! 	new_infomask &= ~(HEAP_XMAX_COMMITTED |
! 					  HEAP_XMAX_INVALID |
! 					  HEAP_XMAX_IS_MULTI |
! 					  HEAP_IS_LOCKED |
! 					  HEAP_MOVED);
  
  	if (mode == LockTupleShared)
  	{
- 		TransactionId xmax = HeapTupleHeaderGetXmax(tuple->t_data);
- 		uint16		old_infomask = tuple->t_data->t_infomask;
- 
  		/*
  		 * If this is the first acquisition of a shared lock in the current
  		 * transaction, set my per-backend OldestMemberMXactId setting.  We
  		 * can
--- 2554,2602 ----
  	}
  
  	/*
+ 	 * We might already hold the desired lock (or stronger), possibly under
+ 	 * a different subtransaction of the current top transaction.  If so,
+ 	 * there is no need to change state or issue a WAL record.  We already
+ 	 * handled the case where this is true for xmax being a MultiXactId,
+ 	 * so now check for cases where it is a plain TransactionId.
+ 	 *
+ 	 * Note in particular that this covers the case where we already hold
+ 	 * exclusive lock on the tuple and the caller only wants shared lock.
+ 	 * It would certainly not do to give up the exclusive lock.
+ 	 */
+ 	xmax = HeapTupleHeaderGetXmax(tuple->t_data);
+ 	old_infomask = tuple->t_data->t_infomask;
+ 
+ 	if (!(old_infomask & (HEAP_XMAX_INVALID |
+ 						  HEAP_XMAX_COMMITTED |
+ 						  HEAP_XMAX_IS_MULTI)) &&
+ 		(mode == LockTupleShared ?
Re: [PATCHES] Cast null to int4 upgrading from Version 7.2
On Nov 16, 2006, at 3:10 PM, Neil Conway wrote:
> Yes, this is a common problem for people upgrading from 7.2.  I think
> the long-term fix is to change your queries: comparing an integer with
> '' is not sensible.  That is:
>
>	SELECT * FROM employee_table WHERE employee_id = 0;
>
> is the right way to write that query.  As a temporary fix, I suppose
> you could hack pg_atoi() to treat an empty string as zero
> (src/backend/utils/adt/numutils.c).

As a less invasive alternative, I *think* you could create an SQL
function for casting text to int that treated '' as 0, and then replace
the built-in CAST with that.  You'd also need to make the cast implicit,
which could cause other problems.

20k lines of code isn't all that much, though... you'll be much better
off fixing it.

--
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
Re: [PATCHES] Cast null to int4 upgrading from Version 7.2
Jim Nasby wrote:
> As a less invasive alternative, I *think* you could create an SQL
> function for casting text to int that treated '' as 0, and then
> replace the built-in CAST with that.

Won't work.  You need to replace the data type input function.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/