Re: [HACKERS] PQnotifies() in 7.3 broken?
Bruce Momjian wrote: OK, I have updated the libpq major number in 7.3.X, and updated major and minor in HEAD. Do I need to increment the other interfaces that _use_ libpq, like ecpg?

I think so. You'll only need to increment the major number of those other interfaces that are no longer binary-compatible with their immediate predecessors. That is, if the version of libecpg that ships with 7.3.1 is incompatible with the version of libecpg that ships with 7.3, then you should bump its major number (the one that shipped with 7.3 should have had its major number bumped if it was incompatible with the one that shipped with 7.2.3). But its binary interface should be independent of the binary interface of libpq.

When you build a shared library, you link it against any shared libraries it depends on. The resulting dependency list is contained within the shared library the same way it's contained in an application; you can run ldd on shared libraries just as you can on applications.

It might be helpful to think of the major number as being the same as a protocol version number. You change the protocol version number when you make changes that would make a server using the new version of the protocol incompatible with a client using the old version. In fact, there's really very little conceptual difference between a protocol and an API: both are a means for two entities to communicate with one another. Change the API and you have to do something to make it clear that the API has changed. Same deal with a protocol.

I don't know if what I'm saying here makes much sense to you, but I hope it helps...

-- Kevin Brown [EMAIL PROTECTED]

---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [mail] Re: [HACKERS] Big 7.4 items - Replication
Many thanks for the explanation. Could you explain what the order of the writesets would be for the following scenario? If a transaction takes 50ms to reach one database from another, then for a specific data element (called X) the following timeline occurs: at 0ms, T1(X) is written to system A; at 10ms, T2(X) is written to system B; and T1(X) and T2(X) conflict.

My concern is that if the Group Communication Daemon (gcd) is operating on each database, a successful result for T1(X) will be returned to the client talking to database A because T2(X) has not reached it, and thus no conflict is known about, and a successful result is returned to the client submitting T2(X) to database B because it is not aware of T1(X). This would mean that the two clients believe both T1(X) and T2(X) completed successfully, yet they cannot both succeed due to the conflict.

Thanks, Al.

- Original Message - From: Darren Johnson [EMAIL PROTECTED] To: Al Sutton [EMAIL PROTECTED] Cc: Bruce Momjian [EMAIL PROTECTED]; Jan Wieck [EMAIL PROTECTED]; [EMAIL PROTECTED]; PostgreSQL-development [EMAIL PROTECTED] Sent: Saturday, December 14, 2002 6:48 PM Subject: Re: [mail] Re: [HACKERS] Big 7.4 items - Replication

b) The Group Communication blob will consist of a number of processes which need to talk to all of the others to interrogate them for changes which may conflict with the current write that is being handled, and then issue the transaction response. This is basically the two-phase-commit solution with the phases moved into the group communication process. I can see the possibility of using solution b and having fewer group communication processes than databases as an attempt to simplify things, but this would mean the loss of a number of databases if the machine running the group communication process for the set of databases is lost.

The group communication system doesn't just run on one system. For Postgres-R using Spread there is actually a Spread daemon that runs on each database server.
It has nothing to do with detecting the conflicts. Its job is to deliver messages in a total order for writesets, or a simple order for commits, aborts, joins, etc. The detection of conflicts is done at the database level, by the backend processes. The basic concept is: if all databases get the writesets (changes) in the exact same order, apply them in a consistent order, and avoid conflicts, then one-copy serialization is achieved (one copy of the database replicated across all databases in the replica). I hope that explains the group communication system's responsibility.

Darren
Re: [MLIST] Re: [mail] Re: [HACKERS] Big 7.4 items - Replication
Another concern I have with multi-master systems is what happens if the network splits in two, so that two master systems are taking commits for two separate sets of clients. It seems to me that re-syncing the two databases upon the network healing would be a very complex, or even impossible, task.
Re: [MLIST] Re: [mail] Re: [HACKERS] Big 7.4 items - Replication
David,

This can be resolved by requiring that, for any transaction to succeed, the entrypoint database must receive acknowledgements from n/2 + 0.5 databases (rounded up to the nearest integer), where n is the total number in the replicant set. The following cases are shown as an example:

  Total number of databases   Number required to accept transaction
            2                                   2
            3                                   2
            4                                   3
            5                                   3
            6                                   4
            7                                   4
            8                                   5

This would prevent two replicant sub-sets forming, because it is impossible for both sets to have over 50% of the databases. Applications would be able to detect when a database has dropped out of the replicant set because the database could report a state of "Unable to obtain majority consensus". This would allow applications to differentiate between a database out of the set, where writing to other databases in the set could yield a successful result, and "Unable to commit due to conflict", where trying other databases is pointless.

Al

- Original Message - From: David Walker [EMAIL PROTECTED] To: Al Sutton [EMAIL PROTECTED]; Darren Johnson [EMAIL PROTECTED] Cc: Bruce Momjian [EMAIL PROTECTED]; Jan Wieck [EMAIL PROTECTED]; [EMAIL PROTECTED]; PostgreSQL-development [EMAIL PROTECTED] Sent: Sunday, December 15, 2002 2:29 PM Subject: Re: [MLIST] Re: [mail] Re: [HACKERS] Big 7.4 items - Replication

Another concern I have with multi-master systems is what happens if the network splits in two so that two master systems are taking commits for two separate sets of clients. It seems to me that re-syncing the two databases upon the network healing would be a very complex or impossible task.
Re: [HACKERS] PQnotifies() in 7.3 broken?
Bruce Momjian writes: Do I need to increment the other interfaces that _use_ libpq, like ecpg?

If and only if the libpq API is part of their documented API. For ecpg I think this is not the case, but for libpq++ it would seem to be the case.

-- Peter Eisentraut [EMAIL PROTECTED]
Re: [HACKERS] Information schema now available
Hannu Krosing writes: Could you also post it somewhere as a plain SQL script for 7.3?

It depends on SQL features not present in 7.3. But you can download it from the CVS web interface and make the adjustments yourself if you want to use it.

-- Peter Eisentraut [EMAIL PROTECTED]
Re: [HACKERS] PQnotifies() in 7.3 broken?
Peter Eisentraut [EMAIL PROTECTED] writes: Bruce Momjian writes: Do I need to increment the other interfaces that _use_ libpq, like ecpg? If and only if the libpq API is part of their documented API. For ecpg I think this is not the case, but for libpq++ it would seem to be the case.

However, an app linked against libpq++ would also be linked against libpq, and so the incompatibility will be flagged by the linker anyway. I can see no need to bump libpq++'s own number.

regards, tom lane
Re: [mail] Re: [HACKERS] Big 7.4 items - Replication
Jonathan,

How do the group communication daemons on systems A and B agree that T2 is after T1? As I understand it, the operation is performed locally before being passed on to the group for replication, so when T2 arrives at system B, system B has no knowledge of T1 and can perform T2 successfully. I am guessing that system B performs T2 locally, sends it to the group communication daemon for ordering, and then receives it back from the group communication order queue, after its position in the queue has been decided, before it is written to the database. This would indicate to me that there is a single central point which decides that T2 is after T1. Is this true?

Al.

- Original Message - From: Jonathan Stanton [EMAIL PROTECTED] To: Al Sutton [EMAIL PROTECTED] Cc: Darren Johnson [EMAIL PROTECTED]; Bruce Momjian [EMAIL PROTECTED]; Jan Wieck [EMAIL PROTECTED]; [EMAIL PROTECTED]; PostgreSQL-development [EMAIL PROTECTED] Sent: Sunday, December 15, 2002 5:00 PM Subject: Re: [mail] Re: [HACKERS] Big 7.4 items - Replication

The total order provided by the group communication daemons guarantees that every member will see the transactions/writesets in the same order. So both A and B will see that T1 is ordered before T2 BEFORE writing anything back to the client. So for both servers T1 will be completed successfully, and T2 will be aborted because of conflicting writesets.

Jonathan

On Sun, Dec 15, 2002 at 10:16:22AM -, Al Sutton wrote:
-- Jonathan R. Stanton [EMAIL PROTECTED] Dept. of Computer Science, Johns Hopkins University
Re: [HACKERS] [GENERAL] PostgreSQL Global Development Group
Kevin Brown wrote: Devrim GÜNDÜZ wrote: I do NOT like hearing about MySQL in this (these) list(s). PostgreSQL is not in the same category as MySQL. MySQL is for *dummies*, not database admins. I do not even call it a database. I have never forgotten my data loss 2.5 years ago, when I had used MySQL for just 2 months!!!

I think you're on to something here, but it's obscured by the way you said it. There's no question in my mind that PostgreSQL is superior in almost every way to MySQL. For those of us who are technically minded, it boggles the mind that people would choose MySQL over PostgreSQL. Yet they do. And it's important to understand why. Simply saying "MySQL has better marketing" isn't enough. It's too simple an answer and obscures some issues that should probably be addressed.

I think it /is/ a significant factor, the point being that the MySQL company has been quite activist in pressing MySQL as the answer, to the point where there's a development strategy called LAMP (Linux + Apache + MySQL + (Perl|Python|PHP)).

People use MySQL because it's very easy to set up, relatively easy to maintain (when something doesn't go wrong, that is), is very well documented and supported, and is initially adequate for the task they have in mind (that the task may change significantly such that MySQL is no longer adequate is something only those with experience will consider).

... And the consistent marketing pressure that in essence claims:

- It's easier to use than any alternative;
- It's much faster than any other DBMS;
- It's plenty powerful and robust enough.

As near as I can tell, /none/ of these things are true outside of very carefully selected application domains. But the claims have been presented enough times that people actually believe them to be true.

PostgreSQL has come a long way and, with the exception of a few minor things (the need to VACUUM, for instance.
The current version makes the VACUUM requirement almost a non-issue as regards performance and availability, but it really should be something the database takes care of itself), is equivalent to MySQL in the above things, except for documentation and support.

I would point to a third thing: tools to support hands-off administration. My web hosting provider has a set of tools to let me administer various aspects of my site, complete with a pretty GUI, that covers:

- Configuring email accounts, including mailing lists, SpamAssassin, and such;
- Configuring subdomains;
- Managing files/directories, doing backups;
- Apache configuration;
- Cron jobs;
- A couple of shopping cart systems;
- A chat room system;
- Last, but certainly not least, the ability to manage MySQL databases.

There is no canned equivalent for PostgreSQL, which means that ISPs that don't have people with DBMS expertise will be inclined to prefer MySQL. It's a better choice for them.

MySQL's documentation is very, very good. My experience with it is that it's possible, and relatively easy, to find information about almost anything you might need to know. PostgreSQL's documentation is good, but not quite as good as MySQL's. It's not quite as complete. For instance, I didn't find any documentation at all in the User's Guide or Administrator's Guide on creating tables (if I missed it, then that might illustrate that the documentation needs to be organized slightly differently). I did find a little in the tutorial (about the amount that you'd want in a tutorial), but to find out more I had to go to the SQL statement reference (in my case I was looking for the means by which one could create a constraint on a column at table creation time).

The reason this is important is that the documentation is *the* way people are going to learn the database.
If it's too sparse or too disorganized, people who don't have a lot of time to spend searching through the documentation for something may well decide that a different product (such as MySQL) would suit their needs better. The documentation for PostgreSQL improves all the time, largely in response to comments such as this one, and that's a very good thing.

My purpose in bringing this up is to show you what PostgreSQL is up against in terms of widespread adoption.

That's probably pretty fair. I'm using the word "fair" advisedly, too. If someone objects, saying that PostgreSQL docs /are/ good, keep in mind that new users are not mandated to be fair about this. If they have trouble finding what they were looking for, they couldn't care less that you think the docs are pretty good: /they/ didn't find what /they/ were looking for, and that's all they care about.

If we want to sell PostgreSQL, we should talk about, maybe, Oracle. I have never taken notice of what MySQL said. I just know that I have been running PostgreSQL for 2.5 years, and I have only stopped it just before upgrades of PostgreSQL. It's just *working*; which is unfamiliar
Re: [HACKERS] PQnotifies() in 7.3 broken?
On Sun, Dec 15, 2002 at 11:21:07AM -0500, Tom Lane wrote: However, an app linked against libpq++ would also be linked against libpq, and so the incompatibility will be flagged by the linker anyway. I can see no need to bump libpq++'s own number.

Plus, of course, libpq++ being a C++ library will break compatibility at some compiler upgrades--which isn't even necessarily visible to anyone compiling a postgres release.

Jeroen
Re: [mail] Re: [HACKERS] Big 7.4 items - Replication
Jonathan,

Many thanks for clarifying the situation some more. With token passing, I have the following concerns:

1) What happens if a server holding the token should die whilst it is in possession of the token?

2) If I have n servers, and the time to pass the token between each server is x milliseconds, I may have to wait for up to n times x milliseconds for a transaction to be processed. If a server is limited to a single transaction per possession of the token (in order to ensure no system hogs the token), and the server develops a queue of length y, I will have to wait n times x times y for the transaction to be processed.

Both scenarios, I believe, would not scale well beyond a small set of servers with low network latency between them.

If we consider the following situation, I can illustrate why I'm still in favour of a two-phase commit. Imagine, for example, credit card details about the status of an account replicated in real time between databases in London, Moscow, Singapore, Sydney, and New York. If any server can talk to any other server with a guaranteed packet transfer time of 150ms, a two-phase commit could complete in 600ms in the worst case (assuming that the two phases consist of request/response pairs, and that each server talks to all the others in parallel). A token passing system may have to wait for the token to pass through every other server before reaching the one that has the transaction committed to it, which could take about 750ms. If you then expand the network to allow for a primary and a disaster-recovery database at each location, the two-phase commit still maintains its 600ms response time, but the token passing system doubles to 1500ms.

Allowing disjointed segments to continue executing is also a concern, because any split in the replication group could effectively double the accepted card limit for any card holder should they purchase items from various locations around the globe.
I can see an idea that the token may be passed to the system with the most transactions in a wait state, but this would cause low-volume databases to lose out on response times to higher-volume ones, which is again undesirable.

Al.

- Original Message - From: Jonathan Stanton [EMAIL PROTECTED] To: Al Sutton [EMAIL PROTECTED] Cc: Darren Johnson [EMAIL PROTECTED]; Bruce Momjian [EMAIL PROTECTED]; Jan Wieck [EMAIL PROTECTED]; [EMAIL PROTECTED]; PostgreSQL-development [EMAIL PROTECTED] Sent: Sunday, December 15, 2002 9:17 PM Subject: Re: [mail] Re: [HACKERS] Big 7.4 items - Replication

On Sun, Dec 15, 2002 at 07:42:35PM -, Al Sutton wrote: Jonathan, How do the group communication daemons on system A and B agree that T2 is after T1?

Let's split this into two separate problems: 1) How do the daemons totally order a set of (abstract) messages? 2) How do database transactions get split into writesets that are sent as messages through the group communication system?

As to question 1, the set of daemons (usually one running on each participating server) run a distributed ordering algorithm, as well as distributed algorithms to provide message reliability, fault detection, and membership services. These are completely distributed algorithms; no central controller node exists, so even if network partitions occur the group communication system keeps running and providing ordering and reliability guarantees for messages. A number of different algorithms exist for providing a total order on messages. Spread currently uses a token algorithm, involving a token passed between the daemons and a counter attached to each message, but other algorithms exist and we have implemented some other ones in our research. You can find lots of details in the papers at www.cnds.jhu.edu/publications/ and www.spread.org.

As to question 2, there are several different approaches to how to use such a total order for actual database replication.
They all use the GCS total order to establish a single sequence of events that all the databases see. Then each database can act on the events as they are delivered by the GCS and be guaranteed that no other database will see a different order. In the Postgres-R case, the action received from a client is performed partially at the originating postgres server; the writesets are then sent through the GCS to order them and determine conflicts. Once they are delivered back, if no conflicts occurred in the meantime, the original transaction is completed and the result returned to the client. If a conflict occurred, the original transaction is rolled back and aborted, and the abort is returned to the client.
Re: [HACKERS] [GENERAL] PostgreSQL Global Development Group
You can't sell into the ISP appliance market until there's something as ubiquitous as phpMyAdmin for PostgreSQL. And note that the ISP appliance market only cares about this in a very indirect way. They don't actually use the database; their /customers/ do. And their customers are likely to be fairly unsophisticated souls who will use whatever database is given to them.

Hey! What about phpPgAdmin? We're actually working on a next-generation version atm, which is a total rewrite that:

1. uses modern PHP
2. runs with register_globals off, with full error checking
3. is themeable
4. easily supports all versions
5. etc.

However, even with repeated calls for developers, it's just me and Rob Treat! phpPgAdmin does not work with 7.3, so this is an increasingly important project. Anyone wanna help? :)

http://phppgdamin.sourceforge.net/

Maybe we should move to gborg?

Chris
[HACKERS] PageGetMaxOffsetNumber
Hello hackers,

I wonder what the PageGetMaxOffsetNumber macro is supposed to do (in a btree index page)? The scenario is the following: I need to get the pointer to a btree page (leaf or internal) in its parent page, to call _bt_itemdel on it. The only thing I can think of is to use its parent link and then iterate over every OffsetNumber to see which item points down to the page I'm looking for. I think the gotcha here is that the parent page can split, and I may have to walk right in order to find the new parent (Is this right? Can the parent go someplace else?)

I iterate over the elements of the parent page in a for loop, and the upper bound is rarely reached because the item is found. However, sometimes the item isn't found, and PageGetItem fails its assertion because the item isn't used (LP_USED). I have found that PageGetMaxOffsetNumber (the upper bound) returns a consistent value that's far too high (4294967291, 0xFFFFFFFB), and therefore the for loop eventually falls out of bounds.

The btree freelist patch is almost ready (which means it works in the trivial cases I've tested, and there are some corner cases I haven't even covered), but I am stuck on this. Can anyone give me a light? Maybe there's another way to get at the pointer to a page in its parent?

-- Alvaro Herrera (alvherre[a]dcc.uchile.cl) Crear es tan dificil como ser libre ("To create is as difficult as to be free") (Elsa Triolet)
Re: [HACKERS] PageGetMaxOffsetNumber
On Sun, Dec 15, 2002 at 11:49:57PM -0300, Alvaro Herrera wrote: I iterate over the elements of the parent page in a for loop, and the upper bound is rarely reached because the item is found. However, sometimes the item isn't found, and PageGetItem fails its assertion because the item isn't used (LP_USED). I have found that PageGetMaxOffsetNumber (the upper bound) returns a consistent value that's far too high (4294967291, 0xFFFFFFFB), and therefore the for loop eventually falls out of bounds.

FWIW, the code that is supposed to do this (and appears to work fine in most cases) is the following. buf is the page I'm going to free and pblkno is the BlockNumber of its parent as seen in btpo_parent.

/*
 * Delete the pointer to a child page.  If the parent page is empty after
 * the deletion, delete the pointer from its parent too.
 */
static void
_bt_deletefromparent(Relation rel, BlockNumber pblkno, Buffer buf)
{
    Buffer          pbuf;
    OffsetNumber    offnum;
    Page            ppage;
    BTPageOpaque    pop;
    BlockNumber     blkno = BufferGetBlockNumber(buf),
                    max;

    pbuf = _bt_getbuf(rel, pblkno, BT_WRITE);
    Assert(!BufferIsInvalid(pbuf));
    ppage = BufferGetPage(pbuf);
    pop = (BTPageOpaque) PageGetSpecialPointer(ppage);

    /*
     * Repeat until the correct parent page is found.  Splits may
     * cause the parent page to move right.
     */
    for (;;)
    {
        BlockNumber next;

        /* Make sure no one else tries to look at the page. */
        LockBuffer(pbuf, BUFFER_LOCK_UNLOCK);
        LockBufferForCleanup(pbuf);

        max = PageGetMaxOffsetNumber(pop);

        /*
         * Look at every offset of the page for the item pointing to
         * the dead page.
         */
        for (offnum = FirstOffsetNumber; offnum <= max; offnum++)
        {
            BTItem      item;
            ItemPointer iptr;

            item = (BTItem) PageGetItem(ppage, PageGetItemId(ppage, offnum));
            iptr = &(item->bti_itup.t_tid);
            if (ItemPointerGetBlockNumber(iptr) == blkno)
            {
                /* Ok, we found the page.  Now delete the pointer. */
                ItemPointer iptrd = palloc(SizeOfIptrData);

                ItemPointerSet(iptrd, pblkno, offnum);
                _bt_itemdel(rel, pbuf, iptrd);
                _bt_wrtnorelbuf(rel, pbuf);
                LockBuffer(pbuf, BUFFER_LOCK_UNLOCK);

                /*
                 * If the parent page is empty after this deletion,
                 * mark it dead and free it too.
                 */
                if (_bt_pageisempty(ppage))
                {
                    pop->btpo_flags |= BTP_DEAD;
                    _bt_processdead(rel, pbuf);
                }
                ReleaseBuffer(pbuf);
                return;
            }
        }

        /*
         * If we just finished scanning the rightmost page of the upper
         * level, something's wrong.
         */
        if (P_RIGHTMOST(pop))
            elog(ERROR, "Unable to find parent page!");

        /* Oops, the parent was split.  Check its right sibling. */
        next = pop->btpo_next;
        _bt_relbuf(rel, pbuf);
        pbuf = _bt_getbuf(rel, next, BT_WRITE);
        ppage = BufferGetPage(pbuf);
        pop = (BTPageOpaque) PageGetSpecialPointer(ppage);
    }
}

-- Alvaro Herrera (alvherre[a]dcc.uchile.cl) Aprende a avergonzarte mas ante ti que ante los demas ("Learn to be more ashamed before yourself than before others") (Democrito)
[HACKERS] about table function
Hello, Joe Conway. It seems the crosstab function you wrote in contrib/tablefunc doesn't work. It says "A column definition list is required for functions returning RECORD". How can I make it work? Thank you very much. Best regards, Jinqiang Han ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] about table function
Jinqiang Han wrote: It seems the crosstab function you wrote in contrib/tablefunc doesn't work. It says "A column definition list is required for functions returning RECORD". How can I make it work?

From contrib/tablefunc/README.tablefunc:

Outputs

Returns setof record, which must be defined with a column definition list in the FROM clause of the SELECT statement, e.g.:

    SELECT * FROM crosstab(sql, 2)
    AS ct(row_name text, category_1 text, category_2 text);

The example crosstab function produces a set something like:

     <==  values  columns  ==>
     row_name   category_1   category_2
     ---------+------------+------------
     row1       val1         val2
     row2       val5         val6

You are apparently forgetting the AS ct(row_name text, category_1 text, category_2 text) part.

hope this helps, Joe ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] PQnotifies() in 7.3 broken?
But it's something they should have already had to do. We're just paying late for old sins. ;) Greg On Thu, 2002-12-12 at 23:34, Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: OK, so what do we do with 7.3.1. Increment major or minor? Major. I thought you did it already? I did only minor, which I knew was safe. Do folks realize this will require a recompile of applications by 7.3 users moving to 7.3.1? That seems very drastic, and there have been very few problem reports about the NOTIFY change. -- Greg Copeland [EMAIL PROTECTED] Copeland Computer Consulting ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [mail] Re: [HACKERS] Big 7.4 items - Replication
Darren Johnson wrote: The group communication system doesn't just run on one system. For postgres-r using spread The reason why group communication software is used is simply because this software is designed with two goals in mind: 1) optimize bandwidth usage, and 2) make many-to-many communication easy. Number one is done by utilizing things like multicasting where available. Number two is done by using globally scoped queues. I add this only to avoid reading that pushing some PITR log snippets via FTP or worse over a network would do the same. It did not in the past, it does not now, and it will not in the future. Jan -- #==# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #== [EMAIL PROTECTED] # ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] [GENERAL] PostgreSQL Global Development Group
[EMAIL PROTECTED] wrote: Kevin Brown wrote: Simply saying MySQL has better marketing isn't enough. It's too simple an answer and obscures some issues that should probably be addressed. I think it /is/ a significant factor, the point being that the MySQL company has been quite activist in pressing MySQL as the answer, to the point that there's a development strategy called LAMP (Linux + Apache + MySQL + (Perl|Python|PHP)). Oh, I'll certainly not dispute that marketing has had a significant effect, but I don't think it's the only reason for MySQL's success. History has a lot to do with it, because it's through history that momentum gets built up, as it has with MySQL. People use MySQL because it's very easy to set up, relatively easy to maintain (when something doesn't go wrong, that is), is very well documented and supported, and is initially adequate for the task they have in mind (that the task may change significantly such that MySQL is no longer adequate is something only those with experience will consider). ... And the consistent marketing pressure that in essence claims: - It's easier to use than any alternative; - It's much faster than any other DBMS; - It's plenty powerful and robust enough. As near as I can tell, /none/ of these things are true outside of very carefully selected application domains. But the claims have been presented enough times that people actually believe them to be true. I agree with you -- now. But the situation as it is now has not always been so. Consider where PostgreSQL was 4 years ago. I believe it was at version 6 at that time, if I remember correctly. And as I recall, many people had very significant issues with it in the key areas of performance and reliability. Now, I didn't experience these things firsthand because I wasn't using it at the time, but it is the general impression I got when reading the accounts of people who *were* using it.
MySQL at the time wasn't necessarily any more reliable, but it had one thing going for it that PostgreSQL didn't: myisamchk. Even if the database crashed, you stood a very good chance of being able to recover your data without having to restore from backups. PostgreSQL didn't have this at all: either you had to be a guru with the PostgreSQL database format or you had to restore from backups. That meant that *in practice* MySQL was easier to maintain, even if it crashed more often than PostgreSQL, because the amount of administrative effort to deal with a MySQL crash was so much less. PostgreSQL has come a long way and, with the exception of a few minor things (the need to VACUUM, for instance: the current version makes the VACUUM requirement almost a non-issue as regards performance and availability, but it really should be something that the database takes care of itself), is equivalent to MySQL in the above things except for documentation and support.

I would point to a third thing: tools to support hands-off administration. My web hosting provider has a set of tools to let me administer various aspects of my site, complete with pretty GUI, that covers:

- Configuring email accounts, including mailing lists, Spam Assassin, and such;
- Configuring subdomains;
- Managing files/directories, doing backups;
- Apache configuration;
- Cron jobs;
- A couple of shopping cart systems;
- A chat room system;
- Last, but certainly not least, the ability to manage MySQL databases.

There is no canned equivalent for PostgreSQL, which means that ISPs that don't have people with DBMS expertise will be inclined to prefer MySQL. It's a better choice for them. This is true, but the only way to combat that is to get PostgreSQL more widely deployed. Network effects such as that are common in the computing world, so it doesn't come as much surprise that the most popular database engine in the webhosting world is the best supported one for that role.
It's only because of the relative popularity of MySQL that it has so much support. The only way to grow PostgreSQL's popularity is to get it deployed in situations where the tools available for it are sufficient. But you're mistaken if you believe that MySQL isn't competition for PostgreSQL. It is, because it serves the same purpose: a means of storing information in an easily retrievable way. Indeed. People with modest data storage requirements that came in with /no/ comprehension of what a relational database is may find the limited functionality of MySQL perfectly reasonable for their purposes. This is true, but the biggest problem is that the requirements of a project often balloon over time, and the demands on the database backend will also tend to increase. Because MySQL is rather limited in its functionality, it doesn't take much before you're forced to use a different database backend. This is why I view PostgreSQL as a much wiser choice in almost all cases where you