Re: [HACKERS] leaky views, yet again
On 07.10.2010 06:39, Robert Haas wrote:
> On Tue, Oct 5, 2010 at 3:42 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Right, *column* filtering seems easy and entirely secure. The angst here is about row filtering. Can we have a view in which users can see the values of a column for some rows, with perfect security that they can't identify values for the hidden rows? The stronger form is that they shouldn't even be able to tell that hidden rows exist, which is something your view doesn't try to do; but there are at least some applications where that would be desirable.
>
> I took a crack at documenting the current behavior; see attached.

Looks good. It gives the impression that you need to be able to create a custom function to exploit it, though. It would be good to mention that internal functions can be used too; revoking access to CREATE FUNCTION does not make you safe.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
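For illustration, a minimal sketch of the exploit under discussion, assuming a hypothetical accounts table behind a row-filtering view (the function name leak is made up). Because the planner orders quals by cost, a cheap user-defined function can be evaluated before the view's security qual:

    CREATE TABLE accounts (owner text, balance int);
    CREATE VIEW my_accounts AS
        SELECT * FROM accounts WHERE owner = current_user;

    -- a cheap, side-effecting function the planner may push below the view's filter
    CREATE FUNCTION leak(text, int) RETURNS boolean AS $$
    BEGIN
        RAISE NOTICE 'leaked: % %', $1, $2;  -- prints rows the view should hide
        RETURN true;
    END
    $$ LANGUAGE plpgsql COST 0.0000001;

    SELECT * FROM my_accounts WHERE leak(owner, balance);

Per Heikki's point, built-in operators and casts can achieve a similar effect, since an error raised before the security qual can expose a hidden value; a sketch of that variant appears later in the thread.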
Re: [HACKERS] todo point: plpgsql - scrollable cursors are supported
2010/10/7 Robert Haas <robertmh...@gmail.com>:
> On Mon, Oct 4, 2010 at 2:52 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>> I think you can remove scrollable cursor support from the plpgsql ToDo. Scrollable cursors are supported, and the supported syntax is the same as in the core SQL language.
>
> I agree, removed. I also removed WITH HOLD cursors, which we seem to have as well.

I don't think we support the WITH HOLD cursor syntax in plpgsql yet. Maybe we have similar functionality; I don't know.

Pavel

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
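For reference, a minimal sketch of the plpgsql scrollable-cursor syntax Pavel is referring to (object names are illustrative):

    CREATE FUNCTION last_item() RETURNS int AS $$
    DECLARE
        c SCROLL CURSOR FOR SELECT i FROM generate_series(1, 10) AS s(i);
        result int;
    BEGIN
        OPEN c;
        FETCH LAST FROM c INTO result;  -- jumping to the end requires SCROLL
        CLOSE c;
        RETURN result;
    END
    $$ LANGUAGE plpgsql;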
Re: [HACKERS] Perf regression in 2.6.32 (Ubuntu 10.04 LTS)
Robert Haas wrote:
> Greg, have you run into any other evidence suggesting a problem with 2.6.32? I haven't actually checked myself yet.

Right now the only distribution shipping 2.6.32 usefully is Ubuntu 10.04, which I can't recommend anyone use on a server because their release schedules are way too aggressive to ever deliver stable versions anymore. So until either RHEL6 or Debian Squeeze ships, very late this year or early next, the performance of 2.6.32 is irrelevant to me. And by then I'm hoping that the early adopters will have squashed more of the obvious bugs here. 2.6.32 is 11 months old at this point, which makes it still a bleeding-edge kernel in my book.

--
Greg Smith, 2ndQuadrant US  g...@2ndquadrant.com  Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Re: [HACKERS] On Scalability
Any feedback from TGL and Heikki, then?

2010/7/29 Joshua D. Drake <j...@commandprompt.com>:
> On Thu, 2010-07-29 at 19:52 +0200, Vincenzo Romano wrote:
>> 2010/7/29 Joshua D. Drake <j...@commandprompt.com>:
>>> On Thu, 2010-07-29 at 19:34 +0200, Vincenzo Romano wrote:
>>>> I expect that a more complex schema will imply higher workloads on the query planner. What I don't know is how the increase in the workload will happen: linearly, sublinearly, polynomially or what?
>>
>> Do you think I should ask somewhere else? Any hint?
>
> The two people that would likely know best are on vacation, TGL and Heikki. You may have to wait a bit.
>
> Sincerely, Joshua D. Drake
>
> --
> PostgreSQL.org Major Contributor
> Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
> Consulting, Training, Support, Custom Development, Engineering
> http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

--
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
cel +393398083886 fix +390823454163 fax +3902700506964
gtalk. vincenzo.rom...@notorand.it skype. notorand.it
NON QVIETIS MARIBVS NAVTA PERITVS
Re: [HACKERS] On Scalability
On 07.10.2010 10:09, Vincenzo Romano wrote:
> Any feedback from TGL and Heikki, then?

I don't have anything to add to what others have said already. Your best advice is to test it yourself. I would expect the plan time to be linear relative to the number of partial indexes or child tables involved, except that constraint exclusion of CHECK constraints on the partitions is exponential. But I also wouldn't be surprised if there's some other non-linear aspect that shows its head with thousands of partitions.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
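A minimal harness for the "test it yourself" suggestion, assuming a 9.0 server (for DO) and psql; the table names and partition count are illustrative:

    CREATE TABLE parent (k int);

    DO $$
    BEGIN
        FOR i IN 1..1000 LOOP
            EXECUTE 'CREATE TABLE part_' || i
                 || ' (CHECK (k = ' || i || ')) INHERITS (parent)';
        END LOOP;
    END
    $$;

    SET constraint_exclusion = partition;
    \timing on
    -- for a trivial query this is mostly planning time; repeat with
    -- varying partition counts to see how it grows
    EXPLAIN SELECT * FROM parent WHERE k = 500;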
Re: [HACKERS] Issues with Quorum Commit
On Wed, 2010-10-06 at 10:57 -0700, Josh Berkus wrote:
> I also strongly believe that we should get single-standby functionality committed and tested *first*, before working further on multi-standby.

Yes, let's get k = 1 first. With k = 1 the number of standbys is not limited, so we can still have very robust and highly available architectures. So we mean first-acknowledgement-releases-waiters.

> (1) Consistency: this is another DBA-false-confidence issue. DBAs who implement (1) are liable to do so thinking that they are not only guaranteeing the consistency of every standby with the master, but the consistency of every standby with every other standby -- a kind of dummy multi-master. They are not, so it will take multiple reminders and workarounds in the docs to explain this. And we'll get complaints anyway.

This puts the matter very clearly. Setting k = N is not as good an idea as it sounds when first described.

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Issues with Quorum Commit
On Wed, 2010-10-06 at 10:57 -0700, Josh Berkus wrote:
> (2), (3) Degradation: (Jeff) these two cases make sense only if we give DBAs the tools they need to monitor which standbys are falling behind, and to drop and replace those standbys. Otherwise we risk giving DBAs false confidence that they have better-than-1-standby reliability when actually they don't. Current tools are not really adequate for this.

Current tools work just fine for identifying whether a server is falling behind. This improved in 9.0 to give fine-grained information. Nothing more is needed here within the server.

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
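For reference, the kind of fine-grained 9.0 information being alluded to: comparing WAL positions on the two sides shows how far a standby has fallen behind.

    -- on the master:
    SELECT pg_current_xlog_location();

    -- on the standby:
    SELECT pg_last_xlog_receive_location(),
           pg_last_xlog_replay_location();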
Re: [HACKERS] On Scalability
On Thu, 2010-10-07 at 10:28 +0300, Heikki Linnakangas wrote:
> constraint exclusion of CHECK constraints on the partitions is exponential

Constraint exclusion is linear with respect to the number of partitions. Why do you say exponential?

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] host name support in pg_hba.conf
On tis, 2010-10-05 at 22:28 -0400, Joachim Wieland wrote:
> This is exactly what I am seeing here. However, contrary to your case, the patch makes it even worse on my side. With the patch compiled in and a pg_hba.conf entry of "localhost", I cannot connect anymore with -h localhost; I get "no pg_hba.conf entry for host ::1". This is a mostly standard Ubuntu setup.

Could you post your /etc/hosts file?
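One plausible culprit, assuming the box has a stock Ubuntu-style /etc/hosts: if ::1 reverse-resolves to a name other than localhost, a pg_hba.conf host name of localhost can never match an IPv6 connection.

    127.0.0.1   localhost
    ::1         ip6-localhost ip6-loopback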
Re: [HACKERS] On Scalability
On 07.10.2010 10:41, Simon Riggs wrote:
> On Thu, 2010-10-07 at 10:28 +0300, Heikki Linnakangas wrote:
>> constraint exclusion of CHECK constraints on the partitions is exponential
>
> Constraint exclusion is linear with respect to the number of partitions. Why do you say exponential?

For some reason I thought the planner needs to check the constraints of the partitions against each other, but you're right, clearly that's not the case. Linear it is.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Issues with Quorum Commit
On 10/06/2010 10:01 PM, Simon Riggs wrote:
> The code to implement your desired option is more complex and really should come later.

I'm sorry, but I think of that exactly the opposite way. The timeout for automatic continuation after waiting for a standby is the addition. The wait state of the master is there anyway, whether or not it's bound by a timeout. The timeout option should thus come later.

Regards

Markus Wanner
Re: [HACKERS] Issues with Quorum Commit
Markus Wanner <mar...@bluegap.ch> writes:
>> I'm just saying that this should be an option, not the only choice.
>
> I'm sorry, I just don't see the use case for a mode that drops guarantees when they are most needed. People who don't need those guarantees should definitely go for async replication instead.

We're still talking about freezing the master and all the applications when the first standby still has to do a base backup and catch up to where the master currently is, right?

> What does a synchronous replication mode that falls back to async upon failure give you, except for a severe degradation in performance during normal operation? Why not use async right away in such a case?

It's all about the standard case you're building, sync rep, and how to manage errors. In most cases I want flexibility. An alert says the standby is down, you've lost your durability requirements, so now I'm building a new standby. Does that mean my applications are all off and the master is refusing to work? I sure hope I can choose about that, if possible per application.

Next step: the old standby has been able to boot again, thanks to the sysadmins who repaired it, so it's online again, and my replacement machine is doing a base backup. Are all the applications still unavailable? I sure hope I have a word in this decision.

>> so opening a superuser connection to act on the currently waiting transaction is still possible (pass/fail, but fail is what at this point? shutdown to wait some more offline?).
>
> Not sure I'm following here. The admin will be busy re-establishing (connections to) standbys; killing transactions on the master doesn't help anything - whether or not the master waits forever.

The idea here would be to be able to manually ACK a transaction that's waiting forever, because you know it won't get an answer and you'd prefer the application to just continue. But I see that's not a valid use case for you.

Regards,

--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Sync Rep at Oct 5
On Wed, Oct 6, 2010 at 4:06 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> The problem is how much WAL is stored on (any) node. Currently that is wal_keep_segments, which doesn't work very well, but I've seen no better ideas that cover all important cases.

What about allowing the master to read and send WAL from the archive?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Issues with Quorum Commit
On 07.10.2010 12:52, Dimitri Fontaine wrote:
> Markus Wanner <mar...@bluegap.ch> writes:
>>> I'm just saying that this should be an option, not the only choice.
>>
>> I'm sorry, I just don't see the use case for a mode that drops guarantees when they are most needed. People who don't need those guarantees should definitely go for async replication instead.
>
> We're still talking about freezing the master and all the applications when the first standby still has to do a base backup and catch up to where the master currently is, right?

Either that, or you configure your system for asynchronous replication first, and flip the switch to synchronous only after the standby has caught up. Setting up the first standby happens only once, when you initially set up the system, or if you're recovering from a catastrophic loss of the standby.

>> What does a synchronous replication mode that falls back to async upon failure give you, except for a severe degradation in performance during normal operation? Why not use async right away in such a case?
>
> It's all about the standard case you're building, sync rep, and how to manage errors. In most cases I want flexibility. An alert says the standby is down, you've lost your durability requirements, so now I'm building a new standby. Does that mean my applications are all off and the master is refusing to work?

Yes. That's why you want to have at least two standbys if you care about availability. Or, if durability isn't that important to you after all, use asynchronous replication.

Of course, if in the heat of the moment the admin is willing to forge ahead without the standby, he can temporarily change the configuration on the master. If you want the standby to be rebuilt automatically, you can incorporate that configuration change in the scripts too. The important point is that you or your scripts are in control, and you know at all times whether you can trust the standby or not. If the master makes such decisions automatically, you don't know whether the standby is trustworthy (i.e. guaranteed up-to-date) or not.

>>> so opening a superuser connection to act on the currently waiting transaction is still possible (pass/fail, but fail is what at this point? shutdown to wait some more offline?).
>>
>> Not sure I'm following here. The admin will be busy re-establishing (connections to) standbys; killing transactions on the master doesn't help anything - whether or not the master waits forever.
>
> The idea here would be to be able to manually ACK a transaction that's waiting forever, because you know it won't get an answer and you'd prefer the application to just continue. But I see that's not a valid use case for you.

I don't see anything wrong with having tools for admins to deal with the unexpected. I'm not sure overriding individual transactions is very useful, though; more likely you'll want to take the whole server offline, or change the config to allow all transactions to continue without the synchronous standby.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
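A sketch of that async-first, flip-to-sync sequence; the GUC name synchronous_replication is purely hypothetical here, since the actual knobs were still being designed in this thread:

    # postgresql.conf on the master, while the new standby takes its base backup:
    synchronous_replication = off   # hypothetical knob: run asynchronously for now

    # once the standby has demonstrably caught up:
    synchronous_replication = on    # flip to synchronous, then reload the config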
Re: [HACKERS] Issues with Quorum Commit
Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
> Either that, or you configure your system for asynchronous replication first, and flip the switch to synchronous only after the standby has caught up. Setting up the first standby happens only once, when you initially set up the system, or if you're recovering from a catastrophic loss of the standby.

Or if the standby is lagging and the master's wal_keep_segments is not sized big enough. Is that a catastrophic loss of the standby too?

>> It's all about the standard case you're building, sync rep, and how to manage errors. In most cases I want flexibility. An alert says the standby is down, you've lost your durability requirements, so now I'm building a new standby. Does that mean my applications are all off and the master is refusing to work?
>
> Yes. That's why you want to have at least two standbys if you care about availability. Or, if durability isn't that important to you after all, use asynchronous replication.

Agreed, that's a nice simple use case. Another one is to say that I want sync rep when the standby is available, but I don't have the budget for more. So I prefer a good alerting system and low-budget-no-guarantee when the standby is down; that's my risk evaluation.

> Of course, if in the heat of the moment the admin is willing to forge ahead without the standby, he can temporarily change the configuration on the master. [...] The important point is that you or your scripts are in control, and you know at all times whether you can trust the standby or not. If the master makes such decisions automatically, you don't know whether the standby is trustworthy (i.e. guaranteed up-to-date) or not.

My proposal is that the master has the information to make the decision, and the behavior is something you set up. Default to security, so wait forever and block the applications, but allow it to be set to ignore standbys that have not at least reached a given state. I don't see that you can make everybody happy without a knob here, and I don't see how we can deliver one without a clear state diagram of the standby's possible states and transitions.

The other alternative is to just not care, and accept the timeout as an option alongside the quorum, so that you just don't wait for the quorum if you so choose. It's much more dynamic and dangerous, but with a good alerting system it'll be very popular, I guess.

> I don't see anything wrong with having tools for admins to deal with the unexpected. I'm not sure overriding individual transactions is very useful, though; more likely you'll want to take the whole server offline, or change the config to allow all transactions to continue without the synchronous standby.

The question then is: should the new configuration alter running transactions? My implicit assumption was that it shouldn't, and then I need another facility, such as:

    SELECT pg_cancel_quorum_wait(procpid)
      FROM pg_stat_activity
     WHERE waiting_quorum;

Regards,

--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Issues with Quorum Commit
On Thu, 2010-10-07 at 11:46 +0200, Markus Wanner wrote:
> On 10/06/2010 10:01 PM, Simon Riggs wrote:
>> The code to implement your desired option is more complex and really should come later.
>
> I'm sorry, but I think of that exactly the opposite way.

I see why you say that. Dimitri's suggestion is an enhancement of the basic feature, just as Heikki's is. My reply was directed at Heikki, but should apply to Dimitri's idea as well.

> The timeout for automatic continuation after waiting for a standby is the addition. The wait state of the master is there anyway, whether or not it's bound by a timeout. The timeout option should thus come later.

Adding a timeout is very little code. We can take that out of the patch if that's an objection.

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] patch: SQL/MED(FDW) DDL
On Tue, 05 Oct 2010 14:59:27 -0400 Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> I wonder if we should focus our first efforts on really simple cases like CSV files (as you mentioned) and perhaps something like memcached, which has different properties than a CSV file, but extremely simple ones. I think it's inevitable that the API is going to get more complicated from release to release, and probably not in backward-compatible ways; I think it's too early to be worried about that.
>
> +1 ... we can *not* assume that we know enough to get the API right the first time. It'll have to be stated up front that it's subject to change, and you probably shouldn't be trying to develop FDWs separately from the main project for a while yet. I think that looking only at CSV might be too simple, though. It'd be good to be at least trying to build a remote-PG adapter, even with the knowledge that it's all throwaway code.

Thanks for the comments, and sorry for the delayed reply.

I propose a revised version of the FDW API which supports both CSV and PG. It is intended as a minimal API to support SQL/MED, so features such as query optimization or use of remote indexes are not considered much.

==

This FDW API is designed to support scanning a foreign table. As the SQL/MED standard says, updating external data is not supported. The instance of a foreign table may be a CSV file, a table on an external PostgreSQL server, or anything that can generate PG-style tuples.

Scanning a foreign table is done in the steps below. Note: FdwRoutine->XXX() means an FDW API function.

1. The parser parses a query string and generates a query tree. In this step, a foreign table is not distinguished from a local table.

2. The optimizer generates a Path pathnode (pathtype = T_ForeignPath) for each foreign table in the query. Because foreign tables have neither TIDs nor indexes (at least for now), we don't add a new Path type. In this step, the optimizer calls FdwRoutine->GetStatistics() to get estimated costs of the foreign scan. The FDW can calculate the costs by itself from some statistics, get them from the remote side, or leave the costs as they are.

3. The planner generates a ForeignScan plannode for each foreign Path. ForeignScan could be a typedef of Scan because it has no other member, but it has been defined as an independent structure so that other members can be added in the future.

4. The executor executes each ForeignScan node in the steps below. Throughout step 4, the FDW can keep its own information about the foreign scan in ForeignScan->FdwReply.

4-a. To initialize ForeignScan plannodes, ExecInitForeignScan() is called for each ForeignScan node. In ExecInitForeignScan(), a ForeignScanState is created from:

* a ScanState, which holds the same information as SeqScan;
* an FdwRoutine pointer (for caching);
* some catalog information about the foreign table (for caching);
* a connection established with FdwRoutine->ConnectServer().

The FDW can use GetFSConnectionByName(name) inside FdwRoutine->ConnectServer() to get a pooled connection by name. Then ExecInitForeignScan() calls FdwRoutine->Open() to tell the FDW that it's time to start the query. The FDW can do some initialization if necessary.

4-b. To retrieve a tuple from the foreign table, ExecForeignScan() calls FdwRoutine->Iterate(). The FDW should set the next tuple into the ss_ScanTupleSlot of the ScanState, or empty the slot to indicate EOF. Projection and qual evaluation are done in ExecScan() later, so ExecForeignScan() should return all columns/tuples in the external data store.

4-c. To reset the scan and rewind the cursor to the head of the foreign table, ExecForeignReScan() calls FdwRoutine->ReOpen(). This occurs when a ForeignScan node is the inner node of a nested loop join. The FDW is required to return the first tuple again at the next FdwRoutine->Iterate() call.

4-d. At the end of execution of the ForeignScan, ExecEndForeignScan() is called. ExecEndForeignScan() calls FdwRoutine->Close() to tell the FDW that no more Iterate calls will be made. The FDW can do some finalization if necessary.

5. Connections established via FdwRoutine->ConnectServer() are pooled in the backend for future queries that access the same foreign server. The pooling mechanism is implemented in the core module, not in each FDW. When a user executes the DISCARD ALL command, or the backend dies, FdwRoutine->FreeFSConnection() is called for each foreign connection to discard it.

To achieve the features above, I propose the following FDW API:

    /*
     * Almost same as SeqScan, but defined as a structure to allow adding
     * members in the future.
     */
    typedef struct ForeignScan
    {
        Scan        scan;
    } ForeignScan;

    /*
     * Handle to access FDW-dependent data.
     * Each FDW can use ForeignScanState->reply, casting between FdwReply
     * and the actual type.
     */
    typedef struct FdwReply FdwReply;

    /*
     * ForeignScanState node is used to store scan status.
     */
    typedef struct ForeignScanState
    {
        ScanState   ss;     /* its first field is NodeTag */
        ...
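For context, a sketch of the SQL/MED-style DDL such an API would sit beneath; CREATE FOREIGN DATA WRAPPER and CREATE SERVER exist since 8.4, while the foreign table syntax and the option names below are illustrative of the patch under discussion, not settled:

    CREATE FOREIGN DATA WRAPPER csv_wrapper;
    CREATE SERVER csv_files FOREIGN DATA WRAPPER csv_wrapper;

    CREATE FOREIGN TABLE passwd (username text, uid int)
        SERVER csv_files
        OPTIONS (filename '/tmp/passwd.csv', delimiter ',');

    -- planned as a ForeignScan; quals are evaluated in ExecScan() above Iterate()
    SELECT username FROM passwd WHERE uid >= 1000;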
Re: [HACKERS] todo point: plpgsql - scrollable cursors are supported
On Thu, Oct 7, 2010 at 2:38 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> 2010/10/7 Robert Haas <robertmh...@gmail.com>:
>> On Mon, Oct 4, 2010 at 2:52 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>>> I think you can remove scrollable cursor support from the plpgsql ToDo. Scrollable cursors are supported, and the supported syntax is the same as in the core SQL language.
>>
>> I agree, removed. I also removed WITH HOLD cursors, which we seem to have as well.
>
> I don't think we support the WITH HOLD cursor syntax in plpgsql yet. Maybe we have similar functionality; I don't know.

It's in the documentation... http://www.postgresql.org/docs/current/static/sql-declare.html

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] todo point: plpgsql - scrollable cursors are supported
2010/10/7 Robert Haas <robertmh...@gmail.com>:
> On Thu, Oct 7, 2010 at 2:38 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>> I don't think we support the WITH HOLD cursor syntax in plpgsql yet. Maybe we have similar functionality; I don't know.
>
> It's in the documentation... http://www.postgresql.org/docs/current/static/sql-declare.html

Can be - but it isn't supported by plpgsql. This is similar to scrollable cursors: SQL has supported them for some years, but plpgsql has allowed them only for the last two. The minimal necessary change is a relatively simple update to plpgsql's parser - but I don't know whether there are other issues. See http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/pl/plpgsql/src/gram.y.diff?r1=1.99&r2=1.100

Regards

Pavel
Re: [HACKERS] Sync Rep at Oct 5
On 10-10-07 05:52 AM, Fujii Masao wrote:
> On Wed, Oct 6, 2010 at 4:06 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> The problem is how much WAL is stored on (any) node. Currently that is wal_keep_segments, which doesn't work very well, but I've seen no better ideas that cover all important cases.
>
> What about allowing the master to read and send WAL from the archive?

Then you have to deal with telling the archive how long it needs to keep WAL segments, because the master might ask for them back. If the archive is remote from the master, then you have some extra network copying going on. It would be better to let the slave that is being reconfigured read the missing WAL from the archive.
Re: [HACKERS] Issues with Quorum Commit
On 10/07/2010 01:08 PM, Simon Riggs wrote:
> Adding a timeout is very little code. We can take that out of the patch if that's an objection.

Okay. If you take it out, we are at the wait-forever option, right? If not, I definitely don't understand how you envision things happening. I've been asking [1] about that distinction before, but didn't get a direct answer.

Regards

Markus Wanner

[1]: Re: Configuring synchronous replication, Markus Wanner:
http://archives.postgresql.org/message-id/4c9c5887.4040...@bluegap.ch
Re: [HACKERS] todo point: plpgsql - scrollable cursors are supported
On Thu, Oct 7, 2010 at 7:34 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> Can be - but it isn't supported by plpgsql. This is similar to scrollable cursors: SQL has supported them for some years, but plpgsql has allowed them only for the last two. The minimal necessary change is a relatively simple update to plpgsql's parser - but I don't know whether there are other issues.

I think the issue there may be that the execution of the function is wrapped in a transaction anyway, so it's not really clear what WITH HOLD would do for you. Now, if we could decouple those things... THAT would be exciting, for a lot more reasons than just this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
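A short illustration of the conflict: at the SQL level, the whole point of WITH HOLD is to outlive the creating transaction, which a plpgsql function, running inside one transaction, cannot do:

    BEGIN;
    DECLARE c CURSOR WITH HOLD FOR
        SELECT i FROM generate_series(1, 100) AS s(i);
    COMMIT;            -- a holdable cursor survives the commit...

    FETCH 5 FROM c;    -- ...and can still be fetched afterwards
    CLOSE c;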
Re: [HACKERS] Issues with Quorum Commit
On Thu, Oct 7, 2010 at 3:30 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> Yes, let's get k = 1 first. With k = 1 the number of standbys is not limited, so we can still have very robust and highly available architectures. So we mean first-acknowledgement-releases-waiters.

+1. I like the design Greg Smith proposed yesterday (though there are details to be worked out).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] leaky views, yet again
On Thu, Oct 7, 2010 at 2:02 AM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
> On 07.10.2010 06:39, Robert Haas wrote:
>> I took a crack at documenting the current behavior; see attached.
>
> Looks good. It gives the impression that you need to be able to create a custom function to exploit it, though. It would be good to mention that internal functions can be used too; revoking access to CREATE FUNCTION does not make you safe.

OK, second try attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

[Attachment: document-leaky-views-v2.patch]
Re: [HACKERS] On Scalability
2010/10/7 Heikki Linnakangas <heikki.linnakan...@enterprisedb.com>:
> On 07.10.2010 10:41, Simon Riggs wrote:
>> Constraint exclusion is linear with respect to the number of partitions. Why do you say exponential?
>
> For some reason I thought the planner needs to check the constraints of the partitions against each other, but you're right, clearly that's not the case. Linear it is.

Making these things sub-linear (whether O(log n) or even O(1)), provided there's a way to do it, would make this RDBMS more appealing to enterprises. I mean also partial indexes (as an alternative to table partitioning). Being able to cope effectively with only a dozen child tables or so makes it more like an amateur feature. If you really need partitioning (or just hierarchical stuff), I think you'll need quite a lot more than a dozen items. If you partition by just weeks, you'll need 50+ a year.

Is there any precise direction on where to look in the code for this? Is there a way to put this on a wish list?

--
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
NON QVIETIS MARIBVS NAVTA PERITVS
Re: [HACKERS] Issues with Quorum Commit
Salut Dimitri,

On 10/07/2010 12:32 PM, Dimitri Fontaine wrote:
> Another one is to say that I want sync rep when the standby is available, but I don't have the budget for more. So I prefer a good alerting system and low-budget-no-guarantee when the standby is down; that's my risk evaluation.

I think that's a pretty special case, because a good alerting system is at least as expensive as another server that just persistently stores and ACKs incoming WAL. Why would one ever want the guarantee that sync replication gives to hold true only up to one failure, if a better guarantee doesn't cost anything extra? (Note that a good alerting system is impossible to achieve with only two servers. You need a third device anyway.)

Or put another way: a good alerting system is one that understands Postgres to some extent. It protects you from data loss in *every* case. If you attach at least two database servers to it, you get availability as long as any one of the two is up and running. No matter what happened before, even a full cluster power outage is guaranteed to recover automatically without any data loss.

[Okay, the standby mode that only stores and ACKs WAL without having a full database behind it still needs to be written. However, pg_streamrecv certainly goes in that direction already; see [1].]

Sync replication between really just two servers is asking for trouble and certainly not worth the savings in hardware cost. Better to invest in a good UPS and redundant power supplies for a single server.

> The question then is, should the new configuration alter running transactions?

It should definitely affect all currently running and waiting transactions. For anything beyond three servers, where quorum_commit could be bigger than one, it absolutely makes sense to be able to just lower the requirements temporarily, instead of having to cancel the guarantee completely.

Regards

Markus Wanner

[1]: Using streaming replication as log archiving, Magnus Hagander
http://archives.postgresql.org/message-id/aanlkti=_bzsyt8a1kjtpwzxnwyygqnvp1nbjwrnsd...@mail.gmail.com
Re: [HACKERS] Sync Rep at Oct 5
On Thu, Oct 7, 2010 at 8:46 PM, Steve Singer <ssin...@ca.afilias.info> wrote:
> Then you have to deal with telling the archive how long it needs to keep WAL segments, because the master might ask for them back.

Yeah, it's not easy to determine how long we should keep the archived WAL files. We need to calculate which WAL files are deletable according to the progress of each standby and the state of the stored base backups which might still be used for PITR.

> If the archive is remote from the master, then you have some extra network copying going on.

Yep, so I think that the master (i.e., walsender) should read the archived WAL file by using a restore_command specified by the user. If the archive is remote from the master, then we would need to specify something like scp in restore_command. Also, even if you compress the archived WAL file by using pg_compress, the master can decompress it by using pg_decompress in restore_command and transfer it.

> It would be better to let the slave that is being reconfigured read the missing WAL from the archive.

That's one of the choices. But I've heard that some people don't want to set up a shared archive area that can be accessed by both the master and the standby. For example, they feel that it's complex to configure an NFS server or automatic-scp-without-password for sharing the archived WAL files. Currently we have to increase wal_keep_segments to work around that problem. But the pg_xlog disk space is usually small and not suitable for keeping many WAL files, so we might be unable to increase wal_keep_segments. If we allow the master to stream WAL files from the archive, we don't need to increase wal_keep_segments or set up such a complex configuration. So this idea is one of the useful choices, I think.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
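A sketch of the kind of restore_command being described, assuming the archive lives on a remote host reachable over ssh (host and paths are illustrative; gzip stands in for whatever compression the archiver applied):

    # plain remote copy:
    restore_command = 'scp archive-host:/var/lib/pgarchive/%f %p'

    # decompressing on the fly:
    restore_command = 'ssh archive-host "cat /var/lib/pgarchive/%f.gz" | gunzip > %p'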
Re: [HACKERS] leaky views, yet again
* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Oct 7, 2010 at 2:02 AM, Heikki Linnakangas wrote:
>> Looks good. It gives the impression that you need to be able to create a custom function to exploit it, though. It would be good to mention that internal functions can be used too; revoking access to CREATE FUNCTION does not make you safe.
>
> OK, second try attached.

This might be overly pedantic, but I don't think 'tampering' gives the right impression. Also, there's a marked difference between viewing data by using built-ins such as casting (since you'll only get to see the first value in a column that fails the cast) and being able to write a function that pulls every row out of the table and dumps it into another table. I think it'd make a much bigger impression if you went ahead and changed the 'raise notice' to an 'insert into table x'.

Also, even if you can't create functions (due to lack of create privileges on any schema), you could use DO clauses now. Revoking usage rights on all languages should prevent both, though.

Thanks,

Stephen
Re: [HACKERS] Git cvsserver serious issue
On 09/23/2010 01:18 PM, Aidan Van Dyk wrote:
> On Thu, Sep 23, 2010 at 11:49 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Magnus Hagander <mag...@hagander.net> writes:
>>> On Thu, Sep 23, 2010 at 17:32, Andrew Dunstan <and...@dunslane.net> wrote:
>>>> Are we sure that's going to stop the DOS issue?
>>>
>>> As long as it's done right, I don't see how it wouldn't.
>>
>> There might be a cleaner way to do it, but after a moment's inspection of the script, I'd be inclined to just hack GITCVS::updater->new() to throw an error if $module is neither master nor REL\d_\d_STABLE. Keep in mind of course that I'm a lousy Perl coder.
>
> Here's a quick change that will allow you to specify the modules to export in the gitcvs section, like:
>
>     [gitcvs]
>         enabled=1
>         modules=master,REL9_0_STABLE,REL8_4_STABLE
>
> --- git-cvsserver.orig  2010-09-23 12:03:06.0 -0400
> +++ git-cvsserver       2010-09-23 13:16:53.0 -0400
> @@ -2771,6 +2771,12 @@
>      die "Git repo '$self->{git_path}' doesn't exist" unless ( -d $self->{git_path} );
>
> +    if (defined $cfg->{gitcvs}{modules})
> +    {
> +        $log->debug("Limiting modules: " . $cfg->{gitcvs}{modules});
> +        die "Invalid module $module" unless map {/^ *$module$/} split(',', $cfg->{gitcvs}{modules});
> +    }
> +
>      $self->{dbdriver} = $cfg->{gitcvs}{$state->{method}}{dbdriver} || $cfg->{gitcvs}{dbdriver} || "SQLite";
>      $self->{dbname} = $cfg->{gitcvs}{$state->{method}}{dbname} ||

OK, let's go with that. I was kinda hoping that we wouldn't have to do this at all, but Stefan has been having serious problems getting git to build and run on spoonbill, and I don't want to take up more of his time or be without it for very long.

cheers

andrew
Re: [HACKERS] leaky views, yet again
On 07.10.2010 16:10, Stephen Frost wrote:
> Also, even if you can't create functions (due to lack of create privileges on any schema), you could use DO clauses now.

There's no way to shoehorn a DO clause into a SELECT; you can't do:

    SELECT data FROM view WHERE (DO $$ RAISE NOTICE 'argument'; $$) = 1

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] leaky views, yet again
On Thu, Oct 7, 2010 at 9:10 AM, Stephen Frost <sfr...@snowman.net> wrote:
> This might be overly pedantic, but I don't think 'tampering' gives the right impression.

I'm open to suggestions.

> Also, there's a marked difference between viewing data by using built-ins such as casting (since you'll only get to see the first value in a column that fails the cast) and being able to write a function that pulls every row out of the table and dumps it into another table.

Well, that's why I give the more serious example first. Even with casting failures, there's a good chance you can probe a bunch of different rows by throwing random filter conditions into the clause:

    function_that_returns_false() OR (name = 'some_constant' AND number::box)

> I think it'd have a much bigger impression if you went ahead and changed the 'raise notice' to an 'insert into table x;'.

Possibly, but it makes the example slightly longer, and I think it's clear enough as-is.

> Also, even if you can't create functions (due to lack of create privileges on any schema), you could use DO clauses now. Revoking usage rights on all languages should prevent both though.

I don't see how to make it work with a DO clause.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
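Rendering the probing idea concretely, reusing the hypothetical accounts/my_accounts sketch from earlier in the thread; whether the cast runs before the view's filter depends on the planner's cost ordering, so this may or may not fire on a given build:

    -- assumes: CREATE VIEW my_accounts AS
    --              SELECT * FROM accounts WHERE owner = current_user;
    SELECT * FROM my_accounts WHERE (owner || ' ' || balance)::int = 0;
    -- ERROR: invalid input syntax for integer: "bob 1234"
    -- the error text exposes a row the view was supposed to hide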
Re: [HACKERS] Issues with Quorum Commit
Markus Wanner <mar...@bluegap.ch> writes:
> Why would one ever want the guarantee that sync replication gives to hold true only up to one failure, if a better guarantee doesn't cost anything extra? (Note that a good alerting system is impossible to achieve with only two servers. You need a third device anyway.)

I think you're all into durability, and that's good. The extra cost is service downtime if that's not what you're after: there's also availability, and load-balancing read queries across a system with no lag (no serving of stale data) when all is working right.

I still think your use case is a solid one, but we need to be ready to answer some other ones, which you call relaxed and wrong because of data-loss risks. My proposal is to make the risk window obvious and the behavior on entering it configurable.

Regards,

--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Sync Rep at Oct 5
On Thu, Oct 7, 2010 at 9:08 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
> If we allow the master to stream WAL files from the archive, we don't need to increase wal_keep_segments or set up such a complex configuration. So this idea is one of the useful choices, I think.

I'm not sure anyone other than yourself has endorsed this idea, but in any case it seems off the critical path for getting this feature committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] leaky views, yet again
* Heikki Linnakangas (heikki.linnakan...@enterprisedb.com) wrote:
> On 07.10.2010 16:10, Stephen Frost wrote:
>> Also, even if you can't create functions (due to lack of create privileges on any schema), you could use DO clauses now.
>
> There's no way to shoehorn a DO clause into a SELECT; you can't do:
>
>     SELECT data FROM view WHERE (DO $$ RAISE NOTICE 'argument'; $$) = 1

Wow, I just kind of assumed you could; I'm not really sure why. Perhaps it'll be possible in the future though, so it might be something to think about if/when that happens. I can't see a way to abuse the view from inside a DO block or a function in the same way either.

Stephen
Re: [HACKERS] leaky views, yet again
* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Oct 7, 2010 at 9:10 AM, Stephen Frost <sfr...@snowman.net> wrote:
>> This might be overly pedantic, but I don't think 'tampering' gives the right impression.
>
> I'm open to suggestions.

Yeah, I wasn't coming up with a better word myself. :/ Maybe something with "circumvented"?

>> Also, even if you can't create functions (due to lack of create privileges on any schema), you could use DO clauses now. Revoking usage rights on all languages should prevent both though.
>
> I don't see how to make it work with a DO clause.

Yeah, Heikki pointed out that I was assuming PG could work more magic than it can today. :) Sorry for the noise.

Stephen
Re: [HACKERS] On Scalability
On Thu, Oct 7, 2010 at 8:10 AM, Vincenzo Romano <vincenzo.rom...@notorand.it> wrote:
> Making these things sub-linear (whether O(log n) or even O(1)), provided there's a way to do it, would make this RDBMS more appealing to enterprises. I mean also partial indexes (as an alternative to table partitioning). Being able to cope effectively with only a dozen child tables or so makes it more like an amateur feature. If you really need partitioning (or just hierarchical stuff), I think you'll need quite a lot more than a dozen items. If you partition by just weeks, you'll need 50+ a year. Is there any precise direction on where to look in the code for this? Is there a way to put this on a wish list?

Well, you can't just arbitrarily turn an O(n) algorithm into an O(lg n) algorithm. I think the most promising approach to scaling to large numbers of partitions is the patch that Itagaki Takahiro was working on back in July. Unfortunately, that patch still needs a lot of work - and some redesign - before it will really meet our needs.

Right now, the way to set up partitioning is to create a parent table, create a bunch of child tables that inherit from it, put mutually exclusive CHECK constraints on all the children, and make sure constraint_exclusion is on so that the planner can notice when not all children need to be scanned (see the sketch just below). As a totally general architecture, this is probably hard to beat (or to make sublinear). However, if we have DDL that allows the user to say "this is a set of child tables that are range partitions on this key column, with these boundaries", then you should be able to make the constraint exclusion calculations much more efficient, because the planner won't have to infer so much from first principles. O(lg n) doesn't seem out of the question given that architecture.

I think, though, that that is still some way off. If you're in a position to help with (or fund) the coding, it can be made to happen faster, of course.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
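A minimal sketch of that setup (table name and range boundaries are illustrative):

    CREATE TABLE measurements (logdate date, value int);

    CREATE TABLE measurements_w40
        (CHECK (logdate >= '2010-10-04' AND logdate < '2010-10-11'))
        INHERITS (measurements);
    CREATE TABLE measurements_w41
        (CHECK (logdate >= '2010-10-11' AND logdate < '2010-10-18'))
        INHERITS (measurements);

    SET constraint_exclusion = partition;  -- the default since 8.4

    -- the plan should touch only measurements_w40:
    EXPLAIN SELECT * FROM measurements WHERE logdate = '2010-10-05';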
Re: [HACKERS] Issues with Quorum Commit
On Thu, Oct 7, 2010 at 6:32 AM, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
> Or if the standby is lagging and the master's wal_keep_segments is not sized big enough. Is that a catastrophic loss of the standby too?

Sure, but that lagged standby is already asynchronous, not synchronous. If it was synchronous, it would have slowed the master down enough that it would not be lagged.

I'm really confused by all these k < N scenarios I see bandied about, because all they really amount to is "I only want *one* synchronous replica, and a bunch of asynchronous ones", with a bit of chance thrown into the mix, hoping that the synchronous one is pretty stable and the asynchronous ones aren't *too* far behind (define "too" and "far" at your leisure). And then I see a lot of posturing about how to recover when the asynchronous standbys aren't synchronous enough at some point...

> Agreed, that's a nice simple use case. Another one is to say that I want sync rep when the standby is available, but I don't have the budget for more. So I prefer a good alerting system and low-budget-no-guarantee when the standby is down; that's my risk evaluation.

That screams "wrong" in my books: "OK, I want durability, so I always want to have 2 copies of the data, but if we lose one copy, I want to keep on trucking, because I don't *really* want durability." If you want most-of-the-time mostly-2-copy durability, then really good asynchronous replication is a really good solution.

Yes, I believe you need to have a way for an admin (or process/control/config) to be able to demote a synchronous replication scenario into async (or standalone, which is just an extension of really async). But it's no longer synchronous replication at that point. And if the choice is made to keep trucking while a new standby is being brought online, made available and caught up, that's fine too. But during that period, until the slave is caught up and synchronously replicating, it's *not* synchronous replication.

So I'm not arguing that there shouldn't be a way to turn off synchronous replication once it's on, hopefully without having to take down the cluster (pg-instance-type cluster). But I am pleading that there be a way to set up PG such that synchronous replication *is* synchronously replicating, or things stop and back up until such a time as it is.

a.

--
Aidan Van Dyk                 Create like a god,
ai...@highrise.ca             command like a king,
http://www.highrise.ca/       work like a slave.
Re: [HACKERS] Git cvsserver serious issue
On Thu, Oct 7, 2010 at 15:16, Andrew Dunstan <and...@dunslane.net> wrote:
> On 09/23/2010 01:18 PM, Aidan Van Dyk wrote:
>> Here's a quick change that will allow you to specify the modules to export in the gitcvs section [...]
>
> OK, let's go with that. I was kinda hoping that we wouldn't have to do this at all, but Stefan has been having serious problems getting git to build and run on spoonbill, and I don't want to take up more of his time or be without it for very long.

Are we doing this *just* for spoonbill? If so, it's a lot easier to just filter by IP, so we don't have to maintain a patched version...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] On Scalability
Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
> On 07.10.2010 10:41, Simon Riggs wrote:
>> Constraint exclusion is linear with respect to the number of partitions. Why do you say exponential?
>
> For some reason I thought the planner needs to check the constraints of the partitions against each other, but you're right, clearly that's not the case. Linear it is.

Well, it's really more like O(mn) where m is the number of partitions and n is the number of clauses in the query --- and not only that, but the O() notation is hiding a depressingly high constant factor. And then there are practical problems like failing to exclude partitions as soon as there are any parameters in the query. There's basically no way that we're going to get decent performance for large numbers of partitions as long as we have to resort to theorem-proving to lead us to the correct partition.

regards, tom lane
Re: [HACKERS] On Scalability
2010/10/7 Robert Haas <robertmh...@gmail.com>:
> Well, you can't just arbitrarily turn an O(n) algorithm into an O(lg n) algorithm.

That's trivially true. I was not asking for the recipe to do it.

> I think the most promising approach to scaling to large numbers of partitions is the patch that Itagaki Takahiro was working on back in July. Unfortunately, that patch still needs a lot of work - and some redesign - before it will really meet our needs. Right now, the way to set up partitioning is to create a parent table, create a bunch of child tables that inherit from it, put mutually exclusive CHECK constraints on all the children, and make sure constraint_exclusion is on so that the planner can notice when not all children need to be scanned. As a totally general architecture, this is probably hard to beat (or to make sublinear).

This is exactly what's described in the official documentation. And everyone I ask for information before going deeper into testing gives me the same answer: don't try to use more than a dozen child tables.

> However, if we have DDL that allows the user to say "this is a set of child tables that are range partitions on this key column, with these boundaries", then you should be able to make the constraint exclusion calculations much more efficient, because the planner won't have to infer so much from first principles. O(lg n) doesn't seem out of the question given that architecture.

I see the main problem in the way the planner understands which partitions are useful and which are not. Having DDL support for the feature could be just syntactic sugar if the underlying mechanism is inadequate.

> I think, though, that that is still some way off. If you're in a position to help with (or fund) the coding, it can be made to happen faster, of course.

This is why I was asking for directions: browsing the whole code base looking for the relevant stuff is quite time-consuming.

--
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
NON QVIETIS MARIBVS NAVTA PERITVS
Re: [HACKERS] Git cvsserver serious issue
On 10/07/2010 09:44 AM, Magnus Hagander wrote: On Thu, Oct 7, 2010 at 15:16, Andrew Dunstan and...@dunslane.net wrote: On 09/23/2010 01:18 PM, Aidan Van Dyk wrote: On Thu, Sep 23, 2010 at 11:49 AM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Thu, Sep 23, 2010 at 17:32, Andrew Dunstan and...@dunslane.net wrote: Are we sure that's going to stop the DOS issue? As long as it's done right, I don't see how it wouldn't. There might be a cleaner way to do it, but after a moment's inspection of the script, I'd be inclined to just hack GITCVS::updater->new() to throw an error if $module is neither master nor REL\d_\d_STABLE. Keep in mind of course that I'm a lousy Perl coder. Here's a quick change that will allow you to specify a list of modules in the gitcvs section to export, like: [gitcvs] enabled=1 modules=master,REL9_0_STABLE,REL8_4_STABLE

--- git-cvsserver.orig	2010-09-23 12:03:06.000000000 -0400
+++ git-cvsserver	2010-09-23 13:16:53.000000000 -0400
@@ -2771,6 +2771,12 @@
     die "Git repo '$self->{git_path}' doesn't exist" unless ( -d $self->{git_path} );

+    if (defined $cfg->{gitcvs}{modules})
+    {
+        $log->debug("Limiting modules: " . $cfg->{gitcvs}{modules});
+        die "Invalid module $module" unless map {/^ *$module$/} split(',', $cfg->{gitcvs}{modules});
+    }
+
     $self->{dbdriver} = $cfg->{gitcvs}{$state->{method}}{dbdriver} || $cfg->{gitcvs}{dbdriver} || "SQLite";
     $self->{dbname} = $cfg->{gitcvs}{$state->{method}}{dbname} ||

OK, let's go with that. I was kinda hoping that we wouldn't have to do this at all, but Stefan has been having serious problems getting git to build and run on spoonbill, and I don't want to take up more of his time or be without it for very long. Are we doing this *just* for spoonbill? If so, it's a lot easier to just filter-by-IP, so we don't have to maintain a patched version... That's the only one I know of - there could certainly be others - but if we're going to support continued CVS use I want to be able to test it, and I don't have a static IP address. Can you filter by name lookup? I have a dyndns name. OTOH, this patch seems pretty small and simple to maintain. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
Aidan Van Dyk ai...@highrise.ca writes: Sure, but that lagged standby is already asynchronous, not synchronous. If it was synchronous, it would have slowed the master down enough that it would not be lagged. Agreed, except in the case of a joining standby. But you're saying it better than I do: Yes, I believe you need to have a way for an admin (or process/control/config) to be able to demote a synchronous replication scenario into async (or standalone, which is just an extension of really async). But it's no longer synchronous replication at that point. And if the choice is made to keep trucking while a new standby is being brought online and available and caught up, that's fine too. But during that period, until the slave is caught up and synchronously replicating, it's *not* synchronous replication. That's exactly my point. I think we need to handle the case and make it obvious that this window is a data-loss window where there's no sync rep ongoing, then offer users a choice of behaviour. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Git cvsserver serious issue
On Thu, Oct 7, 2010 at 16:07, Andrew Dunstan and...@dunslane.net wrote: On 10/07/2010 09:44 AM, Magnus Hagander wrote: On Thu, Oct 7, 2010 at 15:16, Andrew Dunstan and...@dunslane.net wrote: On 09/23/2010 01:18 PM, Aidan Van Dyk wrote: On Thu, Sep 23, 2010 at 11:49 AM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Thu, Sep 23, 2010 at 17:32, Andrew Dunstan and...@dunslane.net wrote: Are we sure that's going to stop the DOS issue? As long as it's done right, I don't see how it wouldn't. There might be a cleaner way to do it, but after a moment's inspection of the script, I'd be inclined to just hack GITCVS::updater->new() to throw an error if $module is neither master nor REL\d_\d_STABLE. Keep in mind of course that I'm a lousy Perl coder. Here's a quick change that will allow you to specify a list of modules in the gitcvs section to export, like: [gitcvs] enabled=1 modules=master,REL9_0_STABLE,REL8_4_STABLE

--- git-cvsserver.orig	2010-09-23 12:03:06.000000000 -0400
+++ git-cvsserver	2010-09-23 13:16:53.000000000 -0400
@@ -2771,6 +2771,12 @@
     die "Git repo '$self->{git_path}' doesn't exist" unless ( -d $self->{git_path} );

+    if (defined $cfg->{gitcvs}{modules})
+    {
+        $log->debug("Limiting modules: " . $cfg->{gitcvs}{modules});
+        die "Invalid module $module" unless map {/^ *$module$/} split(',', $cfg->{gitcvs}{modules});
+    }
+
     $self->{dbdriver} = $cfg->{gitcvs}{$state->{method}}{dbdriver} || $cfg->{gitcvs}{dbdriver} || "SQLite";
     $self->{dbname} = $cfg->{gitcvs}{$state->{method}}{dbname} ||

OK, let's go with that. I was kinda hoping that we wouldn't have to do this at all, but Stefan has been having serious problems getting git to build and run on spoonbill, and I don't want to take up more of his time or be without it for very long. Are we doing this *just* for spoonbill? If so, it's a lot easier to just filter-by-IP, so we don't have to maintain a patched version... That's the only one I know of - there could certainly be others - but if we're going to support continued CVS use I want to be able to test it, and I don't have a static IP address. Can you filter by name lookup? I have a dyndns name. No, filtering is by IP. ssh forwarding might work. OTOH, this patch seems pretty small and simple to maintain. True, it is rather small. Does anybody know if there's an automated way to maintain that on FreeBSD ports, and if so, how that works? I want to be *sure* we can't accidentally upgrade git-cvsserver *without* the patch, since that is a security issue. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
2010/10/7 Tom Lane t...@sss.pgh.pa.us: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 07.10.2010 10:41, Simon Riggs wrote: Constraint exclusion is linear with respect to number of partitions. Why do you say exponential? For some reason I thought the planner needs to check the constraints of the partitions against each other, but you're right, clearly that's not the case. Linear it is. Well, it's really more like O(mn) where m is the number of partitions and n is the number of clauses in the query --- and not only that, but the O() notation is hiding a depressingly high constant factor. And then there are practical problems like failing to exclude partitions as soon as there are any parameters in the query. Do the same considerations apply to partial indexes? I mean, I could replace table partitioning with the concept of index partitioning. (Well, I know it's not really the same.) Would it then be the same O(mn) for the planner to choose the right indexes given a certain query? There's basically no way that we're going to get decent performance for large numbers of partitions as long as we have to resort to theorem-proving to lead us to the correct partition. regards, tom lane I'm not sure about MySQL, but Oracle can handle large partition counts. So I would say there's a way to achieve the same goal. -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- cel +393398083886 fix +390823454163 fax +3902700506964 gtalk. vincenzo.rom...@notorand.it skype. notorand.it -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
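To make the partial-index analogy concrete, here is the kind of layout being contemplated (a sketch only; names are invented):

CREATE TABLE events (
    id       bigserial PRIMARY KEY,
    created  date NOT NULL,
    payload  text
);

CREATE INDEX events_2010_q3_idx ON events (created)
    WHERE created >= DATE '2010-07-01' AND created < DATE '2010-10-01';

CREATE INDEX events_2010_q4_idx ON events (created)
    WHERE created >= DATE '2010-10-01' AND created < DATE '2011-01-01';

To use one of these, the planner must prove the index predicate from the query's WHERE clause, so matching a query against m partial indexes involves the same kind of per-expression proof work as excluding m partitions.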
Re: [HACKERS] On Scalability
* Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: I see the main problem in the way the planner understands which partition is useful and which one is not. Having the DDL supporting the feature could just be syntactic sugar if the underlying mechanism is inadequate. I'm pretty sure the point of the DDL would be to give the user a way to communicate more information about the partitioning to the planner, not just to add syntactic sugar. With that additional information, the planner can make a faster and better decision. Stephen
Re: [HACKERS] Issues with Quorum Commit
On Thu, Oct 7, 2010 at 10:08 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Aidan Van Dyk ai...@highrise.ca writes: Sure, but that lagged standby is already asynchronous, not synchronous. If it was synchronous, it would have slowed the master down enough that it would not be lagged. Agreed, except in the case of a joining standby. *shrug* The joining standby is still asynchronous at this point. It's not synchronous replication. It's just another of the N slaves serving stale data ;-) But you're saying it better than I do: Yes, I believe you need to have a way for an admin (or process/control/config) to be able to demote a synchronous replication scenario into async (or standalone, which is just an extension of really async). But it's no longer synchronous replication at that point. And if the choice is made to keep trucking while a new standby is being brought online and available and caught up, that's fine too. But during that period, until the slave is caught up and synchronously replicating, it's *not* synchronous replication. That's exactly my point. I think we need to handle the case and make it obvious that this window is a data-loss window where there's no sync rep ongoing, then offer users a choice of behaviour. Again, I'm stating there is *no* choice in synchronous replication. It's *got* to block, otherwise it's not synchronous replication. The choice is whether you want synchronous replication or not at that point. And turning it off might be a good (best) choice for most people. I just want to make sure that: 1) There's no way to *sensibly* think it's still synchronously replicating 2) There is a way to enforce that the commits happening *are* synchronously replicated. a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
Vincenzo Romano wrote: I see the main problem in the way the planner understands which partition is useful and which one is not. Having the DDL supporting the feature could just be syntactic sugar if the underlying mechanism is inadequate. You have the order of this backwards. In order to do better than the way the current scheme is implemented, the optimizer needs higher quality metadata about the structure of the partitions to work with. Right now, it's inferring them from the CHECK constraints, which requires the whole theorem-proving bit Tom mentioned. That's never going to get any more algorithmically efficient than it already is. If the DDL that created the partitions also made better quality metadata available about the structure of the partitions, at that point it would be possible to also do better in how the optimizer pruned partitions to consider too. If the list it has was known to be in a particular structured/sorted order, the optimizer could do a binary search to find relevant partitions, rather than the linear scan required right now. Until that work is done, any other improvement attempts are doomed to fail. That's the point Robert was trying to make to you. And the fact Oracle does this is why it's able to scale to high partition counts better than PostgreSQL can. You can read more about the work that was being done here at http://wiki.postgresql.org/wiki/Table_partitioning -- Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
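To make the "higher quality metadata" idea concrete, here is a purely hypothetical DDL sketch; no such syntax exists in PostgreSQL today, and it is shown only to illustrate the shape of the information the optimizer would want:

CREATE TABLE measurements (logdate date, reading numeric)
    PARTITION BY RANGE (logdate);                       -- hypothetical

CREATE TABLE measurements_2010_q4 PARTITION OF measurements
    FOR VALUES FROM ('2010-10-01') TO ('2011-01-01');   -- hypothetical

With the bounds stored explicitly and kept in sorted order, routing a query to its partition becomes a binary search over the boundary list, roughly O(lg m), instead of an attempted proof against each child's CHECK expression.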
Re: [HACKERS] On Scalability
2010/10/7 Stephen Frost sfr...@snowman.net: * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: I see the main problem in the way the planner understands which partition is useful and which one is not. Having the DDL supporting the feature could just be syntactic sugar if the underlying mechanism is inadequate. I'm pretty sure the point of the DDL would be to give the user a way to communicate more information about the partitioning to the planner, not just to add syntactic sugar. With that additional information, the planner can make a faster and better decision. Stephen What kind of information are you thinking about? I think that the stuff you put into the CHECK condition for the table will say it all. In fact, there you have not just the column names with the relevant values, but the actual expression(s) to be checked. -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
* Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: What kind of information are you thinking about? I think that the stuff you put into the CHECK condition for the table will say it all. The problem is that CHECK conditions can contain just about anything, hence the planner needs to deal with that possibility. In fact, there you have not just the column names with the relevant values, but the actual expression(s) to be checked. Yes, that would be the problem. Proving something based on expressions is a lot more time-consuming and complicated than being explicitly told what goes where. Stephen
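As a concrete example of why proving things from arbitrary expressions is hard, consider a constraint like the following (invented, but perfectly legal):

ALTER TABLE measurements_2010_q4 ADD CONSTRAINT funky_check
    CHECK (reading IS NULL OR length(md5(reading::text)) = 32);

-- Nothing marks this as partitioning metadata (or rules it out); the
-- planner can only attempt to prove that the expression implies, or
-- contradicts, a given query's WHERE clause.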
Re: [HACKERS] On Scalability
2010/10/7 Greg Smith g...@2ndquadrant.com: Vincenzo Romano wrote: I see the main problem in the way the planner understands which partition is useful and which one is not. Having the DDL supporting the feature could just be syntactic sugar if the underlying mechanism is inadequate. You have the order of this backwards. In order to do better than the way the current scheme is implemented, the optimizer needs higher quality metadata about the structure of the partitions to work with. Right now, it's inferring them from the CHECK constraints, which requires the whole theorem-proving bit Tom mentioned. That's never going to get any more algorithmically efficient than it already is. If the DDL that created the partitions also made better quality metadata available about the structure of the partitions, at that point it would be possible to also do better in how the optimizer pruned partitions to consider too. If the list it has was known to be in a particular structured/sorted order, the optimizer could do a binary search to find relevant partitions, rather than the linear scan required right now. Do you mean the check constraint is used as plain text to be (somehow) executed? If this is the case, then you (all) are perfectly and obviously right and I'm just fishing for bicycles in the sea. I would expect a parser to ... ehm ... parse the CHECK constraint expression at CREATE TABLE time and extract all the needed high quality metadata, like the list of columns involved and the type of checks (range, value list, etc.). The same would be useful for partial indexes, as well. But maybe this is just wishful thinking. Until that work is done, any other improvement attempts are doomed to fail. That's the point Robert was trying to make to you. And the fact Oracle does this is why it's able to scale to high partition counts better than PostgreSQL can. You can read more about the work that was being done here at http://wiki.postgresql.org/wiki/Table_partitioning Done. As well as the official documentation. The point is that there are no hints on the topic. There should be a caveat in the documentation saying that partitioning is not scalable. As well as partial indexing. Thanks so far for the information. -- Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
Excerpts from Vincenzo Romano's message of jue oct 07 10:44:34 -0400 2010: Do you mean the check constraint is used as plain text to be (somehow) executed? If this is the case, then you (all) are perfectly and obviously right and I'm just fishing for bicycles in the sea. Yeah, hence this thread hasn't advanced things very much in any useful direction. That we need to improve the partitioning implementation is already known. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
2010/10/7 Alvaro Herrera alvhe...@commandprompt.com: Excerpts from Vincenzo Romano's message of jue oct 07 10:44:34 -0400 2010: Do you mean the check constraint is used as plain text to be (somehow) executed? If this is the case, then you (all) are perfectly and obviously right and I'm just fishing for bicycles in the sea. Yeah, hence this thread hasn't advanced things very much in any useful direction. That we need to improve the partitioning implementation is already known. Maybe I'm willing to help, and possibly able to. But first I need to understand things that are already known to others but not yet to me. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
2010/10/7 Stephen Frost sfr...@snowman.net: * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: What kind of information are you thinking about? I think that the stuff you put into the CHECK condition for the table will say it all. The problem is that CHECK conditions can contain just about anything, hence the planner needs to deal with that possibility. Not really. For partitioning there would be some specific constraints, as you have with DEFAULT values. In fact, there you have not just the column names with the relevant values, but the actual expression(s) to be checked. Yes, that would be the problem. Proving something based on expressions is a lot more time-consuming and complicated than being explicitly told what goes where. Consuming computing resources at DDL time should be OK if that leads to big savings at DML time (run time), in my opinion. It'd be just like compile-time optimizations. Stephen -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] O_DSYNC broken on MacOS X?
On Tue, Oct 5, 2010 at 8:11 AM, Peter Eisentraut pete...@gmx.net wrote: On mån, 2010-10-04 at 23:41 -0400, Robert Haas wrote: Well, it's not really useful, but that's how it works everywhere. On Linux, fsync carries the stuff from the kernel's RAM to the disk controller's RAM, and then it depends on some hdparm magic or something what happens next. That's a bit vaguer than I'd like. TFD says The aim of WAL is to ensure that the log is written before database records are altered, but this can be subverted by disk drives that falsely report a successful write to the kernel, when in fact they have only cached the data and not yet stored it on the disk. A power failure in such a situation might lead to irrecoverable data corruption. Administrators should try to ensure that disks holding PostgreSQL's WAL log files do not make such false reports. This leaves open the question of how they should attempt to do this; we should say what we know about that. That is explained in section 29.1 Reliability. I also notice the following sentence in our documentation, which now appears to me to be flat-out wrong: The wal_sync_method parameter determines how PostgreSQL will ask the kernel to force WAL updates out to disk. All the options should be the same in terms of reliability, but it's quite platform-specific which one will be the fastest. Obviously, we know now (if we didn't before) that this isn't the case, per my OP. Right. It was true before fsync_writethrough was invented. Proposed doc patch attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company document-wal-caveats.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
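For reference, the safe-but-slower setting discussed here is chosen in postgresql.conf; on platforms where plain fsync does not force data through the drive's write cache (macOS being the example that started this thread), the write-through variant is the conservative choice:

# postgresql.conf
wal_sync_method = fsync_writethrough    # on macOS this issues
                                        # fcntl(F_FULLFSYNC); also
                                        # available on Windows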
Re: [HACKERS] On Scalability
* Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: I would expect a parser to ... ehm ... parse the CHECK constraint expression at CREATE TABLE time and extract all the needed high quality metadata, like the list of columns involved and the type of checks (range, value list, etc.). Check constraints can be added after the table is created. Inheritance can be added/changed independently of check constraints. Hacking all of the inheritance, check constraint creation, and any other possibly involved code paths to try to figure out whether this particular table, check constraint, inheritance relationship, etc., is part of a partitioning setup isn't exactly trivial, or the right approach. Thanks, Stephen
Re: [HACKERS] On Scalability
* Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: 2010/10/7 Stephen Frost sfr...@snowman.net: * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: The problem is that CHECK conditions can contain just about anything, hence the planner needs to deal with that possibility. Not really. For partitioning there would be some specific constraints, as you have with DEFAULT values. How do we know when it's partitioning and not a CHECK constraint being used for something else..? I'll tell you: through the user using specific partitioning DDL statements. Consuming computing resources at DDL time should be OK if that leads to big savings at DML time (run time), in my opinion. It'd be just like compile-time optimizations. CHECK constraints, inheritance, etc., are general things which can be used for more than just partitioning. Abusing them to go through tons of extra gyrations to make the specific partitioning case faster at DML time (if that's really even possible... I'm not convinced you could make it bullet-proof) isn't a good approach. Thanks, Stephen
Re: [HACKERS] security hook on table creation
On Wed, Oct 6, 2010 at 5:21 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Robert Haas's message of mié oct 06 17:02:22 -0400 2010: 2010/10/5 KaiGai Kohei kai...@ak.jp.nec.com: However, we also have a few headache cases. DefineType() creates a new type object and its array type, but it does not call CommandCounterIncrement() before the end of the function, so the new type entries are not visible to plugin modules, even if we put a security hook at the tail of DefineType(). DefineFunction() has the same problem. It creates a new procedure object, but it also does not call CommandCounterIncrement() before the end of the function, except in the case when ProcedureCreate() invokes the language validator function. So I guess the first question here is why it's important to be able to see the new entry. I am thinking that you want it so that, for example, you can fetch the namespace OID to perform an SE-Linux type transition. Is that right? I'm not sure that there's any point trying to optimize these to the point of avoiding CommandCounterIncrement. Surely DefineType et al are not performance-sensitive operations. OK, fair enough. Let's just do it unconditionally then. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
2010/10/7 Stephen Frost sfr...@snowman.net: * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: 2010/10/7 Stephen Frost sfr...@snowman.net: * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: The problem is that CHECK conditions can contain just about anything, hence the planner needs to deal with that possibility. Not really. For partitioning there would be some specific constraints, as you have with DEFAULT values. How do we know when it's partitioning and not a CHECK constraint being used for something else..? Why ask? You don't need to tell them apart. Just parse the expression and extract the metadata to be used when the expression needs to be evaluated. Whether it's a plain CHECK constraint or something for partition management would then be irrelevant. I'll tell you: through the user using specific partitioning DDL statements. That could be the next step, once the underlying stuff is already in place. Consuming computing resources at DDL time should be OK if that leads to big savings at DML time (run time), in my opinion. It'd be just like compile-time optimizations. CHECK constraints, inheritance, etc., are general things which can be used for more than just partitioning. Abusing them to go through tons of extra gyrations to make the specific partitioning case faster at DML time (if that's really even possible... I'm not convinced you could make it bullet-proof) isn't a good approach. At the moment I'm not interested in particular cases. I think that CHECK constraints (as well as partial index expressions) should be handled in a more effective way. Better partitioning (both for tables and indexes) would be a side effect. Thanks for the insights. Thanks, Stephen -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
2010/10/7 Stephen Frost sfr...@snowman.net: * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: I would expect a parser to ... ehm ... parse the CHECK constraint expression at CREATE TABLE time and extract all the needed high quality metadata, like the list of columns involved and the type of checks (range, value list, etc.). Check constraints can be added after the table is created. Inheritance can be added/changed independently of check constraints. Hacking all of the inheritance, check constraint creation, and any other possibly involved code paths to try to figure out whether this particular table, check constraint, inheritance relationship, etc., is part of a partitioning setup isn't exactly trivial, or the right approach. Thanks, Stephen I think nobody will say these things are trivial. So, what'd be the right approach in your vision? I mean, if you think about partitioning a-la Oracle, then you'll have to parse those expressions anyway. -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- cel +393398083886 fix +390823454163 fax +3902700506964 gtalk. vincenzo.rom...@notorand.it skype. notorand.it -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
Vincenzo Romano vincenzo.rom...@notorand.it wrote: 2010/10/7 Stephen Frost sfr...@snowman.net: Yes, that would be the problem. Proving something based on expressions is alot more time consuming and complicated than being explicitly told what goes where. Consuming computing resources at DDL-time should be OK if that will lead to big savings at DML-time (run-time), my opinion. It'd be just like compile time optimizations. I think something you haven't entirely grasped is how pluggable PostgreSQL is -- you can not only define your own functions in a wide variety of languages (including C), but your own data types, operators, casts, index strategies, etc. Determining, even at DDL time that even a built-in datatype's expression is or isn't useful in partitioning could be quite painful in the absence of syntax specifically geared toward partitioning. If there's a CHECK constraint on a polygon column to ensure that it isn't a concave polygon, you might be facing a lot of work to know whether it's involved in partitioning. Now imagine that a CHECK constraint is on a column with a user defined type and uses the @%!! operator and that the user has changed some of the allowed implicit casts used in the expression. While this flexibility is a great strength of PostgreSQL, it makes some things more difficult to implement than they would be in more limited database products. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
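A sketch of the kind of constraint Kevin means, with invented names (the function body is a stub standing in for a real convexity test, which would be nontrivial):

CREATE FUNCTION is_convex(polygon) RETURNS boolean
    LANGUAGE sql IMMUTABLE
    AS 'SELECT true';    -- stub body, for illustration only

CREATE TABLE regions (
    boundary polygon,
    CHECK (is_convex(boundary))
);

Deciding at DDL time whether such a constraint is partitioning-shaped would require the system to reason about is_convex()'s semantics, which it cannot do in general.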
Re: [HACKERS] On Scalability
* Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: So, what'd be the right approach in your vision? Have you read http://wiki.postgresql.org/wiki/Table_partitioning and the various places it links to..? I mean, if you think about partitioning a-la Oracle, then you'll have to parse those expressions anyway. Oracle's approach is discussed there. Thanks, Stephen
Re: [HACKERS] Issues with Quorum Commit
Aidan Van Dyk ai...@highrise.ca writes: *shrug* The joining standby is still asynchronous at this point. It's not synchronous replication. It's just another of the N slaves serving stale data ;-) Agreed *here*, but if you read the threads again, you'll see that's not at all what's been talked about before my proposal. In particular, the questions about how to unlock a master's setup while its synced standby is doing a base backup should not be allowed to exist, and you seem to agree with my point. That's exactly my point. I think we need to handle the case and make it obvious that this window is a data-loss window where there's no sync rep ongoing, then offer users a choice of behaviour. Again, I'm stating there is *no* choice in synchronous replication. It's *got* to block, otherwise it's not synchronous replication. The choice is whether you want synchronous replication or not at that point. Exactly, even if I didn't dare spell it this way. What I want to propose is for the user to be able to configure things so that he loses the sync aspect of the replication if it so happens that the setup is not able to provide for it. It may sound strange, but it's needed when all you want is, e.g., a no-stale-data reporting standby. And it so happens that it's already in Simon's code, AFAIUI (yet to read it, see). And turning it off might be a good (best) choice for most people. I just want to make sure that: 1) There's no way to *sensibly* think it's still synchronously replicating 2) There is a way to enforce that the commits happening *are* synchronously replicated. We're on the same track. I don't know how to offer your options without a clear listing of standby states and transitions, which must include the synchronicity and whether you just lost it or whatnot. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] O_DSYNC broken on MacOS X?
Robert Haas wrote: Proposed doc patch attached. Looks accurate to me. I like the additional linking to the Reliability page you put in there too. Heavily referencing that important page from related areas is a good thing, particularly now that it's got a lot more details than it used to. -- Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] O_DSYNC broken on MacOS X?
Robert Haas robertmh...@gmail.com writes: Proposed doc patch attached. discusesed? Otherwise +1 regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] O_DSYNC broken on MacOS X?
On Thu, Oct 7, 2010 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Proposed doc patch attached. discusesed? Otherwise +1 Woops, thanks. Committed with that change. I back-patched it back to 8.3, which is as far as it applied with only minor conflicts. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] O_DSYNC broken on MacOS X?
On Thu, Oct 7, 2010 at 11:45 AM, Greg Smith g...@2ndquadrant.com wrote: Robert Haas wrote: Proposed doc patch attached. Looks accurate to me. I like the additional linking to the Reliability page you put in there too. Heavily referencing that important page from related areas is a good thing, particularly now that it's got a lot more details than it used to. Cool, thanks for the fast review. I suspect there are more details that could stand to be added to the WAL reliability page as well, but I don't know what they are so I can't add them. I still have the feeling that we have not put quite a large enough red, blinking light around this issue, but I don't have a concrete suggestion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
Markus Wanner wrote: I think that's a pretty special case, because the good alerting system is at least as expensive as another server that just persistently stores and ACKs incoming WAL. The cost of hardware capable of running a database server is a large multiple of what you can build an alerting machine for. I have two systems that are approaching the trash heap just at my house, relative to the main work I do, but that are fully capable of running an alerting system. Building a production quality database server requires a more significant investment: high quality disks, ECC RAM, battery-backed RAID controller, etc. Relative to what the hardware in a database server costs, what you need to build an alerting system is almost free. Oh: and most businesses that are complicated enough to need a serious database server already have them, so they actually cost nothing beyond the software setup time to point them toward the databases, too. Why does one ever want the guarantee that sync replication gives to only hold true up to one failure, if a better guarantee doesn't cost anything extra? (Note that a good alerting system is impossible to achieve with only two servers. You need a third device anyway). I do not disagree with your theory or reasoning. But as a practical matter, I'm afraid the true cost of the better guarantee you're suggesting here is additional code complexity that will likely cause this feature to miss 9.1 altogether. As far as I'm concerned, this whole diversion into the topic of quorum commit is only consuming resources away from targeting something achievable in the time frame of a single release. Sync replication between really just two servers is asking for trouble and certainly not worth the savings in hardware cost. Better invest in a good UPS and redundant power supplies for a single server. I wish I could give you the long list of data recovery projects I've worked on over the last few years, so you could really appreciate how much what you're saying here is exactly the opposite of reality. You cannot make a single server reliable enough to survive all of the things that Murphy's Law will inflict upon it, at any price. For most of the businesses I work with who want sync rep, data is not considered safe until the second copy is on storage miles away from the original, because they know this too. Personal anecdote I can share: I used to have an important project related to stock trading where I kept my backup system about 50 miles away from me. I was aiming for constant availability, while still being able to drive to the other server if needed for disaster recovery. Guess what? Even those two turned out not to be nearly independent enough; see http://en.wikipedia.org/wiki/Northeast_Blackout_of_2003 for details of how I lost both of those at the same time for days. Silly me, I'd only spread them across two adjacent states with different power providers! Not nearly good enough to avoid a correlated failure. -- Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
2010/10/7 Stephen Frost sfr...@snowman.net: * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: So, what'd be the right approach in your vision? Have you read http://wiki.postgresql.org/wiki/Table_partitioning and the various places it links to..? I mean, if you think about partitioning a-la Oracle, then you'll have to parse those expressions anyway. Oracle's approach is discussed there. I didn't mean the implementation, but the goals achieved. Thanks, Stephen -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] O_DSYNC broken on MacOS X?
On Oct 7, 2010, at 12:26 PM, Robert Haas wrote: On Thu, Oct 7, 2010 at 11:45 AM, Greg Smith g...@2ndquadrant.com wrote: Robert Haas wrote: Proposed doc patch attached. Looks accurate to me. I like the additional linking to the Reliability page you put in there too. Heavily referencing that important page from related areas is a good thing, particularly now that it's got a lot more details than it used to. Cool, thanks for the fast review. I suspect there are more details that could stand to be added to the WAL reliability page as well, but I don't know what they are so I can't add them. I still have the feeling that we have not put quite a large enough red, blinking light around this issue, but I don't have a concrete suggestion. I think the general problem is that there is no simple way to verify that a PostgreSQL commit is pushing the bits to persistent storage. It would be helpful if there were a platform-specific, volume-specific tool to deduce this. Currently, there is no warning light that goes on when commits are not persistent. On Linux, a tool could check filesystem parameters, hdparm (if relevant), and hard drive and controller specs (possibly against a blacklist of known liars). Perhaps a simpler tool could run a basic fsyncs-per-second test and prompt the DBA to check that the numbers are within the realm of possibility. How else can a DBA today ensure that a commit is a commit? Cheers, M -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
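As a rough illustration of the "basic fsyncs-per-second test" idea, here is a sketch in Perl (not a calibrated tool; the file name and write size are arbitrary):

#!/usr/bin/perl
# Rough fsyncs-per-second probe (sketch only).
use strict;
use warnings;
use IO::Handle;
use Time::HiRes qw(time);

my $file  = shift || 'fsync_probe.dat';
my $count = 1000;

open(my $fh, '>', $file) or die "cannot open $file: $!";
$fh->autoflush(1);

my $start = time();
for (1 .. $count) {
    print $fh 'x' x 8192;                  # one 8 kB page-sized write
    $fh->sync or die "fsync failed: $!";   # IO::Handle::sync wraps fsync(2)
}
my $elapsed = time() - $start;
printf "%d fsyncs in %.2f s = %.0f fsyncs/sec\n",
    $count, $elapsed, $count / $elapsed;
unlink $file;

A consumer drive that reports thousands of fsyncs per second is almost certainly acknowledging writes from its volatile cache; a 7200 RPM disk honestly flushing cannot do much better than its rotation rate, roughly 120 per second.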
Re: [HACKERS] Issues with Quorum Commit
On 10/7/10 6:41 AM, Aidan Van Dyk wrote: I'm really confused by all these k < N scenarios I see bandied about, because all it really amounts to is "I only want *one* synchronous replica, and a bunch of asynchronous ones". And a bit of chance thrown into the mix, hoping the synchronous one is pretty stable and the asynchronous ones aren't *too* far behind (define "too" and "far" at your leisure). Effectively, yes. The difference between "k of N" synch rep and 1 synch standby + several async standbys is that in "k of N", you have a pool and aren't dependent on having a specific standby be very reliable, just that any one of them is. So if you have k = 3 and N = 10, then you can have 10 standbys and only 3 of them need to ack any specific commit for the master to proceed. As long as (a) you retain at least one of the 3 which ack'd, and (b) you have some way of determining which standby is the most caught up, data loss is fairly unlikely; you'd need to lose 4 of the 10, and the wrong 4, to lose data. The advantage of this for availability over just having k = N = 3 comes when one of the standbys is responding slowly (due to traffic) or goes offline unexpectedly due to a hardware failure. In the k = N = 3 case, the system halts. In the k = 3, N = 10 case, you can lose up to 7 standbys without the system going down. It's notable that the massively scalable transactional databases (Dynamo, Cassandra, various telecom databases, etc.) all operate this way. However, I do consider this advanced functionality and not worth pursuing until we have the k = 1 case implemented and well-tested. For comparison, Cassandra, Hypertable and Riak have been working on their k < N functionality for a couple of years now and none of them has it stable *and* fast. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On 06.10.2010 19:26, Greg Smith wrote: Now, the more relevant question, what I actually need in order for a Sync Rep feature in 9.1 to be useful to the people who want it most I talk to. That would be a simple to configure setup where I list a subset of important nodes, and the appropriate acknowledgement level I want to hear from one of them. And when one of those nodes gives that acknowledgement, commit on the master happens too. That's it. For use cases like the commonly discussed two local/two remote situation, the two remote ones would be listed as the important ones. This feels like the best way forward to me. It gives some flexibility, and doesn't need a new config file. Let me check that I got this right, and add some details to make it more concrete: Each standby is given a name. It can be something like boston1 or testserver. It does *not* have to be unique across all standby servers. In the master, you have a list of important, synchronous, nodes that must acknowledge each commit before it is acknowledged to the client. The standby name is a GUC in the standby's configuration file: standby_name='bostonserver' The list of important nodes is also a GUC, in the master's configuration file: synchronous_standbys='bostonserver, oxfordserver' To configure a simple setup with a master and one synchronous standby (which is not a very good setup from an availability point of view, as discussed to death), you give the standby a name, and put the same name in synchronous_standbys in the master. To configure a setup with a master and two standbys, so that a commit is acknowledged to the client as soon as either one of the standbys acknowledges it, you give both standbys the same name, and put that same name in the synchronous_standbys list in the master. This is the configuration that gives zero data loss in case one server fails, but also caters for availability because you don't need to halt the master if one standby fails. To configure a setup with a master and two standbys, so that a commit is acknowledged to the client after *both* standbys acknowledge it, you give the standbys different names, and list both names in synchronous_standbys in the master. I believe this will bend to most real life scenarios people have. Now, the other big fight is over wait forever vs. timeout. Personally, I stand firmly in the wait forever camp - you're nuts if you want a timeout. However, I can see that not everyone agrees :-). Fortunately, once we have robust wait forever behavior, it shouldn't be hard at all to add a timeout option on top of that, for those who want it. We should be able to have both options in 9.1. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
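Putting the proposal into concrete form (these GUCs are proposed here, not implemented anywhere; the server names are from the example above):

# Master's postgresql.conf, "either standby may ack" (1-of-2):
synchronous_standbys = 'bostonserver'
#   ... with standby_name = 'bostonserver' on *both* standbys.

# Master's postgresql.conf, "both standbys must ack" (2-of-2):
synchronous_standbys = 'bostonserver, oxfordserver'
#   ... with standby_name = 'bostonserver' on one standby
#   and standby_name = 'oxfordserver' on the other.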
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On 10/7/10, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.10.2010 19:26, Greg Smith wrote: Now, the more relevant question, what I actually need in order for a Sync Rep feature in 9.1 to be useful to the people who want it most I talk to. That would be a simple to configure setup where I list a subset of important nodes, and the appropriate acknowledgement level I want to hear from one of them. And when one of those nodes gives that acknowledgement, commit on the master happens too. That's it. For use cases like the commonly discussed two local/two remote situation, the two remote ones would be listed as the important ones. This feels like the best way forward to me. It gives some flexibility, and doesn't need a new config file. Let me check that I got this right, and add some details to make it more concrete: Each standby is given a name. It can be something like boston1 or testserver. It does *not* have to be unique across all standby servers. In the master, you have a list of important, synchronous, nodes that must acknowledge each commit before it is acknowledged to the client. The standby name is a GUC in the standby's configuration file: standby_name='bostonserver' The list of important nodes is also a GUC, in the master's configuration file: synchronous_standbys='bostonserver, oxfordserver' To configure a simple setup with a master and one synchronous standby (which is not a very good setup from an availability point of view, as discussed to death), you give the standby a name, and put the same name in synchronous_standbys in the master. To configure a setup with a master and two standbys, so that a commit is acknowledged to the client as soon as either one of the standbys acknowledges it, you give both standbys the same name, and put that same name in the synchronous_standbys list in the master. This is the configuration that gives zero data loss in case one server fails, but also caters for availability because you don't need to halt the master if one standby fails. To configure a setup with a master and two standbys, so that a commit is acknowledged to the client after *both* standbys acknowledge it, you give the standbys different names, and list both names in synchronous_standbys in the master. I believe this will bend to most real life scenarios people have. +1. I think this would have met any needs of mine in my past life as a sysadmin/dba. Now, the other big fight is over wait forever vs. timeout. Personally, I stand firmly in the wait forever camp - you're nuts if you want a timeout. However, I can see that not everyone agrees :-). Fortunately, once we have robust wait forever behavior, it shouldn't be hard at all to add a timeout option on top of that, for those who want it. We should be able to have both options I disagree that you're nuts if you want this feature fwiw. +1 on your suggested plan though :-) /D -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On Thu, Oct 7, 2010 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: So if you have k = 3 and N = 10, then you can have 10 standbys and only 3 of them need to ack any specific commit for the master to proceed. As long as (a) you retain at least one of the 3 which ack'd, and (b) you have some way of determining which standby is the most caught up, data loss is fairly unlikely; you'd need to lose 4 of the 10, and the wrong 4, to lose data. The advantage of this for availability over just having k = N = 3 comes when one of the standbys is responding slowly (due to traffic) or goes offline unexpectedly due to a hardware failure. In the k = N = 3 case, the system halts. In the k = 3, N = 10 case, you can lose up to 7 standbys without the system going down. Sure, but here is where I might not be following. If you want synchronous replication because you want query availability while making sure you're not getting stale results from your slaves, then using your k < N (k = 3 and N = 10) situation is screwing yourself. To get non-stale responses, you can only query those k = 3 servers. But you've shot yourself in the foot because you don't know which 3 of the 10 those will be. The other 7 *are* stale (by definition). They talk about picking the most caught-up slave when the master fails, but you actually need to do that for *every query*. If you say they are pretty close, so by the time you get the query to them they will be caught up, well then, all you really want is good async replication; you don't really *need* the synchronous part. The only case where I see a race-to-quorum type of k < N being useful is if you're just trying to duplicate data everywhere, but not actually querying any of the replicas. I can see that all queries go to the master, but the chances are pretty high that multiple machines are going to fail, so I want multiple replicas to be useful; but I *don't* think that's what most people are wanting in their "I want 3 of 10 servers to ack the commit". The difference between good async and sync is only the *guarantee*. If you don't need the guarantee, you don't need the synchronous part. a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On 10/7/10 10:27 AM, Heikki Linnakangas wrote: The standby name is a GUC in the standby's configuration file: standby_name='bostonserver' The list of important nodes is also a GUC, in the master's configuration file: synchronous_standbys='bostonserver, oxfordserver' This seems to abandon Simon's concept of per-transaction synchronization control. That seems like such a potentially useful feature that I'm reluctant to abandon it just for administrative elegance. Does this work together with that in some way I can't see? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
If you want synchronous replication because you want query availability while making sure you're not getting stale results from your slaves, then using your k < N (k = 3 and N = 10) situation is screwing yourself. Correct. If that is your reason for synch standby, then you should be using a k = N configuration. However, some people are willing to sacrifice consistency for durability and availability. We should give them that option (eventually), since among that triad you can never have more than two. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On 10/07/2010 06:41 PM, Greg Smith wrote: The cost of hardware capable of running a database server is a large multiple of what you can build an alerting machine for. You realize you don't need lots of disks or RAM for a box that only ACKs? A box with two SAS disks and a BBU isn't that expensive anymore. I do not disagree with your theory or reasoning. But as a practical matter, I'm afraid the true cost of the better guarantee you're suggesting here is additional code complexity that will likely cause this feature to miss 9.1 altogether. As far as I'm concerned, this whole diversion into the topic of quorum commit is only consuming resources away from targeting something achievable in the time frame of a single release. So far I've been under the impression that Simon already has the code for quorum_commit k >= 1. What I'm opposing is the timeout feature, which I consider to be additional code, unneeded complexity and a foot-gun. You cannot make a single server reliable enough to survive all of the things that Murphy's Law will inflict upon it, at any price. That's exactly what I'm saying applies to two servers as well. And why a timeout is a bad thing here, because the chance that the second node fails as well is there (and is higher than you think, according to Murphy). For most of the businesses I work with who want sync rep, data is not considered safe until the second copy is on storage miles away from the original, because they know this too. Now, those are the people who really need sync rep, yes. How happy do you think those businesses were to find out that Postgres was cheating on them in case of a network outage, for example? Do they really value (write!) availability more than data safety? Silly me, I'd only spread them across two adjacent states with different power providers! Not nearly good enough to avoid a correlated failure. Thanks for sharing this. I hope you didn't lose data. Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On Thu, Oct 7, 2010 at 1:39 PM, Dave Page dp...@pgadmin.org wrote: On 10/7/10, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.10.2010 19:26, Greg Smith wrote: Now, the more relevant question, what I actually need in order for a Sync Rep feature in 9.1 to be useful to the people who want it most I talk to. That would be a simple to configure setup where I list a subset of important nodes, and the appropriate acknowledgement level I want to hear from one of them. And when one of those nodes gives that acknowledgement, commit on the master happens too. That's it. For use cases like the commonly discussed two local/two remote situation, the two remote ones would be listed as the important ones. This feels like the best way forward to me. It gives some flexibility, and doesn't need a new config file. Let me check that I got this right, and add some details to make it more concrete: Each standby is given a name. It can be something like boston1 or testserver. It does *not* have to be unique across all standby servers. In the master, you have a list of important, synchronous, nodes that must acknowledge each commit before it is acknowledged to the client. The standby name is a GUC in the standby's configuration file: standby_name='bostonserver' The list of important nodes is also a GUC, in the master's configuration file: synchronous_standbys='bostonserver, oxfordserver' To configure a simple setup with a master and one synchronous standby (which is not a very good setup from an availability point of view, as discussed to death), you give the standby a name, and put the same name in synchronous_standbys in the master. To configure a setup with a master and two standbys, so that a commit is acknowledged to the client as soon as either one of the standbys acknowledges it, you give both standbys the same name, and put that same name in the synchronous_standbys list in the master. This is the configuration that gives zero data loss in case one server fails, but also caters for availability because you don't need to halt the master if one standby fails. To configure a setup with a master and two standbys, so that a commit is acknowledged to the client after *both* standbys acknowledge it, you give the standbys different names, and list both names in synchronous_standbys in the master. I believe this will bend to most real life scenarios people have. +1. I think this would have met any needs of mine in my past life as a sysadmin/dba. Before we get too far down the garden path here, this is actually substantially more complicated than what Greg proposed. Greg was proposing, as have some other folks I think, to focus only on the k=1 case - in other words, only one acknowledgment would ever be required for any given commit. I think he's right to focus on that case, because the multiple-ACKs-required solutions are quite a bit hairier. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On Thu, Oct 7, 2010 at 1:45 PM, Josh Berkus j...@agliodbs.com wrote: On 10/7/10 10:27 AM, Heikki Linnakangas wrote: The standby name is a GUC in the standby's configuration file: standby_name='bostonserver' The list of important nodes is also a GUC, in the master's configuration file: synchronous_standbys='bostonserver, oxfordserver' This seems to abandon Simon's concept of per-transaction synchronization control. That seems like such a potentially useful feature that I'm reluctant to abandon it just for administrative elegance. Does this work together with that in some way I can't see? I think they work together fine. Greg's idea is that you list the important standbys, and a synchronization guarantee that you'd like to have for at least one of them. Simon's idea - at least at 10,000 feet - is that you can take a pass on that guarantee for transactions that don't need it. I don't see why you can't have both. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
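[Editor's note: a sketch of how the two ideas might compose, assuming a session-level GUC for the synchronization guarantee. The name synchronous_replication below is hypothetical; no syntax is settled. A transaction that doesn't need the guarantee could opt out with plain SET LOCAL, while the master's standby list still governs the transactions that do.]

    BEGIN;
    SET LOCAL synchronous_replication = off;  -- hypothetical GUC: skip the
                                              -- standby-ack wait for this
                                              -- transaction only
    UPDATE page_hits SET hits = hits + 1 WHERE page_id = 42;
    COMMIT;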
Re: [HACKERS] Issues with Quorum Commit
But as a practical matter, I'm afraid the true cost of the better guarantee you're suggesting here is additional code complexity that will likely cause this feature to miss 9.1 altogether. As far as I'm concerned, this whole diversion into the topic of quorum commit is only diverting resources away from targeting something achievable in the time frame of a single release. Yes. My purpose in starting this thread was to show that k > 1 quorum commit is considerably more complex than the people who have been bringing it up in other threads seem to think it is. It is not achievable for 9.1, and maybe not even for 9.2. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Issues with two-server Synch Rep
Simon, Fujii, What follows are what I see as the major issues with making two-server synch replication work well. I would like to have each of you answer them, explaining how your patch and your design address each issue. I believe this will go a long way towards helping the majority of the community understand the options we have from your code, as well as where help is still needed.

Adding a Synch Standby
----------------------
What is the procedure for adding a new synchronous standby in your implementation? That is, how do we go from having a standby server with an empty PGDATA to having a working synchronous standby?

Snapshot Publication
--------------------
During 9.0 development discussion, one of the things we realized we needed for synch standby was publication of snapshots back to the master in order to prevent query cancel on the standby. Without this, the synch standby is useless for running read queries. Does your patch implement this? Please describe.

Management
----------
One of the serious flaws currently in HS/SR is the complexity of administration. Setting up and configuring even a single master and single standby requires editing up to 6 configuration files in Postgres, as well as dealing with file permissions. As such, any Synch Rep patch must work together with attempts to simplify administration. How does your design do this?

Monitoring
----------
Synch rep imposes severe penalties on availability if a synch standby gets behind or goes down. What replication-specific monitoring tools and hooks are available to allow administrators to take action before the database becomes unavailable?

Degradation
-----------
In the event that the synch rep standby falls too far behind, becomes unavailable, or is deliberately taken offline, what do you envision as the process for the DBA to resolve the situation? Is there any ability to commit stuck transactions?

Client Consistency
------------------
With a standby in apply mode, and a master failure at the wrong time, there is the possibility that the standby will apply a transaction at the same time that the master crashes, causing the client to never receive a commit message. Once the client reconnects to the standby, how will it know whether its transaction was committed or not? As a lesser case, a standby in apply mode will show the results of committed transactions *before* they are visible on the master. Is there any need to handle this? If so, how?

Performance
-----------
As with XA, synch rep has the potential to be so slow as to be unusable. What optimizations do you make in your approach to synch rep to make it faster than two-phase commit? What other performance optimizations have you added?

-- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
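[Editor's note: on the Client Consistency question above, one common application-level workaround is worth sketching: record a client-generated identifier in the same transaction as the real work, so that after reconnecting to a promoted standby the client can check whether its commit survived. The journal table and identifier here are hypothetical, purely for illustration.]

    -- in the original transaction, alongside the real work
    BEGIN;
    -- ... application changes ...
    INSERT INTO txn_journal (client_txn_id) VALUES ('app1-000123');
    COMMIT;

    -- after reconnecting to the (possibly promoted) standby
    SELECT EXISTS (SELECT 1 FROM txn_journal
                   WHERE client_txn_id = 'app1-000123') AS committed;

If the row is present, the whole transaction committed; if not, it is safe to retry.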
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On Thu, Oct 7, 2010 at 1:27 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Let me check that I got this right, and add some details to make it more concrete: Each standby is given a name. It can be something like boston1 or testserver. It does *not* have to be unique across all standby servers. In the master, you have a list of important, synchronous, nodes that must acknowledge each commit before it is acknowledged to the client. The standby name is a GUC in the standby's configuration file: standby_name='bostonserver' The list of important nodes is also a GUC, in the master's configuration file: synchronous_standbys='bostonserver, oxfordserver' +1. It definitely covers the scenarios I want. And it even allows the ones I don't want, and don't understand either ;-) And personally, I'd *love* it if the streaming replication protocol were adjusted so that every streaming WAL client reported back its role and receive/fsync/replay positions as part of the protocol (allowing role and positions to be something NULLable/empty/0). I think Simon demonstrated that the overhead to report it isn't high. Again, in the deployments I'm wanting, the slave isn't a PG server, but something like Magnus's stream-to-archive, so I can't query the slave to see how far behind it is. a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
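[Editor's note: a sketch of what that reporting would buy. If the master exposed the reported positions in a system view, lag could be watched from the master alone, even for non-PG receivers like the stream-to-archive case. The view and its columns below are hypothetical; pg_current_xlog_location() is the one existing function used.]

    -- hypothetical view: one row per connected WAL-streaming client
    SELECT client_name, role,
           pg_current_xlog_location() AS master_pos,
           receive_pos, fsync_pos, replay_pos    -- NULL/0 where not applicable
    FROM pg_stat_walsenders;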
Re: [HACKERS] Issues with Quorum Commit
Aidan Van Dyk ai...@highrise.ca wrote: To get non-stale responses, you can only query those k=3 servers. But you've shot yourself in the foot because you don't know which 3/10 those will be. The other 7 *are* stale (by definition). They talk about picking the caught-up slave when the master fails, but you actually need to do that for *every query*. With web applications, at least, you often don't care that the data read is absolutely up-to-date, as long as the point in time doesn't jump around from one request to the next. When we have used load balancing between multiple database servers (which has actually become unnecessary for us lately because PostgreSQL has gotten so darned fast!), we have established affinity between a session and one of the database servers, so that if they became slightly out of sync, data would not pop in and out of existence arbitrarily. I think a reasonable person could combine this technique with a 3 of 10 synchronous replication quorum to get both safe persistence of data and reasonable performance. I can also envision use cases where this would not be desirable. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On Thu, Oct 7, 2010 at 2:10 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Aidan Van Dyk ai...@highrise.ca wrote: To get non-stale responses, you can only query those k=3 servers. But you've shot yourself in the foot because you don't know which 3/10 those will be. The other 7 *are* stale (by definition). They talk about picking the caught-up slave when the master fails, but you actually need to do that for *every query*. With web applications, at least, you often don't care that the data read is absolutely up-to-date, as long as the point in time doesn't jump around from one request to the next. When we have used load balancing between multiple database servers (which has actually become unnecessary for us lately because PostgreSQL has gotten so darned fast!), we have established affinity between a session and one of the database servers, so that if they became slightly out of sync, data would not pop in and out of existence arbitrarily. I think a reasonable person could combine this technique with a 3 of 10 synchronous replication quorum to get both safe persistence of data and reasonable performance. I can also envision use cases where this would not be desirable. Well, keep in mind that all updates have to be done on the single master. That works pretty well for fine-grained replication, but I don't think it's very good for full-cluster replication. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
Robert Haas robertmh...@gmail.com wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: With web applications, at least, you often don't care that the data read is absolutely up-to-date, as long as the point in time doesn't jump around from one request to the next. When we have used load balancing between multiple database servers (which has actually become unnecessary for us lately because PostgreSQL has gotten so darned fast!), we have established affinity between a session and one of the database servers, so that if they became slightly out of sync, data would not pop in and out of existence arbitrarily. I think a reasonable person could combine this technique with a 3 of 10 synchronous replication quorum to get both safe persistence of data and reasonable performance. I can also envision use cases where this would not be desirable. Well, keep in mind that all updates have to be done on the single master. That works pretty well for fine-grained replication, but I don't think it's very good for full-cluster replication. I'm completely failing to understand your point here. Could you restate it another way? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] standby registration (was: is sync rep stalled?)
I think they work together fine. Greg's idea is that you list the important standbys, and a synchronization guarantee that you'd like to have for at least one of them. Simon's idea - at least at 10,000 feet - is that you can take a pass on that guarantee for transactions that don't need it. I don't see why you can't have both. So, two things: 1) This version of Standby Registration seems to add One More Damn Place You Need To Configure Standby (OMDPYNTCS) without adding any functionality you couldn't get *without* having a list on the master. Can someone explain to me what functionality is added by this approach vs. not having a list on the master at all? 2) I see Simon's approach, where you can designate not just synch/asynch but the synch *mode* per session, as valuable. I can imagine having transactions I just want to ack vs. transactions I want to apply according to application logic (e.g. customer personal information vs. financial transactions). This approach would still seem to remove that functionality. Does it? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On 10/07/2010 03:19 PM, Dimitri Fontaine wrote: I think you're all into durability, and that's good. The extra cost is service downtime. It's just *reduced* availability. That doesn't necessarily mean downtime, if you combine cleverly with async replication. If that's not what you're after: there's also availability, and load balancing read queries on a system with no lag (no stale-data servicing) when all is working right. All I'm saying is that those use cases are much better served with async replication, maybe together with something that warns and takes action in case the standby's lag gets too big. Or what kind of customers do you think really need a no-lag solution for read-only queries? In the LAN case the lag of async rep is negligible, and in the WAN case the latencies of sync rep are prohibitive. My proposal is to make the risk window obvious and the behavior when you enter it configurable. I don't buy that. The risk calculation gets a lot simpler and more obvious with strict guarantees. Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On 10/07/2010 07:44 PM, Aidan Van Dyk wrote: The only case where I see a race-to-quorum type of k < N being useful is if you're just trying to duplicate data everywhere, but not actually querying any of the replicas. I can see the "all queries go to the master, but the chances are pretty high that multiple machines are going to fail, so I want multiple replicas" scenario being useful, but I *don't* think that's what most people are wanting in their "I want 3 of 10 servers to ack the commit". What else do you think they want it for, if not for protection against data loss? (Note that the queries don't need to go to the master exclusively if you can live with some lag - and I think the vast majority of people can. The zero data loss guarantee holds true in any case, though.) The difference between good async and sync is only the *guarantee*. If you don't need the guarantee, you don't need the synchronous part. Here we are exactly on the same page again. Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] a few small bugs in plpgsql
Hello, today I found a few bugs:

a) The parser allows labels in invalid positions, with a strange runtime bug:

postgres=# CREATE OR REPLACE FUNCTION foo() RETURNS void AS $$
BEGIN
  FOR i IN 1..2 invalidLabel
  LOOP
    RAISE NOTICE '%', i;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
postgres=# SELECT foo();
ERROR:  column "invalidlabel" does not exist
LINE 2: invalidLabel
        ^
QUERY:  SELECT 2 invalidLabel
CONTEXT:  PL/pgSQL function "foo" line 3 at FOR with integer loop variable
postgres=#

b) SRF functions don't have to be finished with a RETURN statement - I know there is an outer default block, but it looks like an inconsistency for SRF functions, because you can use RETURN NEXT without a final RETURN. Maybe it isn't a bug - but I am filing it as an inconsistency.

postgres=# CREATE OR REPLACE FUNCTION fg(OUT i int) RETURNS SETOF int AS $$
BEGIN
  FOR i IN 1..3 LOOP
    fg.i := i;
    RETURN NEXT;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
postgres=# select fg();
 fg
----
  1
  2
  3
(3 rows)

Regards Pavel Stehule -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On Thu, Oct 7, 2010 at 2:33 PM, Josh Berkus j...@agliodbs.com wrote: I think they work together fine. Greg's idea is that you list the important standbys, and a synchronization guarantee that you'd like to have for at least one of them. Simon's idea - at least at 10,000 feet - is that you can take a pass on that guarantee for transactions that don't need it. I don't see why you can't have both. So, two things: 1) This version of Standby Registration seems to add One More Damn Place You Need To Configure Standby (OMDPYNTCS) without adding any functionality you couldn't get *without* having a list on the master. Can someone explain to me what functionality is added by this approach vs. not having a list on the master at all? Well, then you couldn't have one strictly synchronous standby and one asynchronous standby. 2) I see Simon's approach, where you can designate not just synch/asynch but the synch *mode* per session, as valuable. I can imagine having transactions I just want to ack vs. transactions I want to apply according to application logic (e.g. customer personal information vs. financial transactions). This approach would still seem to remove that functionality. Does it? I'm not totally sure. I think we could probably avoid removing that with careful detailed design. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On Thu, Oct 7, 2010 at 2:31 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: With web applications, at least, you often don't care that the data read is absolutely up-to-date, as long as the point in time doesn't jump around from one request to the next. When we have used load balancing between multiple database servers (which has actually become unnecessary for us lately because PostgreSQL has gotten so darned fast!), we have established affinity between a session and one of the database servers, so that if they became slightly out of sync, data would not pop in and out of existence arbitrarily. I think a reasonable person could combine this technique with a 3 of 10 synchronous replication quorum to get both safe persistence of data and reasonable performance. I can also envision use cases where this would not be desirable. Well, keep in mind all updates have to be done on the single master. That works pretty well for fine-grained replication, but I don't think it's very good for full-cluster replication. I'm completely failing to understand your point here. Could you restate another way? Establishing an affinity between a session and one of the database servers will only help if the traffic is strictly read-only. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
Markus Wanner mar...@bluegap.ch writes: I don't buy that. The risk calculation gets a lot simpler and more obvious with strict guarantees. Ok, I'm lost in the use cases and analysis. I still don't understand why you want to consider the system already synchronous when it's not, whatever guarantee you're asking for. All I'm saying is that we should be able to know and show what the current system is up to, and we should be able to offer sane reactions in case of errors. You're calling it a sane reaction to block the master entirely when the standby ain't ready yet (it's still at the base backup state), and I can live with that. As an option. I say that either we go the lax quorum route, or we have to care for the details and summarize the failure cases with precision, and the possible responses with care. I don't see that being possible without a clear state for each element in the system, the transitions between those states, and a way to derive the global state of the distributed system from them. It might be that the simpler way to go here is what Greg Smith has been proposing for a long time already, and again quite recently on this thread: have all the information you need in a system table and offer to run a user-defined function to determine the state of the system. I think we managed to show what Josh Berkus wanted to know by now: this is a quagmire. Now, the problem I have is not Quorum Commit but the very definition of synchronous replication and the system we're trying to build. I'm not sure there are two of us wanting the same thing here. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
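[Editor's note: a sketch of that system-table-plus-UDF idea, reusing the hypothetical pg_stat_walsenders view from earlier in this digest. The DBA encodes the site's own definition of a healthy cluster as a query, rather than the server hard-coding one policy.]

    -- hypothetical throughout: true when at least k of the named standbys
    -- are currently streaming
    CREATE OR REPLACE FUNCTION quorum_ok(int) RETURNS boolean AS $$
        SELECT count(*) >= $1
        FROM pg_stat_walsenders                   -- hypothetical view
        WHERE state = 'streaming'                 -- hypothetical column
          AND client_name IN ('bostonserver', 'oxfordserver');
    $$ LANGUAGE sql;

    SELECT quorum_ok(1);  -- monitoring or failover scripts poll this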
Re: [HACKERS] Issues with Quorum Commit
Robert Haas robertmh...@gmail.com wrote: Establishing an affinity between a session and one of the database servers will only help if the traffic is strictly read-only. Thanks; I now see your point. In our environment, that's pretty common. Our most heavily used web app (the one for which we have, at times, needed load balancing) connects to the database with a read-only login. Many of our web apps do their writing by posting to queues which are handled at the appropriate source database later. (I had the opportunity to use one of these for real last night, to fill in a juror questionnaire after receiving a summons from the jury clerk in the county where I live.) Like I said, there are sane cases for this usage, but it won't fit everybody. I have no idea on percentages. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Git cvsserver serious issue
On 10/07/2010 10:11 AM, Magnus Hagander wrote: OTOH, this patch seems pretty small and simple to maintain. True, it is rather small. Does anybody know if there's an automated way to maintain that on freebsd ports, and if so, how that works? I want to be *sure* we can't accidentally upgrade git-cvsserver *without* the patch, since that is a security issue. Why not just make a local copy somewhere else and patch and run that? It's just a Perl script, no? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
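[Editor's note: for what it's worth, the local-copy approach is a one-off along these lines; the paths and patch file name are invented for illustration.]

    # keep a privately patched copy; the ports-managed original stays untouched
    cp /usr/local/libexec/git-core/git-cvsserver /usr/local/bin/pg-git-cvsserver
    patch /usr/local/bin/pg-git-cvsserver < git-cvsserver-auth.patch
    # then point the CVS entry point (inetd/ssh wrapper) at the patched copy

Magnus's reply later in this digest is about exactly this copy going stale the next time upstream fixes something.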
Re: [HACKERS] On Scalability
Firstly I want to say I think this discussion is overlooking some benefits of the current system in other use cases. I don't think we should get rid of the current system even once we have proper partitioning. It handles quite well use cases such as data warehouse queries that need to do a full table scan of some subset of the data which happens to be located in a single sub-table. In that case being able to do a sequential scan instead of an index range scan is a big benefit, and the overhead of the analysis is irrelevant for a data warehouse query. And the constraint may or may not have anything to do with the partitioning key. You could have constraints like customer_id in (...) for last month's financial records, so lookups for new customers don't need to check all the historical tables from before they became customers. In fact what I'm interested in doing is extending the support to use stats on children marked read-only. If we have a histogram for a table which has been marked read-only since the table was analyzed, then we could trust the upper and lower bounds or the most-frequent list to exclude partitions. That would really help for things like date-range lookups on tables where the partition key is financial quarter or invoice_id or some other nearly perfectly correlated column. None of this replaces having a good partitioning story for OLTP queries and management needs. But it extends the usefulness of that setup to data warehouse queries on other related columns that haven't been explicitly declared as the partitioning key. On Thu, Oct 7, 2010 at 8:35 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Vincenzo Romano vincenzo.rom...@notorand.it wrote: 2010/10/7 Stephen Frost sfr...@snowman.net: Yes, that would be the problem. Proving something based on expressions is a lot more time-consuming and complicated than being explicitly told what goes where. Consuming computing resources at DDL-time should be OK if that will lead to big savings at DML-time (run-time), in my opinion. It'd be just like compile-time optimizations. I think something you haven't entirely grasped is how pluggable PostgreSQL is -- you can not only define your own functions in a wide variety of languages (including C), but your own data types, operators, casts, index strategies, etc. I suspect it's likely that a partitioning system would only work with btree opclasses anyways. It might be interesting to think about what it would take to make the setups we've talked about in the past work with arbitrary operator classes, as long as those operator classes support some concept of mutual exclusion. But nothing we've talked about so far would be that flexible. Pre-analyzing the check constraints to construct a partitioning data structure might even be a plausible way to move forward -- I don't see any obvious show-stoppers. The system could look for a set of btree-opclass-based conditions that guarantee all the partitions are mutually exclusive ranges. My instincts tell me it would be less useful, though, because there's less the system would be able to do with that structure to help the user. That is, if it *can't* prove the constraints are mutually exclusive, then the user is left with a bunch of check constraints and no useful feedback about what they've done wrong. 
And if it can prove it, the user is happy, but the next time he has to add a partition he has to look at the existing partitions and carefully construct his check constraint, instead of having the system help him out by supplying one side of the bounds and providing a convenient syntax. It would also be hard to specify how to automatically add partitions, which I expect is a feature people will want eventually. There are some plus sides as well -- allowing some optimizations for check constraints without requiring the user to promise to always use that as their partitioning key in the future. But I think on the whole it would be a disadvantage. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
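[Editor's note: for readers following the thread, a minimal example of the current system under discussion - inheritance plus CHECK constraints, with constraint exclusion proving which children to skip. The table and ranges are made up for illustration.]

    CREATE TABLE invoices (invoice_id int, invoice_date date, amount numeric);

    -- one child per quarter; the CHECK constraint is what the planner analyzes
    CREATE TABLE invoices_2010q3 (
        CHECK (invoice_date >= '2010-07-01' AND invoice_date < '2010-10-01')
    ) INHERITS (invoices);

    SET constraint_exclusion = on;
    -- the planner proves invoices_2010q3 cannot hold matching rows and skips it
    SELECT sum(amount) FROM invoices WHERE invoice_date >= '2010-11-01';

Greg's point is that the proof works from the constraints alone, so it can also help with predicates unrelated to any declared partitioning key.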
Re: [HACKERS] Git cvsserver serious issue
On Thu, Oct 7, 2010 at 21:31, Andrew Dunstan and...@dunslane.net wrote: On 10/07/2010 10:11 AM, Magnus Hagander wrote: OTOH, this patch seems pretty small and simple to maintain. True, it is rather small. Does anybody know if there's an automated way to maintain that on FreeBSD ports, and if so, how that works? I want to be *sure* we can't accidentally upgrade git-cvsserver *without* the patch, since that is a security issue. Why not just make a local copy somewhere else and patch and run that? It's just a Perl script, no? Yeah, but then we have to remember to manually patch that one when somebody *else* finds/fixes a security issue. We have automatic monitoring on the ports stuff to detect when that happens. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Timeout and Synch Rep
All, In my effort to make the discussion around the design decisions of synch rep less opaque, I'm starting a separate thread about what has developed into one of the more contentious issues. I'm going to champion timeouts because I plan to use them. In fact, I plan to deploy synch rep with a timeout if it's available within 2 weeks of 9.1 being released. Without a timeout (i.e. wait forever is the only mode), that project will probably never use synch rep. Let me give you my use case so that you can understand why I want a timeout. The client is a telecommunications service provider. They have a primary server and a failover server for data updates. They also have two async slaves on older machines for reporting purposes. The failover currently does NOT accept any queries, in order to keep it as current as possible. They would like the failover to be synchronous so that they can guarantee no data loss in the event of a master failure. However, zero data loss is less important to them than uptime ... they have a five-nines SLA with their clients, and the hardware on the master is very good. So, if something happens to the standby and it cannot return an ack in 30 seconds, they would like it to degrade to asynch mode. At that point, they would also like to trigger a Nagios alert which will wake up the sysadmin with flashing red lights. Once he has resolved the problem, he would like to promote the now-asynch standby back to synch standby. Yes, this means that, in the event of a standby failure, they have a window where any failure on the master will mean data loss. The user regards this risk as acceptable, given that both the master and the failover are located in the same data center in any case, so there is always a risk of a sufficient disaster wiping out all data back to the daily backup. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
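[Editor's note: to pin down the semantics being requested, the degrade-to-async behavior might be expressed as configuration along these lines. Every GUC named here is hypothetical, sketched only for illustration.]

    # master's postgresql.conf -- illustrative names, not implemented syntax
    synchronous_standbys = 'failover1'
    replication_ack_timeout = 30s          # wait at most 30s for the standby's ack
    replication_timeout_action = degrade   # then commit anyway, drop to async,
                                           # and log so monitoring can alert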
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On Thu, Oct 7, 2010 at 10:27 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: The standby name is a GUC in the standby's configuration file: standby_name='bostonserver' Fwiw I was hoping it would be possible to set every machine up with an identical postgresql.conf file. That doesn't preclude this idea, since you could start up your server with a script that sets the GUC on the command line, and that script could use whatever it wants to look up its name, such as using its hardware info to look it up in a database. But just something to keep in mind. In particular I would want to be able to configure everything identically and then have each node run some kind of program which determines its name and position in the replication structure. This implies that each node, given its identity and the total view of the structure, can figure out what it should be doing, including whether to be read-only or read-write, who to contact as its master, and whether to listen for slaves. If every node needs a configuration file specifying multiple interdependent variables which are all different from server to server, it'll be too hard to keep them all in sync. I would rather tell every node: here's how to push to the archive, here's how to pull, here's the whole master-slave structure, even the parts you don't need to know about and the redundant entry for yourself -- now here's your name, go figure out whether to push or pull and from where. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
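[Editor's note: a sketch of that startup-script idea. pg_ctl's -o flag passes options through to the postmaster, and -c name=value sets any GUC, so identical postgresql.conf files can be shipped everywhere while the node's name is derived at start time; standby_name is the GUC proposed in this thread, not existing syntax.]

    #!/bin/sh
    # derive this node's name from the host, then start with the shared config
    NODE_NAME=$(hostname -s)
    pg_ctl -D /var/lib/pgsql/data -o "-c standby_name=${NODE_NAME}" start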
Re: [HACKERS] On Scalability
On Thu, 2010-10-07 at 14:10 +0200, Vincenzo Romano wrote: Making these things sub-linear (whether O(log n) or even O(1)), provided that there's a way to, would make this RDBMS more appealing to enterprises. I mean also partial indexes (as an alternative to table partitioning). Being able to effectively cope with only a dozen child tables or so is more like an amateur feature. If you really need partitioning (or just hierarchical stuff) I think you'll need quite a bit more than a dozen items. If you partition by just weeks, you'll need 50+ a year. Is there any precise direction on where to look in the code for it? Is there a way to put this on a wish list? It's already on the wish list (TODO) and has been for many years. We've mostly lacked somebody with the experience and time/funding to complete that implementation work. I figure I'll be doing it for 9.2 now; it may be difficult to do this for the next release. Theoretically, this can be O(n log n) for range partitioning and O(1) for exact-value partitioning, though the latter isn't a frequent use case. Your conclusion that the current partitioning only works with a dozen or so items doesn't match the experience of current users, however. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On Thu, 2010-10-07 at 13:44 -0400, Aidan Van Dyk wrote: To get non-stale responses, you can only query those k=3 servers. But you've shot yourself in the foot because you don't know which 3/10 those will be. The other 7 *are* stale (by definition). They talk about picking the caught-up slave when the master fails, but you actually need to do that for *every query*. There is a big confusion around that point and I need to point out that the statement isn't accurate. It's taken me a long while to understand this. Asking for k > 1 does *not* mean those servers are time-synchronised. All it means is that the master will stop waiting after 3 acknowledgements. There is no connection between the master receiving acknowledgements and the standbys applying changes received from the master; the standbys are all independent of one another. In a bad case, those 3 acknowledgements might happen, say, 5 seconds apart on the worst and best of the 3 servers. So the first standby to receive the data could have applied the changes ~4.8 seconds before the 3rd standby. There is still a chance of reading stale data on one standby while reading fresh data on another server. In most cases the time window is small, but it still exists. The other 7 are stale with respect to the first 3. But then so are the last 9 compared with the first one. The value of k has nothing whatsoever to do with the time difference between the master and the last standby to receive/apply the changes. The gap between the first and last standby (i.e. N, not k) is the time window during which a query might or might not see a particular committed result. So standbys are eventually consistent whether or not the master relies on them to provide an acknowledgement. The only place where you can guarantee non-stale data is on the master. High values of k reduce the possibility of data loss, whereas expected cluster availability is reduced as N - k gets smaller. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On Thu, 2010-10-07 at 19:50 +0200, Markus Wanner wrote: So far I've been under the impression that Simon already has the code for quorum_commit k = 1. I do, but it's not a parameter. The k = 1 behaviour is hardcoded and considerably simplifies the design. Moving to k > 1 is additional work, slows things down and seems likely to be fragile. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers