Re: [HACKERS] [0/4] Proposal of SE-PostgreSQL patches
Tom Lane wrote: The idea of input functions that alter system tables scares me. An example: SELECT 'system_u:object_r:sepgsql_table_t:SystemHigh'::security_label; can insert a new tuple into pg_security, but that is not desirable behavior. To fix this, I'll remove the security_label type and define the security_context system column as a text column, and translate its contents into an internal identifier representation just before a tuple is updated or inserted. This avoids inserting a temporarily-used (unnecessary) security context, and makes it possible to use all kinds of text functions. Please comment if you have any thoughts. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei [EMAIL PROTECTED] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
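The scheme above (translate label text to an internal id only at INSERT/UPDATE time, so a mere cast never writes to pg_security) can be sketched roughly like this. This is an illustrative Python model, not SE-PostgreSQL code; the class and method names are assumptions:

```python
# Hypothetical sketch of interning security context labels into a
# pg_security-like lookup table only at tuple INSERT/UPDATE time.

class SecurityLabelCatalog:
    """Maps security context strings to internal ids (pg_security-like)."""
    def __init__(self):
        self._by_label = {}
        self._next_id = 1

    def intern(self, label):
        """Assign (or reuse) an internal id for a label.

        Called only just before a tuple INSERT/UPDATE, so merely
        casting or printing a label never pollutes the catalog.
        """
        if label not in self._by_label:
            self._by_label[label] = self._next_id
            self._next_id += 1
        return self._by_label[label]

    def size(self):
        return len(self._by_label)

catalog = SecurityLabelCatalog()

# A query that merely mentions a label (e.g. a cast in a SELECT)
# does NOT call intern(), so no temporary entry is created.
label = 'system_u:object_r:sepgsql_table_t:SystemHigh'

# Only an actual INSERT/UPDATE translates text -> internal id:
row_id = catalog.intern(label)
same_id = catalog.intern(label)   # reused, not duplicated
```

With the old input-function approach, evaluating the cast itself would have been the point where the catalog row appeared, even if the query never stored the value anywhere.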
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote: But since you mention it: one of the plausible answers for fixing the vacuum problem for read-only slaves is to have the slaves push an xmin back upstream to the master to prevent premature vacuuming. The current design of pg_standby is utterly incapable of handling that requirement. So there might be an implementation dependency there, depending on how we want to solve that problem. I think it would be best to not make the slave interfere with the master's operations; that's only going to increase the operational complexity of such a solution. There could be multiple slaves following a master, some serving data-warehousing queries, some for load-balancing reads, some others just for disaster recovery, and then some just to mitigate human errors by re-applying the logs with a delay. I don't think any one installation would see all of the above mentioned scenarios, but we need to take care of multiple slaves operating off of a single master; something similar to cascaded Slony-I. My two cents. Best regards, -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com EnterpriseDB http://www.enterprisedb.com Mail sent from my BlackLaptop device
Re: [HACKERS] Avoiding second heap scan in VACUUM
On Fri, May 30, 2008 at 2:41 PM, Simon Riggs [EMAIL PROTECTED] wrote: What I still don't accept is that an unconstrained wait is justifiable. You've just said it's a minor detail, but that's not the way I see it. It might be a second, but it might be an hour or more. I am suggesting a timed wait, say between 60 and 300 seconds. That's the maximum VACUUM would get delayed. If existing transactions don't finish within that time, VACUUM just works as it does today, so it certainly can't be much worse than it is today. A non-waiting solution seems like the only way to proceed. Yeah, but we don't have a simple solution yet which would work in all cases and is not too complicated. Is this a non-issue anyway, with DSM? I thought about that. DSM would certainly reduce the cost of heap scans, but the second pass would still be required and it would re-dirty all the pages again. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Proposal - Collation at database level
Zdenek Kotala wrote: Radek Strnad napsal(a): snip I'm thinking of dividing the problem into two parts - in the beginning pg_collation will contain two functions. One will have hard-coded rules for the basic collations (SQL_CHARACTER, GRAPHIC_IRV, LATIN1, ISO8BIT, UCS_BASIC). It will compare each string character bitwise and guarantee that the implementation meets the SQL standard as implemented in PostgreSQL. The second one will allow the user to use installed system locales. The set of these collations will obviously vary between systems. Catalogs will contain the encoding and collation for calling the system locale function. This will allow us to use collations such as en_US.utf8, cs_CZ.iso88592 etc. if they are available. We will also need to change the way strings are compared. Depending on the database collation setting, the right function will be used. http://doxygen.postgresql.org/varlena_8c.html#4c7af81f110f9be0bd8eb2bd99525675 This design will make a later switch to ICU or any other implementation quite simple and will not cause any major rewriting of what I'm coding right now. The collation function is the main point here. As you mentioned, one will be only a wrapper around strcmp and the second one around strcoll (maybe you need four - char/wchar). Which function will be used is defined in the pg_collation catalog by the CREATE COLLATION command. But you need to specify the name of the locale for system locales. It means you need an attribute for storing the locale name. You're right. I've extended pg_collation with system locale columns. In the first stage we actually don't need any other catalogs such as encoding etc., and we can build this functionality on the following pg_collation catalog alone. The collation function used (system or built-in) will be decided based on the existing collation name.
CATALOG(pg_collations, ###)
{
    NameData    colname;              /* collation name */
    Oid         colschema;            /* collation schema */
    NameData    colcharset;           /* character set specification */
    Oid         colexistingcollation; /* existing collation */
    bool        colpadattribute;      /* pad attribute */
    bool        colcasesensitive;     /* case sensitive */
    bool        colaccent;            /* accent sensitive */
    NameData    colsyslccollate;      /* lc_collate */
    NameData    colsyslcctype;        /* lc_ctype */
    regproc     colfunc;              /* used collation function */
} FormData_pg_collations;

It would be good to send a list of new and modified SQL commands (like CREATE COLLATION) for wide discussion.

CREATE COLLATION collation name FOR character set specification
    FROM existing collation name
    [ pad characteristic ] [ case sensitive ] [ accent sensitive ]
    [ LC_COLLATE lc_collate ] [ LC_CTYPE lc_ctype ]

pad characteristic := NO PAD | PAD SPACE
case sensitive := CASE SENSITIVE | CASE INSENSITIVE
accent sensitive := ACCENT SENSITIVE | ACCENT INSENSITIVE

Since you can specify ORDER BY in the SELECT clause there's no need to add ascending and descending types of collation; they will always be ascending.

DROP COLLATION collation name
CREATE DATABASE ... [ COLLATE collation name ] ...
ALTER DATABASE ... [ COLLATE collation name ] ...

Any thoughts? Radek
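The split between the two proposed collation functions (a hard-coded bitwise comparator for the standard collations, and a wrapper around the system locale's strcoll) can be sketched like this. This is illustrative Python, not PostgreSQL internals; the dispatch rule and function names are assumptions:

```python
# Sketch: two collation-function flavors and name-based dispatch.
import locale

def cmp_bitwise(a, b):
    """Bitwise (code-point) comparison, as for UCS_BASIC etc."""
    return (a > b) - (a < b)

def cmp_system_locale(a, b):
    """Delegate to the system locale (LC_COLLATE), like strcoll()."""
    return locale.strcoll(a, b)

def pick_collation_func(collation_name):
    # Built-in collation names get the hard-coded comparator;
    # anything else (en_US.utf8, cs_CZ.iso88592, ...) is treated
    # as a system locale and gets the strcoll wrapper.
    builtins = {'SQL_CHARACTER', 'GRAPHIC_IRV', 'LATIN1',
                'ISO8BIT', 'UCS_BASIC'}
    return cmp_bitwise if collation_name in builtins else cmp_system_locale

# In the C locale, strcoll degenerates to bytewise comparison,
# so both flavors agree there:
locale.setlocale(locale.LC_COLLATE, 'C')
```

A real system locale such as en_US.utf8 would make the two functions disagree on pairs like 'B' vs 'a', which is exactly why the catalog must record which function each collation uses.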
Re: [HACKERS] Avoiding second heap scan in VACUUM
On Fri, May 30, 2008 at 1:56 AM, Simon Riggs [EMAIL PROTECTED] wrote: Been thinking some more about this. You're right that the second scan could re-dirty many pages and is probably something to avoid. Right. IMHO it would help us a lot. The main issue I see is that you don't really know how much work will happen in the first phase and how much would happen in the second. With HOT, I see very little work left for the second pass. The dead space is already collected in the first pass; the second pass only cleans up the DEAD line pointers. Also, if we can update the FSM early (at the end of the first pass), we can avoid further irreversible bloat of the heap. no problem at all. I'd rather keep it as it is than have sometimes better, sometimes worse behaviour. For large tables, two heap scans along with several additional page writes don't seem to be a cost we can afford, especially in IO-bound applications. IMHO a timed wait is not such a bad thing. Note that it's all about VACUUM, which is a background maintenance activity, and it won't hurt to delay it by a few seconds or even minutes. Also, as I said earlier, waiting is a minor detail; maybe there is a better way to do things. Unless there are some strong objections, I would like to give it a shot and see if there are any real benefits. We can then fix any regression cases. Let me know if somebody thinks there are certain show stoppers or that the benefit of avoiding a second scan on a large table is not worth it. I personally have a strong feeling that it's worth the effort. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, May 29, 2008 at 01:58:34PM -0700, David Fetter wrote: If people on core had come to the idea that we needed to build in replication *before* 8.3 came out, they certainly didn't announce it. Now is a great time to mention this because it gives everybody time to: 1. Come to a consensus on what the out-of-the-box replication should be, and 2. Build, test and debug whatever the consensus out-of-the-box replication turns out to be. None of that is an argument for why this has to go in 8.4. I argued in Ottawa that the idea that you have to plan a feature for _the next release_ is getting less tenable with each release. This is because major new features for Postgres are now often big and complicated. The days of big gains from single victories are mostly over (though there are exceptions, like HOT). Postgres is already mature. Like a middle-aged person with a mortgage, longer-term planning is simply a necessary part of life now. There are two possibilities here. One is to have huge releases on much longer timetables. I think this is unsustainable in a free project, because people will get bored and go away if they don't get to use the results of their work in a reasonably short time frame. The other is to accept that sometimes, planning and development for new features will have to start a long time before actual release -- maybe planning and some coding for 2 releases out. That allows large features like the one we're discussing to be developed responsibly without making everything else wait for it. A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/
Re: [HACKERS] Core team statement on replication in PostgreSQL
On 5/30/08, Gurjeet Singh [EMAIL PROTECTED] wrote: On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote: But since you mention it: one of the plausible answers for fixing the vacuum problem for read-only slaves is to have the slaves push an xmin back upstream to the master to prevent premature vacuuming. The current design of pg_standby is utterly incapable of handling that requirement. So there might be an implementation dependency there, depending on how we want to solve that problem. I think it would be best to not make the slave interfere with the master's operations; that's only going to increase the operational complexity of such a solution. I disagree - it's better to consider synchronized WAL-slaves as equal to the master, so having queries there affect the master is OK. You need to remember this solution does not try to replace 100-node Slony-I setups. You can run sanity checks on slaves or use them to load-balance read-only OLTP queries, but not random stuff. There could be multiple slaves following a master, some serving data-warehousing queries, some for load-balancing reads, some others just for disaster recovery, and then some just to mitigate human errors by re-applying the logs with a delay. To run warehousing queries you'd better use Slony-I / Londiste. For warehousing you want different / more indexes on the tables anyway, so I think it's quite OK to say don't run complex queries on WAL-slaves. I don't think any one installation would see all of the above mentioned scenarios, but we need to take care of multiple slaves operating off of a single master; something similar to cascaded Slony-I. Again, synchronized WAL replication is not a generic solution. Use Slony/Londiste if you want totally independent slaves. Thankfully the -core team has set concrete and limited goals, which means it is possible to see working code in a reasonable time. I think that should apply to read-only slaves too.
If we try to make it handle any load, it will not be finished any time soon. Now if we limit the scope, I've seen 2 variants thus far:

1) Keep slave max in sync, let the load there affect master (xmin).
   - Slave can be used to load-balance OLTP load.
   - Slave should not be used for complex queries.

2) If a long query is running, let the slave lag (avoid applying WAL data).
   - Slave cannot be used to load-balance OLTP load.
   - Slave can be used for complex queries (although no new indexes or temp tables can be created).

I think 1) is the more important (and more easily implementable) case. For 2) we already have solutions (Slony/Londiste/Bucardo, etc.) so there is no point making an effort to solve it here. -- marko
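The xmin-feedback idea that keeps coming up in this thread (slaves push their oldest snapshot xid back to the master so VACUUM holds off) reduces to a simple minimum computation. A minimal sketch, with hypothetical names, not the pg_standby implementation:

```python
# Sketch: each slave reports the xmin of its oldest running query;
# the master's VACUUM honors the minimum over its own oldest
# snapshot and all slave feedback xmins.

def effective_vacuum_horizon(local_oldest_xmin, slave_xmins):
    """Oldest xid VACUUM may clean up: the minimum over the master's
    own oldest snapshot and every feedback xmin from the slaves."""
    return min([local_oldest_xmin] + list(slave_xmins))

# Master alone could vacuum everything older than xid 500 ...
local = 500
# ... but two slaves still hold snapshots at 420 and 610.
slaves = [420, 610]

horizon = effective_vacuum_horizon(local, slaves)
# VACUUM must keep everything from xid 420 on, so a long query on a
# slave delays cleanup on the master -- exactly the interference
# Gurjeet worries about and variant 1 accepts.
```

With no slave feedback (variant 2), the horizon is just the master's own value and the slave must lag instead.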
[HACKERS] Suggestion: xpath
I think it is a good idea to make a function returning a set of records, like xpath_table, but accepting xml data in a text parameter. Today, if you have a function with an xml parameter, you have to save it in a table to use xpath. Thanks,
Re: [HACKERS] Avoiding second heap scan in VACUUM
On Fri, May 30, 2008 at 3:31 PM, Simon Riggs [EMAIL PROTECTED] wrote: Perhaps we can start the first scan and check the xid after we scan each few blocks. Once we find the xid is older, then we know the size of the second scan can be limited to only those blocks already scanned. So the two endpoints of behaviour are that we skip the scan completely or we do the whole scan, but at least there is a saving in many cases without waiting. Hmm. Interesting. I was about to suggest that we use some heuristic such as the size of the table to decide whether or not to try the optimization, but what you just suggested makes more sense. So instead of waiting, we start the first scan anyway. If we are lucky, at some point all the old transactions will go away and then we can start marking the DEAD line pointers as DEAD_RECLAIMED. The second pass just needs to rescan the initial blocks to remove the DEAD line pointers. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
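The scheme Simon and Pavan converge on can be modeled in a few lines: start the first pass immediately, note the block at which the old transactions finally end, and bound the second pass to only the blocks scanned before that point. This is a toy Python model under stated assumptions (one xid check per block, no page-level detail), not the actual VACUUM code:

```python
# Sketch: bounded second vacuum pass.

def first_pass(num_blocks, old_xact_gone_at):
    """Scan all blocks; return how many were scanned before the old
    transactions ended.  Only those blocks still carry DEAD line
    pointers and need revisiting in the second pass."""
    needs_second_pass = 0
    for blk in range(num_blocks):
        if blk < old_xact_gone_at:
            # Old snapshot still alive: can only mark line pointers
            # DEAD here; must rescan this block later.
            needs_second_pass += 1
        # else: old transactions are gone, so the line pointers can
        # be reclaimed in this same pass -- no revisit needed.
    return needs_second_pass

# Old transactions end after 100 of 10000 blocks: the second pass
# rescans only those 100 blocks instead of the whole heap.
rescan = first_pass(10000, 100)

# The two endpoints of behaviour: transactions gone before we start
# means no second pass at all; never gone means a full second pass,
# i.e. today's behaviour.
best = first_pass(10000, 0)
worst = first_pass(10000, 10000)
```

The saving is therefore proportional to how early the blocking transactions finish, and the worst case is no worse than the current two-full-scan VACUUM.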
Re: [HACKERS] Hint Bits and Write I/O
Kevin Grittner wrote: On Wed, May 28, 2008 at 6:26 PM, in message [EMAIL PROTECTED], Florian G. Pflug [EMAIL PROTECTED] wrote: I think we should put some randomness into the decision, to spread the IO caused by hint-bit updates after a batch load. Currently we have a policy of doing a VACUUM FREEZE ANALYZE on a table after a bulk load, or on the entire database after loading a pg_dump of a database. We do this before putting the table or database into production. This avoids surprising clusters of writes at unpredictable times. Please don't defeat that. (I'm not sure whether your current suggestion would.) No, VACUUM (and therefore VACUUM FREEZE) dirties all buffers it sets hint bits on anyway, since it also updates the xmin values. But a more IO-friendly approach to setting hint bits might make that VACUUM FREEZE step unnecessary ;-) regards, Florian Pflug
Re: [HACKERS] Core team statement on replication in PostgreSQL
Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2008-05-30 at 12:31 +0530, Gurjeet Singh wrote: On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote: But since you mention it: one of the plausible answers for fixing the vacuum problem for read-only slaves is to have the slaves push an xmin back upstream to the master to prevent premature vacuuming. I think it would be best to not make the slave interfere with the master's operations; that's only going to increase the operational complexity of such a solution. We ruled that out as the-only-solution a while back. It does have the beauty of simplicity, so it may exist as an option or possibly the only way, for 8.4. Yeah. The point is that it's fairly clear that we could make that work. A solution that doesn't impact the master at all would be nicer, but it's not at all clear to me that one is possible, unless we abandon WAL-shipping as the base technology. Quite. Before we start ruling things out let's know what we think we can actually do. I hope that NTT will release their code ASAP so we will have a better idea of what we have and what we need. cheers andrew
Re: [HACKERS] Suggestion: xpath
[EMAIL PROTECTED] writes: I think it is a good idea to make a function returning a set of records, like xpath_table, but accepting xml data in a text parameter. Today, if you have a function with an xml parameter, you have to save it in a table to use xpath. I believe you're mistaken. Why would you need to put the value into a table? regards, tom lane
Re: [HACKERS] Change lock requirements for adding a trigger
On Thu, 2008-05-29 at 19:18 -0500, Decibel! wrote: Is there a reason that we can't add a trigger to a table while a select is running? This is a serious pain when trying to set up londiste or slony. This is constrained by locking. There is a subset of DDL commands that might be able to be performed with just an ExclusiveLock or ShareLock rather than an AccessExclusiveLock. Nobody has studied which sub-statements this might apply to, but it's doable since CREATE INDEX already does this. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Core team statement on replication in PostgreSQL
Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2008-05-30 at 12:31 +0530, Gurjeet Singh wrote: On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote: But since you mention it: one of the plausible answers for fixing the vacuum problem for read-only slaves is to have the slaves push an xmin back upstream to the master to prevent premature vacuuming. I think it would be best to not make the slave interfere with the master's operations; that's only going to increase the operational complexity of such a solution. We ruled that out as the-only-solution a while back. It does have the beauty of simplicity, so it may exist as an option or possibly the only way, for 8.4. Yeah. The point is that it's fairly clear that we could make that work. A solution that doesn't impact the master at all would be nicer, but it's not at all clear to me that one is possible, unless we abandon WAL-shipping as the base technology. regards, tom lane
Re: [HACKERS] Core team statement on replication in PostgreSQL
Hi Tom, Thanks for the reasoned reply. As you saw from point #2 in my comments, I think you should do this feature. I hope this answers Josh Berkus' concern about my comments. You make a very interesting comment which seems to go to the heart of this design approach: About the only thing that would make me want to consider row-based replication in core would be if we determine that read-only slave queries are impractical atop a WAL-log-shipping implementation. It's possible I'm misunderstanding some of the implementation issues, but it is striking that the detailed responses to your proposal list a number of low-level dependencies between master and slave states when replicating WAL records. It appears that you are designing a replication mechanism that works effectively between a master and a relatively small number of nearby slaves. This is clearly an important use case but it also seems clear that the WAL approach is not a general-purpose approach to replication. In other words, you'll incrementally get to that limited end point I describe. This will still leave a lot to be desired on read scaling, not to mention many other cases. Hence my original comments. However, rather than harp on that further I will open up a separate thread to describe a relatively small set of extensions to PostgreSQL that would be enabling for a wide range of replication applications. Contrary to popular opinion these extensions are actually well understood at the theory level and have been implemented as prototypes as well as in commercial patches multiple times in different databases. Those of us who are deeply involved in replication deserve just condemnation for not stepping up and getting our thoughts out on the table. Meanwhile, I would be interested in your reaction to these thoughts on the scope of the real-time WAL approach. There's obviously tremendous interest in this feature. 
A general description that goes beyond the NTT slides would be most helpful for further discussions. Cheers, Robert P.s., The NTT slides were really great. Takahiro and Masao deserve congratulations on an absolutely first-rate presentation. On 5/29/08 9:09 PM, Tom Lane [EMAIL PROTECTED] wrote: Andrew Sullivan [EMAIL PROTECTED] writes: On Thu, May 29, 2008 at 12:05:18PM -0700, Robert Hodges wrote: people are starting to get religion on this issue I would strongly advocate a parallel effort to put in a change-set extraction API that would allow construction of comprehensive master/slave replication. You know, I gave a talk in Ottawa just last week about how the last effort to develop a comprehensive API for replication failed. Indeed, core's change of heart on this issue was largely driven by Andrew's talk and subsequent discussion. We had more or less been waiting for the various external replication projects to tell us what they wanted in this line, and it was only the realization that no such thing was likely to happen that forced us to think seriously about what could be done within the core project. As I said originally, we have no expectation that the proposed features will displace the existing replication projects for high end replication problems ... and I'd characterize all of Robert's concerns as high end problems. We are happy to let those be solved outside the core project. About the only thing that would make me want to consider row-based replication in core would be if we determine that read-only slave queries are impractical atop a WAL-log-shipping implementation. Which could happen; in fact I think that's the main risk of the proposed development plan. But I also think that the near-term steps of the plan are worth doing anyway, for various other reasons, and so we won't be out too much effort if the plan fails. 
regards, tom lane
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, May 30, 2008 at 9:31 AM, Marko Kreen [EMAIL PROTECTED] wrote: On 5/30/08, Gurjeet Singh [EMAIL PROTECTED] wrote: I think it would be best to not make the slave interfere with the master's operations; that's only going to increase the operational complexity of such a solution. I disagree - it's better to consider syncronized WAL-slaves as equal to master, so having queries there affect master is ok. You need to remeber this solution tries not to replace 100-node Slony-I setups. You can run sanity checks on slaves or use them to load-balance read-only OLTP queries, but not random stuff. There could be multiple slaves following a master, some serving data-warehousing queries, some for load-balancing reads, some others just for disaster recovery, and then some just to mitigate human errors by re-applying the logs with a delay. To run warehousing queries you better use Slony-I / Londiste. For warehousring you want different / more indexes on tables anyway, so I think it's quite ok to say don't do it for complex queries on WAL-slaves. I don't think any one installation would see all of the above mentioned scenarios, but we need to take care of multiple slaves operating off of a single master; something similar to cascaded Slony-I. Again, the synchronized WAL replication is not generic solution. Use Slony/Londiste if you want to get totally independent slaves. I strongly agree with Gurjeet. The warm standby replication mechanism is pretty simple and is wonderfully flexible, with the one big requirement that your clusters have to be mirrors of each other. Synchronous WAL replication obviously needs some communication channel from the slave back to the master. Hopefully, it will be possible to avoid this for asynchronous shipping. merlin
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, 2008-05-30 at 11:30 -0400, Andrew Dunstan wrote: Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2008-05-30 at 12:31 +0530, Gurjeet Singh wrote: On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote: But since you mention it: one of the plausible answers for fixing the vacuum problem for read-only slaves is to have the slaves push an xmin back upstream to the master to prevent premature vacuuming. I think it would be best to not make the slave interfere with the master's operations; that's only going to increase the operational complexity of such a solution. We ruled that out as the-only-solution a while back. It does have the beauty of simplicity, so it may exist as an option or possibly the only way, for 8.4. Yeah. The point is that it's fairly clear that we could make that work. A solution that doesn't impact the master at all would be nicer, but it's not at all clear to me that one is possible, unless we abandon WAL-shipping as the base technology. Quite. Before we start ruling things out let's know what we think we can actually do. Let me re-phrase: I'm aware of that possibility and believe we can and could do it for 8.4. My assessment is that people won't find it sufficient and I am looking at other alternatives also. There may be a better one possible for 8.4, there may not. Hence I've said something in 8.4, something better later. There is no need to decide that is the only way forward, yet. I hope and expect to put some of these ideas into a more concrete form, but this has not yet happened. Nothing has slipped, not having any trouble getting on with it, just that my plans were to not start it yet. I think having a detailed design ready for review by September commit fest is credible. I hope that NTT will release their code ASAP so we will have a better idea of what we have and what we need. 
That has very little to do with Hot Standby, though there could be patch conflicts, which is why I'm aiming to get WAL streaming done first. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, 2008-05-30 at 12:31 +0530, Gurjeet Singh wrote: On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote: But since you mention it: one of the plausible answers for fixing the vacuum problem for read-only slaves is to have the slaves push an xmin back upstream to the master to prevent premature vacuuming. The current design of pg_standby is utterly incapable of handling that requirement. So there might be an implementation dependency there, depending on how we want to solve that problem. I think it would be best to not make the slave interfere with the master's operations; that's only going to increase the operational complexity of such a solution. There could be multiple slaves following a master, some serving data-warehousing queries, some for load-balancing reads, some others just for disaster recovery, and then some just to mitigate human errors by re-applying the logs with a delay. Agreed. We ruled that out as the-only-solution a while back. It does have the beauty of simplicity, so it may exist as an option or possibly the only way, for 8.4. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, 29 May 2008 09:22:26 -0700 Steve Atkins wrote: On May 29, 2008, at 9:12 AM, David Fetter wrote: Either one of these would be great, but something that involves machines that stay useless most of the time is just not going to work. I have customers who are thinking about warm standby functionality, and the only thing stopping them deploying it is complexity and maintenance, not the cost of the HA hardware. If trivial-to-deploy replication that didn't offer read-only access of the slaves were available today, I'd bet that most of them would be using it. Sure, I have a similar customer. They are right now using a set of Perl scripts which ship the logfiles to the slave, take care of the status, apply the logfiles, validate checksums, etc. The whole thing works very well in combination with RedHat cluster software, but it took several weeks to implement the current solution. Not everyone wants to spend the time and the manpower to implement a simple replication setup. Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, 29 May 2008 18:29:01 -0400 Tom Lane wrote: Dimitri Fontaine [EMAIL PROTECTED] writes: While at it, would it be possible for the simple part of the core team statement to include automatic failover? No, I think it would be a useless expenditure of energy. Failover includes a lot of things that are not within our purview: switching IP addresses to point to the new server, some kind of STONITH solution to keep the original master from coming back to life, etc. Moreover, there are already projects/products concerned with those issues. True words. Failover is not and should not be part of PostgreSQL. But PG can help the failover solution; as an example, an easy-to-use interface for querying the current slave status comes to mind. Other ideas might also be possible. It might be useful to document where to find solutions to that problem, but we can't take it on as part of core Postgres. Ack Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group
Re: [HACKERS] replication hooks
Hi Marko, Replication requirements vary widely of course, but DDL support is shared by such a wide range of use cases that it is very difficult to see how any real solution would fail to include it. This extends to change-extraction APIs, however defined. The question of what DDL to replicate is also quite clear: all of it, with as few exceptions as possible. For instance, it is almost impossible to set up and manage replicated systems easily if you cannot propagate schema changes in serialized order along with other updates from applications. The inconvenience of using alternative mechanisms like the Slony 'execute script' is considerable and breaks most commonly used database management tools. That said, Slony at least serializes the changes. Non-serialized approaches lead to serious outages and can get you into distributed consensus problems, such as deciding when it is 'safe' to change schema across different instances. These are very hard to solve practically and tend to run into known impossibility results like Brewer's Conjecture, which holds that it is impossible to keep distributed databases consistent while also remaining open for updates and handling network partitions. I'll post back later on the question of the API. The key is to do something simple that avoids the problems discussed by Andrew and ties it accurately to use cases. However, this requires a more prepared response than my hastily written post from last night. Cheers, Robert On 5/29/08 9:05 PM, Marko Kreen [EMAIL PROTECTED] wrote: On 5/29/08, Andrew Sullivan [EMAIL PROTECTED] wrote: On Thu, May 29, 2008 at 12:05:18PM -0700, Robert Hodges wrote: people are starting to get religion on this issue I would strongly advocate a parallel effort to put in a change-set extraction API that would allow construction of comprehensive master/slave replication. You know, I gave a talk in Ottawa just last week about how the last effort to develop a comprehensive API for replication failed.
I had some ideas about why, the main one of which is something like this: big features with a roadmap have not historically worked, so unless we're willing to change the way we work, we won't get that. I don't think an API is what's needed. It's clear proposals for particular features that can be delivered in small pieces. That's what the current proposal offers. I think any kind of row-based approach such as what you're proposing would need that kind of proposal too. That isn't to say that I think an API is impossible or undesirable. It is to say that the last few times we tried, it went nowhere, and that I don't think the circumstances have changed. I think the issue is simpler - an API for synchronous replication is undesirable - it would be too complex and hinder future development (as I explained above). And the API for asynchronous replication is already there - triggers, txid functions for queueing. There is this tiny matter of replicating schema changes asynchronously, but I suspect nobody actually cares. A few random points about that:
- The task cannot even be clearly defined (on a technical level - how the events should be represented).
- Any schema changes need to be carefully prepared anyway. Whether to apply them to one or more servers does not make much difference.
- A major plus of an async replica is the ability to actually have a different schema on the slaves.
- People _do_ care about the exact schema in a single place - failover servers.
- But for a failover server we also want synchronous replication.
So if we have synchronous WAL-based replication for failover servers, the interest in hooks to log schema changes will decrease even more. -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Robert Hodges, CTO, Continuent, Inc. Email: [EMAIL PROTECTED] Mobile: +1-510-501-3728 Skype: hodgesrm
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote: Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours. Then you ship 16 MB of binary data every 30 seconds or every minute, but you only have a few kilobytes of real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (meaning: no high-speed connection, maybe even pay-per-traffic) to the slave. Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group
Re: [HACKERS] Initial max_connections for initdb on FreeBSD.
I don't think so. This is a particular case. So my advice is to hack test_config_settings() and add your custom values to trial_conns[] and trial_bufs[]. Thanks. I dropped those to their bare minimums (1 and 16), and still did not have enough memory (!) It also appears that users cannot manipulate their shmmax setting inside of a FreeBSD jail, so I'll just give up this part of the quest. (Shane, the PGOPTIONS stuff does not apply to this case) -- Greg Sabino Mullane [EMAIL PROTECTED] PGP Key: 0x14964AC8 200805301018 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote: Then you ship 16 MB of binary data every 30 seconds or every minute, but you only have a few kilobytes of real data in the logfile. Not if you use pg_clearxlogtail ( http://www.2ndquadrant.com/replication.htm ), which got lost in the giant March commitfest queue but should probably wander into contrib as part of 8.4. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, May 29, 2008 at 7:42 PM, Tom Lane [EMAIL PROTECTED] wrote: The big problem is that long-running slave-side queries might still need tuples that are vacuumable on the master, and so replication of vacuuming actions would cause the slave's queries to deliver wrong answers. Another issue with read-only slaves just popped up in my head. How do we block the readers on the slave while it is replaying an ALTER TABLE or similar command that requires an exclusive lock and potentially alters the table's structure? Or does the WAL replay already take an x-lock on such a table? Best regards, -- [EMAIL PROTECTED] [EMAIL PROTECTED] { gmail | hotmail | indiatimes | yahoo }.com EnterpriseDB http://www.enterprisedb.com Mail sent from my BlackLaptop device
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, 2008-05-30 at 11:12 -0700, Robert Hodges wrote: This is clearly an important use case but it also seems clear that the WAL approach is not a general-purpose approach to replication. I think we cannot make such a statement yet, if ever. I would note that log-based replication is now the mainstay of commercial database replication techniques for loosely-coupled groups of servers. It would seem strange to assume that it should not be good for us too, simply because we know it to be difficult. IMHO the project has a pretty good track record of delivering functionality that looked hard at first glance. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
[HACKERS] Shouldn't Natural JOINs Follow FK Constraints?
Howdy, I just saw this in the docs: Finally, NATURAL is a shorthand form of USING: it forms a USING list consisting of exactly those column names that appear in both input tables. As with USING, these columns appear only once in the output table. That sounds useful if I happen to have named my columns exactly the same between the two tables, but couldn't a NATURAL JOIN follow the FK constraints, instead? That would be so much more useful and much less magical, I should think. Thanks, David
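The name-matching behavior the docs describe is easy to demonstrate. The sketch below uses Python's bundled sqlite3 purely as a convenient stand-in (SQLite implements the same column-name-based NATURAL JOIN semantics); the tables and column names are made up for illustration:

```python
import sqlite3

# NATURAL JOIN pairs columns purely by name, not by FK constraints.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books   (book_id INTEGER PRIMARY KEY, author_id INTEGER,
                          title TEXT);
    INSERT INTO authors VALUES (1, 'Lynn'), (2, 'Pat');
    INSERT INTO books   VALUES (10, 1, 'WAL Internals'), (20, 2, 'GUC Tuning');
""")

# This works only because both tables spell the join column 'author_id';
# rename it on one side and the join silently degenerates into a cross
# join -- the "magical" behavior being objected to here.
rows = conn.execute(
    "SELECT name, title FROM authors NATURAL JOIN books ORDER BY name"
).fetchall()
print(rows)  # [('Lynn', 'WAL Internals'), ('Pat', 'GUC Tuning')]
```

An FK-following join would instead have to be spelled out explicitly today: `JOIN books ON books.author_id = authors.author_id`.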
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thursday 29 May 2008 22:59:21 Merlin Moncure wrote: On Thu, May 29, 2008 at 9:26 PM, Josh Berkus [EMAIL PROTECTED] wrote: I fully accept that it may be the case that it doesn't make technical sense to tackle them in any order besides sync-read-only slaves because of dependencies in the implementation between the two. If that's the case, it would be nice to explicitly spell out what that was to deflect criticism of the planned prioritization. There's a very simple reason to prioritize the synchronous log shipping first; NTT may open source their solution and we'll get it a lot sooner than the other components. That's a good argument. I just read the NTT document and the stuff looks fantastic. You've convinced me... It would be a better argument if the NTT guys hadn't said that they estimated 6 months time before the code would be released, which puts us beyond 8.4. Now it is possible that the time frame could be sooner, but unless someone already has the patch, this reminds me a little too much of the arguments for including Windows support in a single release because we already had a working port/patch set to go from. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] Core team statement on replication in PostgreSQL
Andreas 'ads' Scherbaum wrote: On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote: Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours. Then you ship 16 MB of binary data every 30 seconds or every minute, but you only have a few kilobytes of real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (meaning: no high-speed connection, maybe even pay-per-traffic) to the slave. Sure there's a price to pay. But that doesn't mean the facility doesn't exist. And I rather suspect that most of Josh's customers aren't too concerned about traffic charges or affected by such bandwidth restrictions. Certainly, none of my clients are, and they aren't in the giant class. Shipping a 16MB file, particularly if compressed, every minute or so, is not such a huge problem for a great many commercial users, and even many domestic users. cheers andrew
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thursday 29 May 2008 20:31:31 Greg Smith wrote: On Thu, 29 May 2008, Tom Lane wrote: There's no point in having read-only slave queries if you don't have a trustworthy method of getting the data to them. This is a key statement that highlights the difference in how you're thinking about this compared to some other people here. As far as some are concerned, the already working log shipping *is* a trustworthy method of getting data to the read-only slaves. There are plenty of applications (web-oriented ones in particular) where if you could direct read-only queries against a slave, the resulting combination would be a giant improvement over the status quo, even if that slave was as much as archive_timeout behind the master. That quantity of lag is perfectly fine for a lot of the same apps that have read scalability issues. If you're someone who falls into that camp, the idea of putting the sync replication job before the read-only slave one seems really backwards. Just looking at it from an overall market perspective, synchronous log shipping pretty much only addresses failover needs, whereas read-only slaves address both failover and scaling issues. (Note I say address, not solve.) -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Friday 30 May 2008 01:10:20 Tom Lane wrote: Greg Smith [EMAIL PROTECTED] writes: I fully accept that it may be the case that it doesn't make technical sense to tackle them in any order besides sync-read-only slaves because of dependencies in the implementation between the two. Well, it's certainly not been my intention to suggest that no one should start work on read-only-slaves before we finish the other part. The point is that I expect the log shipping issues will be done first because they're easier, and it would be pointless to not release that feature if we had it. But since you mention it: one of the plausible answers for fixing the vacuum problem for read-only slaves is to have the slaves push an xmin back upstream to the master to prevent premature vacuuming. The current design of pg_standby is utterly incapable of handling that requirement. So there might be an implementation dependency there, depending on how we want to solve that problem. Sure, but who's to say that after synchronous WAL shipping is finished it won't need a serious re-write due to new needs from the hot standby feature. I think going either way carries some risk. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
[HACKERS] move hash_any from hash to utils
I'm trying to clean up header dependencies, and it seems the hash_any and hash_uint32 functions are candidates to move into a separate C file in the backend/utils/misc directory. It would significantly reduce the number of unnecessary includes in files like network.c ( http://doxygen.postgresql.org/network_8c.html ). Any comments? Zdenek
Re: [HACKERS] Shouldn't Natural JOINs Follow FK Constraints?
On Fri, 2008-05-30 at 13:19 -0700, David E. Wheeler wrote: Howdy, I just saw this in the docs: Finally, NATURAL is a shorthand form of USING: it forms a USING list consisting of exactly those column names that appear in both input tables. As with USING, these columns appear only once in the output table. That sounds useful if I happen to have named my columns exactly the same between the two tables, but couldn't a NATURAL JOIN follow the FK constraints, instead? That would be so much more useful and much less magical, I should think. A good point, but I'm sure the SQL standard has something to say about this, so I don't think we have much of a choice. Regards, Jeff Davis
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Sat, May 31, 2008 at 1:52 AM, Greg Smith [EMAIL PROTECTED] wrote: On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote: Then you ship 16 MB of binary data every 30 seconds or every minute, but you only have a few kilobytes of real data in the logfile. Not if you use pg_clearxlogtail ( http://www.2ndquadrant.com/replication.htm ), which got lost in the giant March commitfest queue but should probably wander into contrib as part of 8.4. This means we need to modify pg_standby to not check for file size when reading XLogs. Best regards, -- [EMAIL PROTECTED] [EMAIL PROTECTED] { gmail | hotmail | indiatimes | yahoo }.com EnterpriseDB http://www.enterprisedb.com Mail sent from my BlackLaptop device
Re: [HACKERS] Core team statement on replication in PostgreSQL
Andreas 'ads' Scherbaum wrote: On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote: Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours. Then you ship 16 MB of binary data every 30 seconds or every minute, but you only have a few kilobytes of real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (meaning: no high-speed connection, maybe even pay-per-traffic) to the slave. Sure there's a price to pay. But that doesn't mean the facility doesn't exist. And I rather suspect that most of Josh's customers aren't too concerned about traffic charges or affected by such bandwidth restrictions. Certainly, none of my clients are, and they aren't in the giant class. Shipping a 16Mb file, particularly if compressed, every minute or so, is not such a huge problem for a great many commercial users, and even many domestic users. Sumitomo Electric Co., Ltd., a company in Japan with 20 billion dollars in annual sales (parent company of Sumitomo Electric Information Systems Co., Ltd., one of the companies supporting the Recursive SQL development), uses 100 PostgreSQL servers. They do backups by log shipping to another data center and have problems with the amount of log data being transferred. They said this is one of the big problems they have with PostgreSQL and hope it will be solved in the near future. -- Tatsuo Ishii SRA OSS, Inc. Japan
[HACKERS] Overhauling GUCS
Magnus and I had the fortune to share a plane flight when leaving pgCon. This gave us a chance to discuss my ideas around reforming PostgreSQL configuration at some length. Here's what we came up with (Magnus will presumably correct the parts I got wrong): Currently, postgresql.conf and our set of GUC configurations suffer from 4 large problems:
1. Most people have no idea how to set these.
2. The current postgresql.conf file is a huge mess of 194 options, the vast majority of which most users will never touch.
3. GUC lists are kept in 3 different places (guc.c, postgresql.conf, and settings.sgml), which are only synched with each other manually.
4. We don't seem to be getting any closer to autotuning.
We think all of these are solvable for 8.4. We also think that it's vitally important that any overhaul of the GUCs be completed in one version to minimize the pain involved. Here's a list of the things we want to change. It's all a package and should make sense if you take all the changes as a whole.
1) Add several pieces of extra information to guc.c in the form of extra gettext commands: default value, subcategory, long description, recommendations, enum lists.
2) Incorporate this data into pg_settings.
3) Delete postgresql.conf.sample.
4) Add a script called pg_generate_conf to generate a postgresql.conf based on guc.c and command-line switches (rather than postgresql.conf.sample), which would be called automatically by initdb.
For (4), pg_generate_conf would take the following switches:
-b, --basic = short conf file, listing only the 15-18 most commonly changed options
-a, --advanced = conf file listing all 196+ options
-t, --terse = conf file lists only category headings and actual settings, no comments
-n, --normal = conf file has category and subcategory settings, with short desc comments
-v, --verbose = conf file lists full descriptions and recommendations in comments with each option
-c option=value = set specific option to specific value in the file
-f filename = take options and values from file filename
The default would be -b, -n with specific settings for shared_buffers and wal_sync_method. The current postgresql.conf is a lot more like an -a, -v file. This would help us in the following ways:
A. There would now be only 2 places to maintain GUC lists, the docs and guc.c.
B. By having a generated postgresql.conf and an easy way to generate it, writing autoconfiguration scripts (as well as shortcuts like SET PERSISTENT) becomes vastly easier.
C. Most users would deal only with a limited set of 15-20 configuration variables.
There's obviously some refinements needed, but what do people think of the above general idea? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
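Since pg_generate_conf is only a proposal, here is a minimal sketch of what its "-b -n" mode (basic option list, normal verbosity) might emit. Everything here is an assumption: the option metadata is a hypothetical stand-in for what would actually live in guc.c / pg_settings, and the function name is made up.

```python
# Hypothetical stand-in for the GUC metadata the proposal would add to
# guc.c: (name, default, subcategory, short description).
BASIC_GUCS = [
    ("shared_buffers", "128MB", "Memory", "Sets the shared memory buffer pool."),
    ("wal_sync_method", "fdatasync", "WAL", "Method used to force WAL to disk."),
    ("max_connections", "100", "Connections", "Maximum concurrent connections."),
]

def generate_conf(settings, overrides=None):
    """Render a minimal postgresql.conf: subcategory headings plus short
    description comments, roughly as the -n (normal) mode is described."""
    overrides = overrides or {}
    lines = []
    for name, default, subcategory, desc in settings:
        lines.append(f"# --- {subcategory} ---")
        lines.append(f"# {desc}")
        lines.append(f"{name} = {overrides.get(name, default)}")
    return "\n".join(lines) + "\n"

# The -c switch would map onto something like these overrides.
conf = generate_conf(BASIC_GUCS, overrides={"shared_buffers": "1GB"})
print(conf)
```

The point of the sketch is B above: once the file is generated from one metadata source, an autotuning script only needs to compute the overrides dict.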
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Sat, 2008-05-31 at 02:48 +0530, Gurjeet Singh wrote: Not if you use pg_clearxlogtail ( http://www.2ndquadrant.com/replication.htm ), which got lost in the giant March commitfest queue but should probably wander into contrib as part of 8.4. This means we need to modify pg_standby to not check for filesize when reading XLogs. Best regards, It does. Joshua D. Drake
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Sat, 31 May 2008, Gurjeet Singh wrote: Not if you use pg_clearxlogtail This means we need to modify pg_standby to not check for filesize when reading XLogs. No, the idea is that you run the segments through pg_clearxlogtail | gzip, which then compresses lightly used segments massively because all the unused bytes are 0. The file comes out the same size on the other side, but you didn't ship a full 16MB if there were only a few KB used. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
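The effect is easy to see with synthetic data. The sketch below (a simplified model, not pg_clearxlogtail itself) builds two 16MB "segments" with the same 64KB payload: one whose tail holds stale bytes, as in a recycled WAL segment, and one whose tail is zeroed. Only the zeroed one compresses down to roughly the payload size:

```python
import gzip
import os

# Synthetic stand-ins for WAL segment files.
SEG_SIZE = 16 * 1024 * 1024
payload = os.urandom(64 * 1024)              # pretend 64KB of real WAL data

stale  = payload + os.urandom(SEG_SIZE - len(payload))   # recycled segment
zeroed = payload + bytes(SEG_SIZE - len(payload))        # tail cleared to 0

stale_gz  = len(gzip.compress(stale))
zeroed_gz = len(gzip.compress(zeroed))
# Stale tail barely shrinks (~16MB); zeroed tail compresses to tens of KB.
print(f"stale tail: {stale_gz} bytes, zeroed tail: {zeroed_gz} bytes")
```

This is why no pg_standby change is needed: the decompressed file arrives at the full segment size, only the bytes on the wire shrink.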
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, 2008-05-30 at 01:10 -0400, Tom Lane wrote: Greg Smith [EMAIL PROTECTED] writes: I fully accept that it may be the case that it doesn't make technical sense to tackle them in any order besides sync-read-only slaves because of dependencies in the implementation between the two. Well, it's certainly not been my intention to suggest that no one should start work on read-only-slaves before we finish the other part. The point is that I expect the log shipping issues will be done first because they're easier, and it would be pointless to not release that feature if we had it. Agreed. I'm arriving late to a thread that seems to have grown out of all proportion. AFAICS streaming WAL and hot standby are completely orthogonal features. Streaming WAL is easier and if NTT can release their code to open source we may get this in the Sept commit fest. Hot Standby is harder and it was my viewpoint at PGCon that we may not have a perfect working version of this by the end of 8.4. We are very likely to have something working, but maybe not the whole feature set as we might wish to have. I expect to be actively working on this soon. I definitely do want to see WAL streaming going in as early as possible and before end of 8.4, otherwise code conflicts and other difficulties are likely to push out the 8.4 date and/or Hot Standby. So as I see it, Tom has only passed on my comments on this, not added or removed anything. The main part of the announcement was really about bringing the WAL streaming into core and effectively favouring it over a range of other projects. Can we all back off a little on this for now? Various concerns have been validly expressed, but it will all come good AFAICS. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Avoiding second heap scan in VACUUM
On Fri, 2008-05-30 at 14:50 +0530, Pavan Deolasee wrote: On Fri, May 30, 2008 at 2:41 PM, Simon Riggs [EMAIL PROTECTED] wrote: What I still don't accept is that an unconstrained wait is justifiable. You've just said it's a minor detail, but that's not the way I see it. It might be a second, but it might be an hour or more. I am suggesting a timed wait. Maybe say between 60-300 seconds. That's the maximum VACUUM would get delayed. If existing transactions don't finish within that time, VACUUM just works as it does today. So it certainly can't be much worse than what it is today. A non-waiting solution seems like the only way to proceed. I understand what you're saying and agree that in (some) cases a small wait is not important. I'm just saying some != all, and the gap covers important cases: If we have a database with 100 tables (very common) and we add 5 minutes of waiting time to each vacuum, then we'll make a complete database VACUUM take ~7 hours longer than it did before. 1000 tables would cause rioting in the streets. Waiting 5 minutes for a 0.5 second vacuum isn't sensible either, whatever the gain. It's clear Amdahl's Law would not advise us to optimise that (in this way). So if it's a large table and we submitted it with a non-zero vacuum wait, then maybe a wait is an acceptable optimisation. Perhaps we can start the first scan, then check the xid after we scan each few blocks. Once we find the xid is older, then we know the size of the second scan can be limited to only those blocks already scanned. So the two endpoints of behaviour are we skip the scan completely or we do the whole scan, but at least there is a saving in many cases without waiting. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
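The poll-during-the-first-scan idea can be sketched as follows. This is a toy model, not VACUUM code: the block counts and the `xmin_advanced` callback are stand-ins for the real machinery, and the scan itself is elided.

```python
CHECK_EVERY = 8  # poll interval, in blocks (arbitrary for illustration)

def first_pass(nblocks, xmin_advanced):
    """Return how many leading blocks the second pass must revisit.
    `xmin_advanced()` answers: have the blocking transactions finished yet?"""
    second_scan_limit = None
    for blk in range(nblocks):
        if second_scan_limit is None and blk % CHECK_EVERY == 0:
            if xmin_advanced():
                # From here on, blocks can be vacuumed fully in one pass;
                # only blocks [0, blk) may still need revisiting.
                second_scan_limit = blk
        # ... scan block `blk` ...
    return nblocks if second_scan_limit is None else second_scan_limit

# If the blockers finish around block 40 of 100, the second scan shrinks
# from 100 blocks to at most the first 40 -- with no up-front wait.
calls = iter([False] * 5 + [True] * 100)
limit = first_pass(100, lambda: next(calls))
print(limit)  # 40
```

The two endpoints in the text fall out naturally: if the xid is already old at block 0 the second scan is skipped entirely, and if it never ages the whole table is rescanned, as today.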
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Friday, May 30, 2008, Tom Lane wrote: No, I think it would be a useless expenditure of energy. Failover includes a lot of things that are not within our purview: switching IP addresses to point to the new server, some kind of STONITH solution to keep the original master from coming back to life, etc. Moreover there are already projects/products concerned with those issues. Well, I forgot that there's in fact no active plan to put pgbouncer features into core at the moment (I'm sure I've read something about this on the lists though). If it were the case, the slave could proxy queries to the master, and stop proxying but serve them itself if the master tells it it's dying. This way, there is no need to switch IP addresses; the clients just connect as usual and get results back, and do not have to know whether the host they're querying against is a slave or a master. This level of smartness belongs in -core. The STONITH part in case of a known (fatal) failure does not seem that hard either, as the master at fatal time could record somewhere that it's now a slave and use this at next startup time (recovery.conf?). If it can't even do that, well, we're back to a crash situation with no automatic failover solution provided. Unhandled failure cases will obviously continue to exist. I'm not asking for all cases to be managed in -core, just for some level of effort on the topic. Of course, I'm just the one asking questions and trying to raise ideas, so I'm perfectly fine with your current answer (useless expenditure of energy) even if somewhat disagreeing on the useless part of it :) As for the integrated pgbouncer daemon part, I'm thinking this would allow the infamous part 3 of the proposal (read-only slave) to get pretty simple to set up when ready: the slave knows who its master is, and as soon as an XID is needed the transaction queries are forwarded/proxied to it. Thanks again Florian !
It might be useful to document where to find solutions to that problem, but we can't take it on as part of core Postgres. Even the part where it makes sense (provided it does and I'm not completely off track here)? Regards, -- dim
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, 30 May 2008 16:22:41 -0400 (EDT) Greg Smith wrote: On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote: Then you ship 16 MB of binary data every 30 seconds or every minute, but you only have a few kilobytes of real data in the logfile. Not if you use pg_clearxlogtail ( http://www.2ndquadrant.com/replication.htm ), which got lost in the giant March commitfest queue but should probably wander into contrib as part of 8.4. Yes, this topic was discussed several times in the past, but to solve it we need a patch/solution which is integrated into PG itself, not contrib. Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group
Re: [HACKERS] move hash_any from hash to utils
Zdenek Kotala wrote: I'm trying to clean up header dependencies, and it seems the hash_any and hash_uint32 functions are candidates to move into a separate C file in the backend/utils/misc directory. It would significantly reduce the number of unnecessary includes in files like network.c ( http://doxygen.postgresql.org/network_8c.html ). Any comments? I considered that too, so +1. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, 2008-05-29 at 10:12 -0400, Tom Lane wrote: The Postgres core team met at PGCon to discuss a few issues, the largest of which is the need for simple, built-in replication for PostgreSQL. Historically the project policy has been to avoid putting replication into core PostgreSQL, so as to leave room for development of competing solutions, recognizing that there is no one size fits all replication solution. However, it is becoming clear that this policy is hindering acceptance of PostgreSQL to too great an extent, compared to the benefit it offers to the add-on replication projects. Users who might consider PostgreSQL are choosing other database systems because our existing replication options are too complex to install and use for simple cases. In practice, simple asynchronous single-master-multiple-slave replication covers a respectable fraction of use cases, so we have concluded that we should allow such a feature to be included in the core project. We emphasize that this is not meant to prevent continued development of add-on replication projects that cover more complex use cases. We believe that the most appropriate base technology for this is probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon. We hope that such a feature can be completed for 8.4. Ideally this would be coupled with the ability to execute read-only queries on the slave servers, but we see technical difficulties that might prevent that from being completed before 8.5 or even further out. (The big problem is that long-running slave-side queries might still need tuples that are vacuumable on the master, and so replication of vacuuming actions would cause the slave's queries to deliver wrong answers.) Again, this will not replace Slony, pgPool, Continuent, Londiste, or other systems for many users, as it will be not be highly scalable nor support long-distance replication nor replicating less than an entire installation. 
But it is time to include a simple, reliable basic replication feature in the core system. I'm in full support of this and commend the work of the NTT team. The goals and timescales are realistic, and setting a timetable in this way will help planning for many users. I'm expecting to lead the charge on the Hot Standby project. The problem mentioned is just one of the issues, though overall I'm now optimistic about our eventual success in that area. I'm discussing this now with a couple of sponsors and would welcome serious financial commitments to this goal. Please contact me off-list if you agree also. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Avoiding second heap scan in VACUUM
On Thu, 2008-05-29 at 09:57 +0530, Pavan Deolasee wrote: On Thu, May 29, 2008 at 2:02 AM, Simon Riggs [EMAIL PROTECTED] wrote: I'm not happy that the VACUUM waits. It might wait a very long time and cause worse overall performance than the impact of the second scan. Let's not get too paranoid about the wait. It's a minor detail in the whole theory. I would suggest that the benefit of avoiding the second scan would be huge. Remember, it's not just a scan; it also dirties those blocks again, forcing them to be written to disk. Also, if you really have a situation where vacuum needs to wait for very long, then you are already in trouble. The long-running transactions would prevent vacuuming many tuples. Been thinking some more about this. You're right that the second scan could re-dirty many pages and is probably something to avoid. The main issue I see is that you don't really know how much work will happen in the first phase and how much would happen in the second. Avoiding the second scan might be worth waiting for, it might not. You really have no idea how many tuples the LRT might affect. There is no typical here; it can vary from one extreme to another. Waiting could be very bad, or no problem at all. I'd rather keep it as it is than have sometimes-better, sometimes-worse behaviour. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Core team statement on replication in PostgreSQL
Andrew, Sure there's a price to pay. But that doesn't mean the facility doesn't exist. And I rather suspect that most of Josh's customers aren't too concerned about traffic charges or affected by such bandwidth restrictions. Certainly, none of my clients are, and they aren't in the giant class. Shipping a 16MB file, particularly if compressed, every minute or so, is not such a huge problem for a great many commercial users, and even many domestic users. The issue is that when you're talking about telecommunications companies (and similar), once a minute isn't adequate. Those folks want at least every second, or better yet, synchronous. Anyway, this is a pretty pointless discussion given that we want both capabilities, and stuff will get implemented in the order it makes technical sense. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Re: [HACKERS] Avoiding second heap scan in VACUUM
On Fri, 2008-05-30 at 14:22 +0530, Pavan Deolasee wrote: For large tables, two heap scans along with several additional page writes don't seem to be a cost we can afford, especially in IO-bound applications. IMHO a timed wait is not such a bad thing. Note that it's all about VACUUM, which is a background maintenance activity, and it won't hurt to delay it by a few seconds or even minutes. Also, as I said earlier, waiting is a minor detail; maybe there is a better way to do things. Unless there are some strong objections, I would like to give it a shot and see if there are any real benefits. We can then fix any regression cases. Let me know if somebody thinks there are certain show stoppers or that the benefits of avoiding a second scan on a large table are not worth it. I personally have a strong feeling that it's worth the effort. I'm not really clear what you intend to do now. We both agreed that avoiding a second pass is a good thing. What I still don't accept is that an unconstrained wait is justifiable. You've just said it's a minor detail, but that's not the way I see it. It might be a second, but it might be an hour or more. If I run a VACUUM at 0700, thinking it will finish by 0900 before my database gets busy, it is a very bad thing to find that it only started at 0900 and is now interfering with my business workload. A non-waiting solution seems like the only way to proceed. Is this a non-issue anyway, with DSM? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
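The two heap passes under discussion can be sketched as follows. This is an illustrative Python model of the lazy-vacuum flow, not the actual PostgreSQL C code; the page/tuple structures and function names here are simplified assumptions, but they show why the second pass re-dirties every page that held a dead tuple.

```python
# Toy model of lazy VACUUM's two heap passes (the real logic lives in
# PostgreSQL's C implementation; this only illustrates the shape of it).

def lazy_vacuum(heap, index, is_dead):
    dead_tids = []
    dirtied_in_second_pass = set()

    # Pass 1: scan the whole heap and remember the TIDs of dead tuples.
    for pageno, page in enumerate(heap):
        for lineno, tup in enumerate(page):
            if tup is not None and is_dead(tup):
                dead_tids.append((pageno, lineno))

    # Clean the dead TIDs out of the index before touching the heap again.
    for tid in dead_tids:
        index.discard(tid)

    # Pass 2: revisit only the pages that held dead tuples and reclaim
    # their slots -- re-dirtying each such page a second time, which is
    # the extra write cost the thread is arguing about.
    for pageno, lineno in dead_tids:
        heap[pageno][lineno] = None
        dirtied_in_second_pass.add(pageno)

    return dead_tids, dirtied_in_second_pass

heap = [["live", "dead"], ["dead", "dead"], ["live", "live"]]
index = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)}
dead, dirtied = lazy_vacuum(heap, index, lambda t: t == "dead")
```

In this toy run, pages 0 and 1 are written twice while page 2 is only scanned, which is the asymmetry driving the "avoid the second scan" proposal.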
Re: [HACKERS] intercepting WAL writes
On Wed, 2008-05-28 at 21:47 -0400, Mike wrote: On Wed, May 28, 2008 at 8:30 PM, Mike [EMAIL PROTECTED] wrote: When you say a bit of decoding, is that because the data written to the logs is after the query parser/planner? Or because it's written in several chunks? Or? Because that's the actual recovery record. There is no SQL text, just the WAL record type (XLOG_HEAP_INSERT, XLOG_HEAP_UPDATE, XLOG_XACT_COMMIT, ...) and the data as it relates to that operation. Oh- right- that makes sense. Hmm, yes and no. WAL doesn't carry enough information to reconstruct updates or deletes external to the database in question. That's the barrier you need to cross, not just piping the results somewhere. I envision being able to do this in the future and exposing an API to allow it to happen, but we aren't there yet. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
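To illustrate the point that WAL carries recovery records rather than SQL text, here is a deliberately simplified decoder for a made-up record layout. The record names echo the real ones mentioned above (XLOG_HEAP_INSERT and friends), but the byte format below is an assumption for illustration only, not PostgreSQL's on-disk WAL format.

```python
import struct

# Hypothetical, simplified layout: resource-manager id, info flags, xid,
# payload length, then raw payload bytes. The real XLogRecord header is
# different; this only demonstrates "record type + binary data, no SQL".
RECORD_NAMES = {
    (10, 0x00): "XLOG_HEAP_INSERT",
    (10, 0x10): "XLOG_HEAP_UPDATE",
    (1, 0x00): "XLOG_XACT_COMMIT",
}
HEADER = "<BBII"  # rmid, info, xid, payload length

def encode(rmid, info, xid, payload):
    return struct.pack(HEADER, rmid, info, xid, len(payload)) + payload

def decode(buf):
    rmid, info, xid, length = struct.unpack_from(HEADER, buf)
    payload = buf[struct.calcsize(HEADER):struct.calcsize(HEADER) + length]
    return RECORD_NAMES.get((rmid, info), "UNKNOWN"), xid, payload

rec = encode(10, 0x00, 1234, b"\x01\x02new-tuple-bytes")
name, xid, payload = decode(rec)
```

Note that even this toy decoder recovers only "which operation, on which bytes" - reconstructing the original UPDATE or DELETE statement is exactly the missing piece the reply describes.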
Fwd: Re: [HACKERS] Core team statement on replication in PostgreSQL
[Looks like this mail missed the hackers list on reply-to-all, I wonder how that could happen... so I forward it] On Thu, 2008-05-29 at 17:00 +0100, Dave Page wrote: Yes, we're talking real-time streaming (synchronous) log shipping. Is there already a design for how this would be implemented? Some time ago I was interested in this subject and thought about having a normal postgres connection where the slave would issue a query to a special view which would simply stream WAL records as bytea from a requested point, without ever finishing. This would have the advantage (over a custom socket streaming solution) that it can reuse the complete infrastructure of connection making/managing/security (think SSL to prevent sniffing) of the postgres server. It would also be a public interface, which could be used by other tools too (think of a PITR management application or WAL stream repository). Another advantage would be that a PITR solution could serve as a source for the WAL stream too, so the slave could either get the real-time stream from the master, or rebuild a PITR state from a WAL repository server, using the same interface... Probably some kind of WAL subscription management should also be implemented, so that the slave can signal the master which WAL records it has already applied and which can be recycled on the master, and it would be nice if there could be multiple subscribers at the same time. A subscriber time-out could also be implemented, marking the subscription as timed out, so that the slave can know that it has to rebuild itself... Cheers, Csaba.
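The subscription-management idea above can be sketched as a small bookkeeping model: the master recycles a WAL segment only once every live subscriber has acknowledged applying past it, and a subscriber that stops acknowledging eventually times out and drops out of the calculation (at which point it must rebuild itself). Everything here - class and method names, the timeout policy - is a hypothetical sketch, not an existing PostgreSQL interface.

```python
import time

class WalSubscriptions:
    """Toy model of per-slave WAL subscription positions on the master."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.subs = {}  # subscriber name -> (applied segment, last seen)

    def ack(self, name, segment, now=None):
        # A slave reports the highest WAL segment it has applied.
        self.subs[name] = (segment, now if now is not None else time.time())

    def recyclable_up_to(self, now=None):
        # Segments below the minimum position of any *live* subscriber can
        # be recycled; timed-out subscribers no longer hold WAL back.
        now = now if now is not None else time.time()
        live = [seg for seg, seen in self.subs.values()
                if now - seen <= self.timeout]
        return min(live) if live else None

subs = WalSubscriptions(timeout_seconds=60)
subs.ack("reporting-slave", 42, now=1000.0)
subs.ack("dr-slave", 40, now=1000.0)
```

With both slaves live, the master may recycle only up to segment 40; once "dr-slave" exceeds the timeout, "reporting-slave" alone determines the recycle point.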
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, 30 May 2008 17:05:57 -0400 Andrew Dunstan wrote: Andreas 'ads' Scherbaum wrote: On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote: Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours. Then you ship 16 MB of binary stuff every 30 seconds or every minute, but you only have a few kilobytes of real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (meaning: no high-speed connection, maybe even pay-per-traffic) to the slave. Sure there's a price to pay. But that doesn't mean the facility doesn't exist. And I rather suspect that most of Josh's customers aren't too concerned about traffic charges or affected by such bandwidth restrictions. Certainly, none of my clients are, and they aren't in the giant class. Shipping a 16MB file, particularly if compressed, every minute or so, is not such a huge problem for a great many commercial users, and even many domestic users. The real problem is not the 16 MB; the problem is that you can't compress this file. If the logfile is rotated, it still contains all the old binary data, which is not a good starting point for compression. So you may have a few kB of changes in the WAL logfile every minute, but you still copy 16 MB of data. Sure, it's not so much - but if you rotate a logfile every minute this still transfers 16*60*24 = ~23 GB a day. Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group
Re: [HACKERS] Core team statement on replication in PostgreSQL
Tatsuo Ishii wrote: Andreas 'ads' Scherbaum wrote: On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote: Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours. Then you ship 16 MB of binary stuff every 30 seconds or every minute, but you only have a few kilobytes of real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (meaning: no high-speed connection, maybe even pay-per-traffic) to the slave. Sure there's a price to pay. But that doesn't mean the facility doesn't exist. And I rather suspect that most of Josh's customers aren't too concerned about traffic charges or affected by such bandwidth restrictions. Certainly, none of my clients are, and they aren't in the giant class. Shipping a 16MB file, particularly if compressed, every minute or so, is not such a huge problem for a great many commercial users, and even many domestic users. Sumitomo Electric Co., Ltd., a company with 20 billion dollars in sales in Japan (parent company of Sumitomo Electric Information Systems Co., Ltd., which is one of the Recursive SQL development support companies) uses 100 PostgreSQL servers. They are doing backups by using log shipping to another data center and have problems with the amount of log data being transferred. They said this is one of the big problems they have with PostgreSQL and hope it will be solved in the near future. Excellent data point. Now, what I'd like to know is whether they are getting into trouble simply because of the volume of log data generated or because they have a short archive_timeout set. If it's the former (which seems more likely) then none of the ideas I have seen so far in this discussion seem likely to help, and that would indeed be a major issue we should look at. Another question is this: are they being overwhelmed by the amount of network traffic generated, or by the difficulty of postgres producers and/or consumers keeping up?
If it's network traffic, then perhaps compression would help us. Maybe we need to set some goals for the level of log volumes we expect to be able to create/send/consume. cheers andrew
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Sat, May 31, 2008 at 3:41 AM, Greg Smith [EMAIL PROTECTED] wrote: On Sat, 31 May 2008, Gurjeet Singh wrote: Not if you use pg_clearxlogtail. This means we need to modify pg_standby to not check for filesize when reading XLogs. No, the idea is that you run the segments through pg_clearxlogtail | gzip, which then compresses lightly used segments massively because all the unused bytes are 0. The file comes out the same size at the other side, but you didn't ship a full 16MB if there were only a few KB used. Got it. I remember reading about pg_clearxlogtail in these mailing lists, but somehow forgot how it actually worked! -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com EnterpriseDB http://www.enterprisedb.com Mail sent from my BlackLaptop device
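The effect described above is easy to demonstrate: a 16MB segment whose unused tail has been zeroed compresses to almost nothing, while the same segment full of stale binary data from a recycled file barely compresses at all. A small sketch, with `zlib` standing in for gzip and random bytes standing in for old WAL contents (the 64KB "used" portion is an arbitrary choice for illustration):

```python
import os
import zlib

SEGMENT_SIZE = 16 * 1024 * 1024          # one WAL segment
used = os.urandom(64 * 1024)             # a few KB of "real" WAL data

# Recycled segment: stale (here, pseudo-random) bytes after the used part.
stale_tail = used + os.urandom(SEGMENT_SIZE - len(used))
# After pg_clearxlogtail-style processing: same used bytes, zeroed tail.
zeroed_tail = used + b"\x00" * (SEGMENT_SIZE - len(used))

stale_compressed = len(zlib.compress(stale_tail))
zeroed_compressed = len(zlib.compress(zeroed_tail))
```

Both files are still exactly 16MB before compression, which is why pg_standby's file-size check keeps working on the receiving side; only the bytes on the wire shrink.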
Re: [HACKERS] Overhauling GUCS
On Fri, 30 May 2008, Josh Berkus wrote: 1) Add several pieces of extra information to guc.c in the form of extra gettext commands: default value, subcategory, long description, recommendations, enum lists. 2) Incorporate this data into pg_settings When you last brought this up in February (I started on a long reply to http://archives.postgresql.org/pgsql-hackers/2008-02/msg00759.php that I never quite finished) the thing I got stuck on was how to deal with the way people tend to comment in these files as they change things. One problem I didn't really see addressed by the improvements you're suggesting is how to handle migrating customized settings to a new version (I'm talking about 8.4-9.0 after this is in place; 8.3-8.4 is a whole different problem). It would be nice to preserve the history of what people did, as in your examples (which look exactly like what I find myself running into in the field). Now, that will get a lot easier just by virtue of having a smaller config file, but I think that adding something into pg_settings that allows saving user-added commentary would be a nice step toward some useful standardization on that side of things. It would make future automated tools aimed at parsing and generating new files, as part of things like version upgrades, a lot easier if there were a standard way such comments were handled in addition to the raw data itself. The other thing I'd like to see make its way into pg_settings, so that tools can operate on it just by querying the database, is noting what file the setting came from, so that you can track things like include file usage. I think with those two additions (comments and source file tracking) it would even be conceivable to clone a working facsimile of even a complicated postgresql.conf file set remotely just by reading pg_settings. 
While a bit outside of the part you're specifically aiming to improve here, if you could slip these two additions in I think it would be a boon to future writers of multi-server management tools as well. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
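The kind of comment-preserving handling suggested above might look like the following. The parsing rules and field names here are hypothetical - there is no agreed postgresql.conf convention for this - but the sketch shows how a tool regenerating the file for a version upgrade could keep user commentary attached to each customized setting.

```python
import re

# "name = value  # comment" -- capture all three parts so a rewrite can
# re-emit the user's comment next to the migrated setting.
LINE = re.compile(r"^\s*(\w+)\s*=\s*([^#]+?)\s*(?:#\s*(.*))?$")

def parse_conf(text):
    settings = {}
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or fully commented-out setting
        m = LINE.match(raw)
        if m:
            name, value, comment = m.groups()
            settings[name] = {"value": value, "comment": comment}
    return settings

conf = """
shared_buffers = 2GB      # raised 2008-05-30 after checkpoint tuning
work_mem = 64MB
# archive_mode = off
"""
settings = parse_conf(conf)
```

A real tool would also need to track which file each line came from (mirroring the source-file idea for pg_settings), but even this much is enough to round-trip the per-setting history users leave behind.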